6.801/6.866: Machine Vision (Fall 2020)
Lecture Notes
2 Lecture 3: Time to Contact, Focus of Expansion, Direct Motion Vision Methods, Noise Gain 12
2.1 Noise Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Forward and Inverse Problems of Machine Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Scalar Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Vector Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Review from Lecture 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.1 Two-Pixel Motion Estimation, Vector Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.2 Constant Brightness Assumption, and Motion Equation Derivation . . . . . . . . . . . . . . . . . . . . . . 14
2.3.3 Optical Mouse Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.4 Perspective Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Time to Contact (TTC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Lecture 4: Fixed Optical Flow, Optical Mouse, Constant Brightness Assumption, Closed Form Solution 18
3.1 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Constant Brightness Assumption Review with Generalized Isophotes . . . . . . . . . . . . . . . . . . . . . 19
3.1.2 Time to Contact (TTC) Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Increasing Generality of TTC Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Multiscale and TTC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.1 Aliasing and Nyquist’s Sampling Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3.2 Applications of TTC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Optical Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 Vanishing Points (Perspective Projection) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5.1 Applications of Vanishing Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.6 Calibration Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.6.1 Spheres . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.6.2 Cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.7 Additional Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.7.1 Generalization: Fixed Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.7.2 Generalization: Time to Contact (for U = 0, V = 0, ω ≠ 0) . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Lecture 5: TTC and FOR Montivision Demos, Vanishing Point, Use of VPs in Camera Calibration 25
4.1 Robust Estimation and Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Line Intersection Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.2 Dealing with Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 Reprojection and Rectification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.4 Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Magnification with TTC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2.1 Perspective Projection and Vanishing Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2.2 Lines in 2D and 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.3 Application: Multilateration (MLAT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.4 Application: Understand Orientation of Camera w.r.t. World . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3 Brightness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.1 What if We Can’t use Multiple Orientations? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Lecture 6: Photometric Stereo, Noise Gain, Error Amplification, Eigenvalues and Eigenvectors Review 32
5.1 Applications of Linear Algebra to Motion Estimation/Noise Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.1 Application Revisited: Photometric Stereo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Lambertian Objects and Brightness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.1 Surface Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.2 Surface Orientation Isophotes/Reflectance Maps for Lambertian Surfaces . . . . . . . . . . . . . . . . . . 34
5.3 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6 Lecture 7: Gradient Space, Reflectance Map, Image Irradiance Equation, Gnomonic Projection 35
6.1 Surface Orientation Estimation (Cont.) & Reflectance Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.1.1 Forward and Inverse Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.1.2 Reflectance Map Example: Determining the Surface Normals of a Sphere . . . . . . . . . . . . . . . . . . 36
6.1.3 Computational Photometric Stereo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.2 Photometry & Radiometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.3 Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.3.1 Thin Lenses - Introduction and Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.3.2 Putting it All Together: Image Irradiance from Object Irradiance . . . . . . . . . . . . . . . . . . . . . . . 38
6.3.3 BRDF: Bidirectional Reflectance Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.3.4 Helmholtz Reciprocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7 Lecture 8: Shape from Shading, Special Cases, Lunar Surface, Scanning Electron Microscope, Green’s
Theorem in Photometric Stereo 39
7.1 Review of Photometric and Radiometric Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.2 Ideal Lambertian Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.2.1 Foreshortening Effects in Lambertian Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.2.2 Example: Distant Lambertian Point Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.3 Hapke Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.3.1 Example Application: Imaging the Lunar Surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.3.2 Surface Orientation and Reflectance Maps of Hapke Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.3.3 Rotated (p′, q′) Coordinate Systems for Brightness Measurements . . . . . . . . . . . . . . . . . . . . . . . 43
7.4 “Thick” & Telecentric Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.4.1 “Thick” Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.4.2 Telecentric Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8 Lecture 9: Shape from Shading, General Case - From First Order Nonlinear PDE to Five ODEs 45
8.1 Example Applications: Transmission and Scanning Electron Microscopes (TEMs and SEMs, respectively) . . . . 46
8.2 Shape from Shading: Needle Diagram to Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.2.1 Derivation with Taylor Series Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.2.2 Derivation with Green’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.3 Shape with Discrete Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.3.1 “Computational Molecules” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.3.2 Iterative Optimization Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.3.3 Reconstructing a Surface From a Single Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9 Lecture 10: Characteristic Strip Expansion, Shape from Shading, Iterative Solutions 51
9.1 Review: Where We Are and Shape From Shading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.2 General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
9.2.1 Reducing General Form SfS for Hapke Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
9.2.2 Applying General Form SfS to SEMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
9.3 Base Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
9.4 Analyzing “Speed” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
9.5 Generating an Initial Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
9.5.1 Using Edge Points to Autogenerate an Initial Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
9.5.2 Using Stationary Points to Autogenerate an Initial Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
9.5.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
9.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
10 Lecture 11: Edge Detection, Subpixel Position, CORDIC, Line Detection, (US 6,408,109) 58
10.1 Background on Patents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
10.2 Patent Case Study: Detecting Sub-Pixel Location of Edges in a Digital Image . . . . . . . . . . . . . . . . . . . . 59
10.2.1 High-Level Overview of Edge Detection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
10.3 Edges & Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
10.3.1 Finding a Suitable Brightness Function for Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 61
10.3.2 Brightness Gradient Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
10.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
11 Lecture 12: Blob analysis, Binary Image Processing, Use of Green’s Theorem, Derivative and Integral as
Convolutions 64
11.1 Types of Intellectual Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
11.2 Edge Detection Patent Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
11.2.1 Finding Edge with Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
11.2.2 More on “Stencils”/Computational Molecules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
11.2.3 Mixed Partial Derivatives in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
11.2.4 Laplacian Estimators in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
11.2.5 Non-Maximum Suppression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
11.2.6 Plane Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
11.2.7 Bias Compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
11.2.8 Edge Transition and Defocusing Compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
11.2.9 Multiscale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
11.2.10 Effect on Image Edge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
11.2.11 Addressing Quantization of Gradient Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
11.2.12 CORDIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
11.3 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
12 Lecture 13: Object detection, Recognition and Pose Determination, PatQuick (US 7,016,539) 72
12.1 Motivation & Preliminaries for Object Detection/Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 72
12.1.1 “Blob Analysis”/”Binary Image Processing” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
12.1.2 Binary Template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
12.1.3 Normalized Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
12.2 Patent 7,016,539: Method for Fast, Robust, Multidimensional Pattern Recognition . . . . . . . . . . . . . . . . . 75
12.2.1 Patent Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
12.2.2 High-level Steps of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
12.2.3 Framework as Programming Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
12.2.4 Other Considerations for this Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
12.3 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
13 Lecture 14: Inspection in PatQuick, Hough Transform, Homography, Position Determination, Multi-Scale 77
13.1 Review of “PatQuick” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
13.1.1 Scoring Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
13.1.2 Additional System Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
13.1.3 Another Application of “PatQuick”: Machine Inspection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
13.2 Intro to Homography and Relative Poses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
13.2.1 How many degrees of freedom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
13.3 Hough Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
13.3.1 Hough Transforms with Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
13.3.2 Hough Transforms with Circles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
13.3.3 Hough Transforms with Searching for Center Position and Radius . . . . . . . . . . . . . . . . . . . . . . 82
13.4 Sampling/Subsampling/Multiscale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
14 Lecture 15: Alignment, recognition in PatMAx, distance field, filtering and sub-sampling (US 7,065,262) 83
14.1 PatMAx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
14.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
14.1.2 Training PatMAx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
14.1.3 Estimating Other Pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
14.1.4 Attraction Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
14.1.5 PatMAx Claims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
14.1.6 Comparing PatMAx to PatQuick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
14.1.7 Field Generation for PatMAx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
14.2 Finding Distance to Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
14.3 Fast Convolutions Through Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
14.3.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
14.3.2 Integration and Differentiation as Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
14.3.3 Sparse Convolution as Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
14.3.4 Effects on Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
14.3.5 Filtering (For Multiscale): Anti-Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
14.3.6 Extending Filtering to 2D and An Open Research Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 92
15 Lecture 16: Fast Convolution, Low Pass Filter Approximations, Integral Images, (US 6,457,032) 92
15.1 Sampling and Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
15.1.1 Nyquist Sampling Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
15.1.2 Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
15.1.3 How Can We Mitigate Aliasing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.2 Integral Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.2.1 Integral Images in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.2.2 Integral Images in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.3 Fourier Analysis of Block Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
15.4 Repeated Block Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
15.4.1 Warping Effects and Numerical Fourier Transforms: FFT and DFT . . . . . . . . . . . . . . . . . . . . . 101
15.5 Impulses and Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
15.5.1 Properties of Delta Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
15.5.2 Combinations of Impulses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
15.5.3 Convolution Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
15.5.4 Analog Filtering with Birefringent Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
15.5.5 Derivatives and Integrals as Convolution Operators and FT Pairs . . . . . . . . . . . . . . . . . . . . . . . 104
15.5.6 Interpolation and Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
15.5.7 Rotationally-Symmetric Lowpass Filter in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
15.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
16 Lecture 17: Photogrammetry, Orientation, Axes of Inertia, Symmetry, Absolute, Relative, Interior, and
Exterior Orientation 105
16.1 Photogrammetry Problems: An Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.1.1 Absolute Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.1.2 Relative Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.1.3 Exterior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.1.4 Interior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.2 Absolute Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.2.1 Binocular Stereopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
16.2.2 General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
16.2.3 Transformations and Poses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
16.2.4 Procedure - “Method 1” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
16.2.5 Procedure - “Method 2” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
16.2.6 Computing Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
16.3 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
17 Lecture 18: Rotation and How to Represent it, Unit Quaternions, the Space of Rotations 114
17.1 Euclidean Motion and Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
17.2 Basic Properties of Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
17.2.1 Isomorphism Vectors and Skew-Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
17.3 Representations for Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
17.3.1 Axis and Angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
17.3.2 Euler Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
17.3.3 Orthonormal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
17.3.4 Exponential Cross Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
17.3.5 Stereography Plus Bilinear Complex Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
17.3.6 Pauli Spin Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
17.3.7 Euler Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
17.4 Desirable Properties of Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
17.5 Problems with Some Rotation Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
17.6 Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
17.6.1 Hamilton and Division Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
17.6.2 Hamilton’s Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
17.6.3 Representations of Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
17.6.4 Representations for Quaternion Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
17.6.5 Properties of 4-Vector Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
17.7 Quaternion Rotation Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
17.7.1 Relation of Quaternion Rotation Operation to Rodrigues Formula . . . . . . . . . . . . . . . . . . . . . . 123
17.8 Applying Quaternion Rotation Operator to Photogrammetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
17.8.1 Least Squares Approach to Find R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
17.8.2 Quaternion-based Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
17.9 Desirable Properties of Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
17.9.1 Computational Issues for Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
17.9.2 Space of Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
17.10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
18 Lecture 19: Absolute Orientation in Closed Form, Outliers and Robustness, RANSAC 127
18.1 Review: Absolute Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
18.1.1 Rotation Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
18.1.2 Quaternion Representations: Axis-Angle Representation and Orthonormal Rotation Matrices . . . . . . . 128
18.2 Quaternion Transformations/Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
18.3 Transformations: Incorporating Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
18.3.1 Solving for Scaling Using Least Squares: Asymmetric Case . . . . . . . . . . . . . . . . . . . . . . . . . . 129
18.3.2 Issues with Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
18.3.3 Solving for Scaling Using Least Squares: Symmetric Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
18.4 Solving for Optimal Rotation in Absolute Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
18.4.1 How Many Correspondences Do We Need? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
18.4.2 When do These Approaches Fail? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
18.4.3 What Happens When Points are Coplanar? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
18.4.4 What Happens When Both Coordinate Systems Are Coplanar . . . . . . . . . . . . . . . . . . . . . . . . 134
18.5 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
18.6 Sampling Space of Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
18.6.1 Initial Procedure: Sampling from a Sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
18.6.2 Improved Approach: Sampling from a Cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
18.6.3 Sampling From Spheres Using Regular and Semi-Regular Polyhedra . . . . . . . . . . . . . . . . . . . . . 137
18.6.4 Sampling in 4D: Rotation Quaternions and Products of Quaternions . . . . . . . . . . . . . . . . . . . . . 138
18.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
19 Lecture 20: Space of Rotations, Regular Tessellations, Critical Surfaces in Motion Vision and Binocular
Stereo 139
19.1 Tessellations of Regular Solids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
19.2 Critical Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
19.3 Relative Orientation and Binocular Stereo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
19.3.1 Binocular Stereo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
19.3.2 How Many Correspondences Do We Need? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
19.3.3 Determining Baseline and Rotation From Correspondences . . . . . . . . . . . . . . . . . . . . . . . . . . 144
19.3.4 Solving Using Weighted Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
19.3.5 Symmetries of Relative Orientation Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
19.3.6 When Does This Fail? Critical Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
19.3.7 (Optional) Levenberg-Marquardt and Nonlinear Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 148
19.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
20 Lecture 21: Relative Orientation, Binocular Stereo, Structure from Motion, Quadrics, Camera Calibra-
tion, Reprojection 149
20.1 Interior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
20.1.1 Radial Distortion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
20.1.2 Tangential Distortion and Other Distortion Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
20.2 Tsai’s Calibration Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
20.2.1 Interior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
20.2.2 Exterior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
20.2.3 Combining Interior and Exterior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
20.2.4 “Squaring Up” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
20.2.5 Planar Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
20.2.6 Aspect Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
20.2.7 Solving for tz and f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
20.2.8 Wrapping it Up: Solving for Principal Point and Radial Distortion . . . . . . . . . . . . . . . . . . . . . . 158
20.2.9 Noise Sensitivity/Noise Gain of Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
21 Lecture 22: Exterior Orientation, Recovering Position and Orientation, Bundle Adjustment, Object
Shape 159
21.1 Exterior Orientation: Recovering Position and Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
21.1.1 Calculating Angles and Lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
21.1.2 Finding Attitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
21.2 Bundle Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
21.3 Recognition in 3D: Extended Gaussian 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
21.3.1 What Kind of Representation Are We Looking For? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
21.3.2 2D Extended Circular Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
21.3.3 Analyzing Gaussian Curvature in 2D Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
21.3.4 Example: Circle of Radius R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
21.3.5 Example: Ellipse in 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
21.3.6 3D Extended Gaussian Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
22 Lecture 23: Gaussian Image and Extended Gaussian Image, Solids of Revolution, Direction Histograms,
Regular Polyhedra 170
22.1 Gaussian Curvature and Gaussian Images in 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
22.1.1 Gaussian Integral Curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
22.1.2 How Do We Use Integral Gaussian Curvature? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
22.1.3 Can We Have Any Distribution of G on the Sphere? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
22.2 Examples of EGI in 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
22.2.1 Sphere: EGI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
22.2.2 Ellipsoid: EGI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
22.3 EGI with Solids of Revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
22.4 Gaussian Curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
22.4.1 Example: Sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
22.4.2 EGI Example Torus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
22.4.3 Analyzing the Density Distribution of the Sphere (For Torus) . . . . . . . . . . . . . . . . . . . . . . . . . 178
22.5 How Can We Compute EGI Numerically? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
22.5.1 Direction Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
22.5.2 Desired Properties of Dividing Up the Sphere/Tessellations . . . . . . . . . . . . . . . . . . . . . . . . . . 180
24.2.4 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
24.2.5 Hough Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
24.3 Photogrammetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
24.3.1 Absolute Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
24.3.2 Relative Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
24.3.3 Exterior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
24.3.4 Interior Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
24.4 Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
24.4.1 Axis and Angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
24.4.2 Orthonormal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
24.4.3 Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
24.4.4 Hamilton and Division Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
24.4.5 Properties of 4-Vector Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
24.4.6 Quaternion Rotation Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
24.5 3D Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
24.5.1 Extended Gaussian Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
24.5.2 EGI with Solids of Revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
24.5.3 Sampling From Spheres Using Regular and Semi-Regular Polyhedra . . . . . . . . . . . . . . . . . . . . . 198
24.5.4 Desired Properties of Dividing Up the Sphere/Tessellations . . . . . . . . . . . . . . . . . . . . . . . . . . 199
24.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
6.801/6.866: Machine Vision, Lecture Notes
(1/f) r = (1/(R · ẑ)) R    (vector form)
If we differentiate these perspective projection equations:
(1/f)(dx/dt) = (1/Z)(dX/dt) − (X/Z²)(dZ/dt)
What are these derivatives? They correspond to velocities. Let’s define some of these velocities:
• u ≜ dx/dt
• v ≜ dy/dt
• U ≜ dX/dt
• V ≜ dY/dt
• W ≜ dZ/dt
Now, rewriting the differentiated perspective projection equations with these velocity terms, we first write the equation for the
x component:
(1/f) u = (1/Z) U − (X/Z²) W
Similarly, for y:
(1/f) v = (1/Z) V − (Y/Z²) W
Why are these equations relevant? They allow us to find parts of the image that don't exhibit any motion - i.e. stationary points. Let's find where u = v = 0. Let the point (x₀, y₀) correspond to this point. Then:
x₀/f = U/W,    y₀/f = V/W
Focus of Expansion (FOE): Point in image space given by (x₀, y₀). This point is where the 3D motion vector intersects the image plane z = f.
Why is FOE useful? If you know FOE, you can derive the direction of motion by drawing a vector from the origin to
FOE.
Additionally, we can rewrite the differentiated perspective projection equations with FOE:
(1/f) u = ((x₀ − x)/f)(W/Z)    (x comp.),        (1/f) v = ((y₀ − y)/f)(W/Z)    (y comp.)
Cancelling out the focal length (f ) terms:
u = (x₀ − x)(W/Z)    (x comp.),        v = (y₀ − y)(W/Z)    (y comp.)
A few points here:
• You can draw the vector diagram of the motion field in the image plane.
• All vectors in the motion field expand outward from FOE.
• Recall that perspective projection cannot give us absolute distances.
For building intuition, let's additionally consider what each of these quantities means. The inverse ratio Z/W = Z/(dZ/dt) has units of meters/(meters/second) = seconds - i.e. Time of Impact.
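To make these relations concrete, here is a minimal numerical sketch in Python (all values below are illustrative assumptions of mine, not taken from the notes) that computes the FOE and evaluates the motion field u = (x₀ − x)W/Z, v = (y₀ − y)W/Z on a grid of image points:

import numpy as np

# Illustrative values (assumptions, not from the notes).
f = 1.0                      # focal length
U, V, W = 0.2, 0.1, 1.0      # world velocities dX/dt, dY/dt, dZ/dt
Z = 5.0                      # depth of a frontal planar surface

# Focus of Expansion: the image point with zero image motion.
x0, y0 = f * U / W, f * V / W

# Evaluate the motion field on a small grid of image points.
x, y = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-1.0, 1.0, 5))
u = (x0 - x) * W / Z
v = (y0 - y) * W / Z

# Every vector is radial with respect to the FOE; the time to contact is Z/W.
print("FOE:", (x0, y0), "TTC:", Z / W)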
Let’s now revisit these equations in vector form, rather than in the component form derived above:
(1/f)(dr/dt) = (1/(R · ẑ))(dR/dt) − (R/(R · ẑ)²) d(R · ẑ)/dt
Let’s rewrite this with dots for derivatives. Fun fact: The above notation is Leibniz notation, and the following is Newtonian
notation:
(1/f) ṙ = (1/(R · ẑ)) Ṙ − (R/(R · ẑ)²)(Ṙ · ẑ)

(1/f) ṙ = (1/Z)(Ṙ − (1/f) W r)
One way of reasoning about these equations is that image motion is world motion magnified by the ratio of the distances f and Z.
Next, we'll reintroduce the idea of the Focus of Expansion, but this time in vector form. The FOE is the image point r₀ at which ṙ = 0:

(1/f) r₀ = (1/W) Ṙ
We can use a dot product/cross product identity to rewrite the above expression in terms of cross products. The identity is as follows, for any a, b, c ∈ R³:

a × (b × c) = (c · a)b − (a · b)c
Using this identity, we rewrite the expression above to solve for FOE:
(1/f) ṙ = (1/(R · ẑ)²) (ẑ × (Ṙ × R))
What is this expression? This is image motion expressed in terms of world motion. Note the following identities/properties
of this motion, which are helpful for building intuition:
• ṙ · ẑ = 0 =⇒ image motion is perpendicular to the z-axis. This makes sense intuitively, because otherwise the image would be coming out of/going into the image plane.
• ṙ ⊥ ẑ
• Ṙ ∥ R =⇒ ṙ = 0 (this condition results in there being no image motion).
1.2 Brightness and Motion
Let’s now consider how brightness and motion are intertwined. Note that for this section, we will frequently be switching between
continuous and discrete. The following substitutions/conversions are made:
• Representations of brightness functions: E(x, y) ↔ E[x, y]
• Integrals and Sums: ∫_x ∫_y ↔ Σ_x Σ_y
1.2.1 1D Case
dx/dt = u =⇒ δx = u δt
By taking a linear approximation of the local brightness:
δE = Ex δx = u Ex δt    (note here that Ex = ∂E/∂x)
Dividing each side by δt, we have:
u Ex + Et = 0 =⇒ u = −Et/Ex = −(∂E/∂t)/(∂E/∂x)
Note that in the continuous domain, the sums in the weighted and unweighted average values are simply replaced with integrals.
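As a quick sanity check, here is a minimal sketch of the 1D estimate using a weighted (least-squares style) combination of the per-pixel estimates, u = −Σ Ex Et / Σ Ex²; the synthetic signal and all parameter values are my own illustrative choices:

import numpy as np

# Synthetic 1D brightness signal translating at true speed u (illustrative values).
n, true_u, dt = 200, 1.5, 1.0
xs = np.arange(n, dtype=float)
E1 = np.sin(0.1 * xs)                    # frame at time t
E2 = np.sin(0.1 * (xs - true_u * dt))    # frame at time t + dt

Ex = E1[1:] - E1[:-1]                    # forward difference in x (delta x = 1 pixel)
Et = (E2[:-1] - E1[:-1]) / dt            # forward difference in time

# Weighted combination of the per-pixel estimates -Et/Ex (weights Ex^2):
u_est = -np.sum(Ex * Et) / np.sum(Ex ** 2)
print(u_est)   # approximately 1.5 (finite differences introduce a small bias)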
1.2.2 2D Case
While these results are great, we must remember that images are in 2D, and not 1D. Let’s look at the 2D case. First and
foremost, let’s look at the brightness function, since it now depends on x, y, and t: E(x, y, t). The relevant partial derivatives
here are thus:
• ∂E/∂x - i.e. how the brightness changes in the x direction.
• ∂E/∂y - i.e. how the brightness changes in the y direction.
• ∂E/∂t - i.e. how the brightness changes w.r.t. time.
As in the previous 1D case, we can approximate these derivatives with finite forward first differences:
• ∂E/∂x = Ex ≈ (1/δx)(E(x + δx, y, t) − E(x, y, t))
• ∂E/∂y = Ey ≈ (1/δy)(E(x, y + δy, t) − E(x, y, t))
• ∂E/∂t = Et ≈ (1/δt)(E(x, y, t + δt) − E(x, y, t))
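In code, these forward differences are a few array operations; the sketch below assumes two successive frames stored as 2D numpy arrays (rows indexed by y, columns by x, an assumption about the layout) with δx = δy = 1 pixel:

import numpy as np

def brightness_derivatives(E1, E2, dt=1.0):
    """Forward-difference estimates of Ex, Ey, Et from two frames (rows = y, cols = x)."""
    Ex = E1[:-1, 1:] - E1[:-1, :-1]          # E(x + dx, y, t) - E(x, y, t)
    Ey = E1[1:, :-1] - E1[:-1, :-1]          # E(x, y + dy, t) - E(x, y, t)
    Et = (E2[:-1, :-1] - E1[:-1, :-1]) / dt  # E(x, y, t + dt) - E(x, y, t)
    return Ex, Ey, Et                        # all cropped to a common (H-1, W-1) grid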
Furthermore, let’s suppose that x and y are parameterized by time, i.e. x = x(t), y = y(t). Then we can compute the First-Order
Condition (FOC) given by:
dE(x, y, t)/dt = 0
Here, we can invoke the chain rule, and we obtain the result given by:
dE(x, y, t)/dt = (dx/dt)(∂E/∂x) + (dy/dt)(∂E/∂y) + ∂E/∂t = 0
Rewriting this in terms of u, v from above:
uEx + vEy + Et = 0
Objective here: We have a time-varying sequence of images, and our goal is to find and recover motion.
To build intuition, it is also common to plot in velocity space, given by (u, v). For instance, the BCCE for a single pixel, which is linear in (u, v), corresponds to a line in velocity space. Rewriting the equation above as a dot product:

(u, v) · (Ex, Ey) = −Et

Normalizing this equation by the magnitude of the brightness gradient (Ex, Ey), we obtain:

(u, v) · (Ex/√(Ex² + Ey²), Ey/√(Ex² + Ey²)) = −Et/√(Ex² + Ey²)
• The brightness gradient (Ex, Ey) measures spatial changes in brightness in the image plane x and y directions.
Isophotes: A curve on an illuminated surface that connects points of equal brightness (source: Wikipedia).
As we saw in the previous 1D case, we don't want to estimate with just one pixel. For multiple pixels, we solve a system of N equations in two unknowns; for two pixels (N = 2):

Ex1 u + Ey1 v = −Et1
Ex2 u + Ey2 v = −Et2
Solving this as a standard Ax = b problem, we have:

[u; v] = (1/(Ex1 Ey2 − Ey1 Ex2)) [Ey2, −Ey1; −Ex2, Ex1] [−Et1; −Et2]
Note that the expression (Ex1 Ey2 − Ey1 Ex2), whose reciprocal appears above, is the determinant of the matrix of partial derivatives, since we are taking its inverse (in this case, simply a 2×2 matrix).
When can/does this fail? It's important to be cognizant of edge cases in which this motion estimation procedure/algorithm fails. Some cases to consider:
• When the brightness partial derivatives / brightness gradients are parallel to one another ↔ the determinant goes to zero ↔ this corresponds to linear dependence in the observations. This occurs when Ex1 Ey2 = Ey1 Ex2, i.e. Ex1/Ey1 = Ex2/Ey2.
This issue can be mitigated by weighting the pixels, as we saw in the 1D case above. However, a more robust solution is to search for a minimum of the (squared) BCCE residual, rather than a point where it is exactly zero. The intuition here is that even if we aren't able to find a point of zero residual, we can still get as close to zero as possible. Mathematically, let us define the following objective:
J(u, v) ≜ ∫_{x∈X} ∫_{y∈Y} (u Ex + v Ey + Et)² dx dy
Since this is an unconstrained optimization problem, we can solve by finding the minimum of the two variables using two
First-Order Conditions (FOCs):
• ∂J(u, v)/∂u = 0
• ∂J(u, v)/∂v = 0
Here, we have two equations and two unknowns. When can this fail?
• When we have linear dependence between the brightness gradients. This occurs when:
– E = 0 everywhere
– E = constant
– Ex = 0
– Ey = 0
– Ex = Ey
– Ex = kEy
• When E = 0 everywhere (professor's intuition: "You're in a mine.")
• When Ex, Ey = 0 (constant brightness).
• Mathematically, this fails when: ∫∫ Ex² dx dy · ∫∫ Ey² dx dy − (∫∫ Ex Ey dx dy)² = 0

When is this approach possible? Only when the isophotes are not parallel straight lines - i.e. we want isophote curvature/rapid turning of brightness gradients.
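Setting the two FOCs to zero yields a 2×2 linear system in (u, v); a minimal sketch (assuming Ex, Ey, Et are numpy arrays, e.g. produced by the difference sketch above):

import numpy as np

def global_flow(Ex, Ey, Et):
    """Single (u, v) minimizing the integrated squared BCCE residual."""
    M = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey)],
                  [np.sum(Ex * Ey), np.sum(Ey * Ey)]])
    b = -np.array([np.sum(Ex * Et), np.sum(Ey * Et)])
    # np.linalg.solve fails (or is unstable) exactly in the degenerate cases above,
    # e.g. constant brightness or parallel straight-line isophotes (det(M) ~ 0).
    return np.linalg.solve(M, b)

Note that det(M) here is exactly the quantity ∫∫ Ex² ∫∫ Ey² − (∫∫ Ex Ey)² identified above, so checking the determinant (or the smaller eigenvalue of M) before solving is a natural way to detect the degenerate cases.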
2 Lecture 3: Time to Contact, Focus of Expansion, Direct Motion Vision Methods, Noise Gain

2.1 Noise Gain

Noise Gain: Intuition - if I change a value in the image by a small amount, how much does the result change?
Dilution of Precision:
• How far off a GPS estimate is w.r.t. your true location.
• Important to note that your dilution of precision can vary in different directions - e.g. horizontal precision is oftentimes greater than vertical precision.
When it is possible to express the inverse of a function in closed form or via a matrix/coefficient, we can simply solve the inverse problem using x = f⁻¹(y).
More importantly, to build a robust machine vision system to solve this inverse problem, it is critical that small perturbations in y = f(x) do not lead to large changes in x. Small perturbations need to be taken into account in machine vision problems because the sensors we use exhibit measurement noise. The concept of noise gain can come in to help deal with this uncertainty.
Consider a perturbation δy that leads to a perturbation δx when we solve the inverse problem. In the limit as δy → 0, we arrive at the definition of noise gain:

noise gain = δx/δy = 1/f′(x) = 1/(df(x)/dx)    (1)
Like other concepts/techniques we've studied so far, let's understand when this system fails. Below are two cases; we encourage you to consider why they fail from both a mathematical and an intuitive perspective (hint: for the mathematical component, look at the formula above; for the intuitive component, think about the effect on x of a small change in y when the curve is nearly flat):
• f′(x) = 0 (flat curve)
• f′(x) ≈ 0 (nearly flat curve)
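A tiny numerical check of equation (1), using an arbitrary forward model f(x) = x³ (my own illustrative choice):

import numpy as np

# Forward model and its derivative (arbitrary illustrative choice).
f = lambda x: x ** 3
f_prime = lambda x: 3 * x ** 2

x = 2.0
y = f(x)
dy = 1e-6                                  # small perturbation of the measurement
x_perturbed = (y + dy) ** (1.0 / 3.0)      # re-solve the inverse problem

numeric_gain = (x_perturbed - x) / dy
print(numeric_gain, 1.0 / f_prime(x))      # both ~ 1/12; the gain blows up as f'(x) -> 0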
But how good is this answer/approach? If b changes, how much does x change? We quantify noise gain in this case as follows:

"noise gain" → ‖δx‖ / ‖δb‖    (2)
*NOTE: This multidimensional problem is more nuanced because we may be in the presence of an anisotropic (spatially
non-uniform) noise gain - e.g. there could be little noise gain in the x1 direction, but a lot of noise gain in the x2 direction.
As in the previous case, let's analyze when this approach fails. To do this, let's consider M⁻¹ to help build intuition for why this fails:

M⁻¹ = (1/det(M)) [ · · · ]    (3)
Let's ignore the specific entries of M⁻¹ for now, and focus on the fact that we need to compute the determinant of M. When is this determinant zero? This will be the case whenever there exists linear dependence in the columns of M. As we saw before, two cases that can lead to poor performance are:
• det|M | = 0: This corresponds to a non-invertible matrix, and also causes the noise term to blow up.
• det|M | ≈ 0: Though this matrix may be invertible, it may cause numerical instability in the machine vision system, and
can also cause the noise term to blow up.
Let's also revisit, just as a refresher, the inverse of a 2 × 2 matrix:

A = [a, b; c, d]    (4)

A⁻¹ = (1/det(A)) [d, −b; −c, a] = (1/(ad − bc)) [d, −b; −c, a]    (6)

Now let's verify that this is indeed the inverse:

A⁻¹A = (1/(ad − bc)) [d, −b; −c, a][a, b; c, d] = (1/(ad − bc)) [da − bc, db − bd; −ca + ac, −cb + ad] = [1, 0; 0, 1] = I₂    (7)
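To see why det|M| ≈ 0 is dangerous in practice, here is a small numerical sketch (the matrix and perturbation values are arbitrary illustrative choices); a tiny change in b produces a large change in the recovered x:

import numpy as np

M = np.array([[1.0, 1.0],
              [1.0, 1.0001]])              # nearly parallel rows: det(M) ~ 1e-4
b = np.array([1.0, 1.0])

x = np.linalg.solve(M, b)
x_perturbed = np.linalg.solve(M, b + np.array([0.0, 1e-4]))

# b moved by 1e-4, but the solution moves by about sqrt(2):
print(np.linalg.det(M), np.linalg.norm(x_perturbed - x))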
Intuition Behind This: As the object/camera moves, the physical properties of the camera do not change and therefore
the total derivative of the brightness w.r.t. time is 0. From the chain rule, we can rewrite this total derivative assumption:
dE/dt = 0 =⇒ u Ex + v Ey + Et = 0    (12)
*(Recall this is for when x and y are parameterized w.r.t. time, i.e. x = x(t), y = y(t).)
The above constraint is known as the Brightness Change Constraint Equation (BCCE).
2.3.3 Optical Mouse Problem
Recall our motion estimation problem with the optical mouse, in which our objective is no longer to find the point where the
BCCE is strictly zero (since images are frequently corrupted by noise through sensing), but to minimize the LHS of the BCCE,
i.e:
min_{u,v} { J(u, v) ≜ ∫∫ (u Ex + v Ey + Et)² dx dy }    (13)
We solve the above using unconstrained optimization and by taking a “least-squares” approach (hence why we square the LHS
of the BCCE). Solve by setting the derivatives of the two optimizing variables to zero:
dJ(u, v)/du = 0,    dJ(u, v)/dv = 0    (14)
What if these quantities are changing in the world w.r.t. time? Take time derivatives:
(1/f)(dx/dt) = (1/Z)(dX/dt) − (X/Z²)(dZ/dt)    (16)

Writing these for x and y:
• x: (1/f) u = (1/Z) U − (W/Z)(X/Z)
• y: (1/f) v = (1/Z) V − (W/Z)(Y/Z)
The denominator in the derivation of C is the "radial gradient": (x, y) · (Ex, Ey) = x Ex + y Ey.
Building Intuition: If we conceptualize 2D images as topographic maps, where brightness is the third dimension of the surface
(and the spatial dimensions x and y comprise the other two dimensions), then the brightness gradient is the direction of steepest
ascent up the brightness surface.
Another note: (x, y) is a radial vector, say in a polar coordinate system, which is why the above dot product term is called the "radial gradient". This gradient is typically written with the radial vector normalized by its L2/Euclidean norm, to illustrate the multiplication of the brightness gradient with a radial unit vector:

g = √(x² + y²) (x/√(x² + y²), y/√(x² + y²)) · (Ex, Ey)    (24)
This g quantity can be thought of as: “How much brightness variation is in an outward direction from the center of the image?”
For a more robust estimate, let us again employ the philosophy that estimating from more data points is better. We’ll again
take a least-squares approach, and minimize across the entire image using the parameterized velocities we had before. In this
case, since we are solving for inverse Time to Contact, we will minimize the error term over this quantity:
min_C { J(C) ≜ ∫∫ (C(x Ex + y Ey) + Et)² dx dy }    (25)
Without the presence of measurement noise, the optimal value of C gives us an error of zero, i.e. perfect adherence to the
BCCE. However, as we’ve seen with other cases, this is not the case in practice due to noise corruption. We again will use
unconstrained optimization to solve this problem.
Taking the derivative of the objective J(C) and setting it to zero, we obtain:
dJ(C)/dC = 0 =⇒ 2 ∫∫ (C(x Ex + y Ey) + Et)(x Ex + y Ey) dx dy = 0    (26)

This in turn gives us:

C ∫∫ (x Ex + y Ey)² dx dy + ∫∫ (x Ex + y Ey) Et dx dy = 0    (27)

1/TTC = C = − (∫∫ (x Ex + y Ey) Et dx dy) / (∫∫ (x Ex + y Ey)² dx dy)    (28)
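A minimal sketch of this direct 1/TTC estimate from equation (28); it assumes the brightness derivatives are given as numpy arrays and that image coordinates (x, y) are measured from the image center (an assumption about the coordinate convention):

import numpy as np

def inverse_ttc(Ex, Ey, Et):
    """C = W/Z = 1/TTC from equation (28), with (x, y) measured from the image center."""
    h, w = Ex.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    x -= (w - 1) / 2.0                       # assumed principal point: image center
    y -= (h - 1) / 2.0
    G = x * Ex + y * Ey                      # radial gradient
    return -np.sum(G * Et) / np.sum(G * G)   # C; TTC = 1/C in units of the frame interval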
• x-component:
u/f = U/Z − (x/f)(W/Z)    (29)
u = f U/Z − x(W/Z)    (30)
u = A − xC    (31)
Where: A ≜ f U/Z, C ≜ W/Z    (32)
• y-component:
v/f = V/Z − (y/f)(W/Z)    (33)
v = f V/Z − y(W/Z)    (34)
v = B − yC    (35)
Where: B ≜ f V/Z, C ≜ W/Z    (36)
Note that for the world quantities A and B, we also have the following identities (note that the Focus of Expansion (FOE) is given by the point (x₀, y₀)):
• A = f U/Z = C x₀
• B = f V/Z = C y₀
Building Intuition: “As I approach the wall, it will loom outward and increase in size.”
We can again use least-squares to minimize the following objective enforcing the BCCE. This time, our optimization aim is to
minimize the objective function J(A, B, C) using the quantities A, B, and C:
min_{A,B,C} { J(A, B, C) ≜ ∫∫ (A Ex + B Ey + C(x Ex + y Ey) + Et)² dx dy }    (38)
Use unconstrained optimization with calculus and partial derivatives to solve. Since we have three variables to optimize over, we have three first-order conditions (FOCs):
• dJ(A, B, C)/dA = 0
• dJ(A, B, C)/dB = 0
• dJ(A, B, C)/dC = 0
Using the chain rule for each of these FOCs, we can derive and rewrite each of these conditions to obtain 3 equations and 3 unknowns. Note that G ≜ x Ex + y Ey.
• A variable:
2 ∫∫ (A Ex + B Ey + C(x Ex + y Ey) + Et) Ex dx dy = 0    (39)
A ∫∫ Ex² + B ∫∫ Ex Ey + C ∫∫ G Ex = − ∫∫ Ex Et    (40)
• B variable:
2 ∫∫ (A Ex + B Ey + C(x Ex + y Ey) + Et) Ey dx dy = 0    (41)
A ∫∫ Ey Ex + B ∫∫ Ey² + C ∫∫ G Ey = − ∫∫ Ey Et    (42)
• C variable:
2 ∫∫ (A Ex + B Ey + C(x Ex + y Ey) + Et) G dx dy = 0    (43)
A ∫∫ G Ex + B ∫∫ G Ey + C ∫∫ G² = − ∫∫ G Et    (44)
As in the time-to-contact problem above, this can again be implemented using accumulators.
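A sketch of that accumulator-based implementation for the three-parameter case (equations (40), (42), (44)); as in the earlier sketches, Ex, Ey, Et are assumed given, and x, y are image coordinates measured from the principal point:

import numpy as np

def foe_and_ttc(Ex, Ey, Et, x, y):
    """Accumulate and solve the 3x3 system (equations (40), (42), (44)) for A, B, C."""
    G = x * Ex + y * Ey
    M = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey), np.sum(G * Ex)],
                  [np.sum(Ex * Ey), np.sum(Ey * Ey), np.sum(G * Ey)],
                  [np.sum(G * Ex),  np.sum(G * Ey),  np.sum(G * G)]])
    rhs = -np.array([np.sum(Ex * Et), np.sum(Ey * Et), np.sum(G * Et)])
    A, B, C = np.linalg.solve(M, rhs)
    return (A / C, B / C), 1.0 / C           # FOE (x0, y0) and TTC (frame intervals)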
Let’s end on a fun fact: Did you know that optical mice have frame rates of 1800 fps?
3 Lecture 4: Fixed Optical Flow, Optical Mouse, Constant Brightness Assumption, Closed Form Solution

3.1 Review

• Image formation
– Where in the image? Recall perspective projection:
x/f = X/Z,    y/f = Y/Z

Differentiating this expression gives:

u/f = U/Z − (X/Z)(W/Z),    v/f = V/Z − (Y/Z)(W/Z)
From these, we can find the Focus of Expansion (FOE), or, more intuitively: “The point in the image toward
which you are moving.”
How long until we reach this point? This is given by Time to Contact (TTC):
Time to Contact = Z/W = 1/C
– How bright in the image? For this, let us consider an image solid, where the brightness function is parameterized by
x, y, and t: E(x, y, t).
3.1.1 Constant Brightness Assumption Review with Generalized Isophotes
Recall the constant brightness assumption, which says that the total derivative of brightness with respect to time is zero: dE(x, y, t)/dt = 0. By the chain rule we obtain the BCCE:

(dx/dt)(∂E/∂x) + (dy/dt)(∂E/∂y) + ∂E/∂t = 0

Recall our variables: u ≜ dx/dt, v ≜ dy/dt. Then the BCCE, rewritten in the standard notation we've been using:

u Ex + v Ey + Et = 0
Recall our method of using least-squares regression to solve for optimal values of u, v that minimize the total computed sum
of the LHS of the BCCE over the entire image (note that integrals become discrete in the presence of discretized pixels, and
derivatives become differences):
u*, v* = arg min_{u,v} ∫∫ (u Ex + v Ey + Et)² dx dy
(New) Now, to introduce a new variation on this problem, let us suppose we have the following spatial parameterization of brightness (you'll see that this brightness function creates linear isophotes): E(x, y) = f(ax + by) for some scalar function f.

If f is differentiable over the domain, then the spatial derivatives Ex and Ey can be computed as follows, using the chain rule:
• Ex = f′(ax + by) · a
• Ey = f′(ax + by) · b
Where f′ is the derivative of this scalar-valued function (i.e., defining the input to be z = ax + by, the derivative f′ is equivalent to df(z)/dz).
Isophote Example: If E(x, y) = ax + by + c, for a, b, c ∈ R+ , then the isophotes of this brightness function will be lin-
ear.
• TTC = w 1 dZ
Z = Z dt =
d
dt loge (z) , therefore we can simply take the slope of the line corresponding to the logarithm of Z to
compute TTC.
Now, let’s suppose that objects are moving both in the world and in the image. Let’s denote s as our image coordinate and S
as our world coordinate. Then:
s S
=
f Z
Then we can write:
sZ + sf = 0
19
Differentiating:
ds dZ
ds dZ
Z +s = 0 =⇒ dt = dt
dt dt S Z
The above relationship between derivative ratios can be interpreted as: “The change in the image’s size is the same as the change
in distance.”
Another motivating question for developing TTC methods: What if the surface is non-planar? This is a common scenario for
real-world TTC systems. In this case, we have two options:
• Parameterize the geometric models of these equations with polynomials, rather than planes.
• Leave the planar solution, and look for other ways to account for errors between the modeled and true surfaces.
In practice, the second option here actually works better. The first option allows for higher modelling precision, but is less robust
to local optima, and can increase the sensitivity of the parameters we find through least-squares optimization.
If you want to draw an analog to machine learning/statistics, we can think of modeling surfaces with more parameters (e.g.
polynomials rather than planes) as creating a model that will overfit or not generalize well to the data it learns on, and create
a problem with too many unknowns and not enough equations.
Additionally, multiscale is computationally-efficient: Using the infinite geometric series, we can see that downsampling/down-
scaling by a factor of 2 each time and storing all of these smaller image representations requires only 33% more stored data than
the full size image itself:
∞
X 1 1 4 1
(( )2 )n = 1 = =1+
n=0
2 1 − 4
3 3
1
More generally, for any downsampling factor r ∈ N, we only add r 2 −1 × 100% amount of additional data:
∞
X 1 1 r2 (r2 − 1) + 1 1
( 2 )n = 1 = 2 = 2
=1+ 2
n=0
r 1 − r2 r −1 r −1 r −1
(Note that we have r2 rather than r in the denominator because we are downsampling across both the x and y dimensions.)
20
3.3.1 Aliasing and Nyquist’s Sampling Theorem
Though multiscale is great, we also have to be mindful of aliasing. Recall from 6.003 (or another Signals and Systems course)
that aliasing causes overlap and distortion between signals in the frequency domain, and it is required that we sample at a spatial
frequency that is high enough to not produce aliasing artifacts.
Nyquist’s Sampling Theorem states that we must sample at twice the frequency of the highest-varying component of our image
to avoid aliasing and consequently reducing spatial artifacts.
Z dZ 1
dZ
= T =⇒ = Z
dt
dt T
Since the derivative of Z is proportional to Z, the solution to this ODE will be an exponential function in time:
−t
Z(t) = Z0 e T
This method requires that deceleration is not uniform, which is not the most energy efficient approach for solving this problem.
As you can imagine, energy conservation is very important in space missions, so let’s next consider a constant deceleration
2 ∆
approach. Note that under constant deceleration, we have ddt2z = a = 0. Then we can express the first derivative of Z w.r.t. t as:
dZ
= at + v0
dt
Where v0 is an initial velocity determined by the boundary/initial conditions. Here we have the following boundary condition:
dZ
= a(t − t0 )
dt
This boundary condition gives rise to the following solution:
1 2 1
Z= at − at0 t + c = a(t − t0 )2
2 2
Therefore, the TTC for this example becomes:
1
Z 2 a(t− t0 )2 1
T = dZ
= = (t − t0 )
dt
a(t − t0 ) 2
21
3.4 Optical Flow
Motivating question: What if the motion of an image is non-constant, or it doesn’t move together? We have the Brightness
Change Constraint Equation (BCCE), but this only introduces one constraint to solve for two variables, and thus creates
an under-constrained/ill-posed problem.
How can we impose additional constraints? To do this, let us first understand how motion relates across pixels, and
information that they share. Pixels don’t necessarily move exactly together, but they move together in similar patterns, partic-
ularly if pixels are close to one another. We’ll revisit this point in later lectures.
What Else Can We Do? One solution is to divide the images into equal-sized patches and apply the Fixed Flow Paradigm,
as we’ve done with entire images before. When selecting patch size, one trade-off to be mindful of is that the smaller the patch,
the more uniform the brightness patterns will be across the patch, and patches may be too uniform to detect motion (note: this
is equivalent to the matrix determinants we’ve been looking at evaluating to zero/near-zero).
To build intuition, let’s consider what happens when we travel far along the lines (i.e. as s gets very large) in our parametric
definition of lines:
x 1 αs α
• limx→∞ f = limx→∞ Z0 +γs (x0 + αs) = γs = γ (x-coordinate)
y 1 βs β
• limy→∞ f = limy→∞ Z0 +γs (x0 + βs) = γs = γ (x-coordinate)
The 2D point ( αγ , βγ ) is the vanishing point in the image plane. As we move along the line in the world, we approach this point
in the image, but we will never reach it. More generally, we claim that parallel lines in the world have the same vanishing
point in the image.
22
3.6 Calibration Objects
Let’s discuss two calibration objects: spheres and cubes:
3.6.1 Spheres
:
• If image projection is directly overhead/straight-on, the projection from the world sphere to the image plane is a circle. If
it is not overhead/straight on, it is elliptic.
3.6.2 Cube
:
• Harder to manufacture, but generally a better calibration object than a sphere.
• Cubes can be used for detecting edges, which in turn can be used to find vanishing points (since edges are lines in the
world).
• Cubes have three sets of four parallel lines/edges each, and each of these sets of lines are orthogonal to the others. This
implies that we will have three vanishing points - one for each set of parallel lines.
• For each of these sets of lines, we can pick a line that goes through the Center of Projection (COP), denoted p ∈ R3 (in
the world plane). We can then project the COP onto the image plane (and therefore now p ∈ R2 ).
• Let us denote the vanishing points of the cube in the image plane as a, b, c ∈ R2 . Then, because of orthogonality between
the different sets of lines, we have the following relations between our three vanishing points and p:
– (p − a) · (p − b) = 0
– (p − b) · (p − c) = 0
– (p − c) · (p − a) = 0
In other words, the difference vectors between p and the vanishing points are all at right angles to each other.
To find p, we have three equations and three unknowns. We have terms that are quadratic in p. Using Bézout’s
Theorem (The maximum number of solutions is the product of the polynomial order of each equation in the system of
equations), we have (2)3 = 8 possible solutions for our system of equations. More generally:
E
Y
number of solutions = oe
e=1
Whhere E is the number of equations and oe is the polynomial order of the eth equation in the system.
This is too many equations to work with, but we can subtract these equations from one another and create a system
of 3 linearly dependent equations. Or, even better, we can leave one equation in its quadratic form, and 2 in their linear
form, and this maintains linear independence of this system of equations:
– (a − p) · (c − b) = 0
– (b − p) · (a − c) = 0
– (p − c) · (p − a) = 0
23
3.7.1 Generalization: Fixed Flow
The motivating example for this generalization is a rotating optimal mouse. We’ll see that instead of just solving for our two
velocity parameters u and v, we’ll also need to solve for our rotational velocity, ω.
• v = v0 + wx
Note that we can also write the radial vector of x and y, as well as the angle in this 2D plane to show how this connects to
rotation:
p
r = (x, y) = x2 + y 2
θ = arctan 2(y, x)
dθ
ω=
dt
With this rotation variable, we leverage the same least-squares approach as before over the entirety of the image, but now we
also optimize over the variable for ω:
ZZ
∆
u∗0 , v0∗ , ω ∗ = arg min {J(u0 , v0 , ω) = (u0 Ex + v0 Ey + wH + Et )2 dxdy}
u0 ,v0 ,ω
Like the other least-squares optimization problems we’ve encountered before, this problem can be solved by solving a system of
first-order conditions (FOCs):
dJ(u0 ,v0 ,ω)
• du0 =0
dJ(u0 ,v0 ,ω)
• dv0 =0
dJ(u0 ,v0 ,ω)
• dω =0
Recall the following derivations for the image coordinate velocities u and v, which help us relate image motion in 2D to world
motion in 3D:
u U X W
• f = Z − Z Z
v V Y W
• f = Z − Z Z
Some additional terms that are helpful when discussing these topics:
• Motion Field: Projection of 3D motion onto the 2D image plane.
• Optical Flow:
– What we can sense
– Describes motion in the image
We can transform this into image coordinates:
1 1
u= (f U − xw), v = (f V − yw)
2 2
24
Let’s take U = V = 0, u = −X w W
Z , v = −Y Z . Z (world coordinates) is not constant, so we can rewrite this quantity by
substituting the image coordinates in for our expression for Z:
Z = Z0 + px + qy
X Y
= Z0 + p Z + q Z
f f
Now, we can isolate Z and solve for its closed form:
X Y
Z(1 − p − q ) = Z0
f f
Z0
Z=
1 − pX Y
f −qf
From this we can conclude that Z1 is linear in x and y (the image coordinates, not the world coordinates). This is helpful for
methods that operate on finding solutions to linear systems. If we now apply this to the BCCE given by uEx + vEy + Et = 0,
we can first express each of the velocities in terms of this derived expression for Z:
1
• u= Z0 (1 − pX Y
f − q f )(−xω)
1
• v= Z0 (1 − pX Y
f − q f )(−yω)
∆
• P = −p Zw0
∆
• Q = −q Zw0
Using these definitions, the BCCE with this paramterization becomes:
0 = (R + P x + Qy)G − Et
Now, we can again take our standard approach of solving these kinds of problems by applying least-squares to estimate the free
variables P, Q, R over the entire continuous or discrete image space. Like other cases, this fails when the determinant of the
system involving these equations is zero.
4 Lecture 5: TTC and FOR Montivision Demos, Vanishing Point, Use of VPs
in Camera Calibration
In this lecture, we’ll continue going over vanishing points in machine vision, as well as introduce how we can use brightness
estimates to obtain estimates of a surface’s orientation. We’ll introduce this idea with Lambertian surfaces, but we can discuss
how this can generalize to many other types of surfaces as well.
25
4.1 Robust Estimation and Sampling
We’ll start by covering some of the topics discussed during lecture.
A good choice of algorithm for dealing with outliers is RANSAC, or Random Sample Consensus [1]. This algorithm
is essentially an iterative and stochastic variation of Least-Squares. By randomly selecting points from an existing dataset to fit
lines and evaluate the fit, we can iteratively find line fits that minimize the least-squares error while distinguishing inliers from
outliers.
Rr = r0
r = RT r
Where r is the vector from the Center of Projection (COP) to the image plane.
4.1.4 Resampling
Resampling is also a valuable application in many facets of computer vision and robotics, especially if we seek to run any kind
of interpolation or subsampling algorithms. Some approaches for this:
• Nearest Neighbor: This is a class of methods in which we interpolate based off of the values of neighboring points. This
can be done spatially (e.g. by looking at adjacent pixels) as well as other image properties such as brightness and color. A
common algorithm used here is K-Nearest Neighbors (KNN), in which interpolation is done based off of the K-nearest
points in the desired space.
• Bilinear Interpolation: An extension of linear interpolation used for functions in two-dimensional grids/of two variables
(e.g. (x, y) or (i, j))) [2], such as the brightness or motion in images.
26
• Bicubic Interpolation: Similar to bilinear interpolation, bicubic interpolation is an extension of cubic interpolation
of functions in two-dimensional grids/of two variables (e.g. (x, y) or (i, j))) [3], such as the brightness or motion in
images. Bicubic interpolation tends to perform much better than bilinear interpolation, though at the cost of additional
computational resources.
• Recall from the previous lectures that the percent change of size in the world is the percent change of size in the image.
We can derive this through perspective projection. The equation for this is:
s S
=
f Z
Where s is the size in the image plane and S is the size in the world. Differentiating with respect to time gives us (using
the chain rule):
d ds dZ
(sZ = f S) → Z +s =0
dt dt dt
Rearranging terms:
dz ds
dt
f = − dt
Z S
Recall that the intuition here is that the rate of change of size is the same in the image and the world.
• Principle Point: The orthogonal projection of the Center of Projection (COP) onto the image plane.
• f : Similar to the focal length we’ve seen in other perspective projection examples, this f is the perpendicular distance from
the COP to the image plane.
Recall that a common problem we can solve with the use of vanishing points is finding the Center of Projection (COP).
Solving this problem in 3D has 3 degrees of freedom, so consequently we’ll try to solve it using three equations.
Intuitively, this problem of finding the Center of Projection can be thought of as finding the intersection of three spheres,
each of which have two vanishing points along their diameters. Note that three spheres can intersect in up to two places - in this
case we have defined the physically-feasible solution, and therefore the solution of interest, to be the solution above the image
plane (the other solution will be a mirror image of this solution and can be found below the image plane).
Application of this problem: This problem comes up frequently in photogrammetry, in that simply having two locations as
your vanishing points isn’t enough to uniquely identify your location on a sphere.
27
4.2.2 Lines in 2D and 3D
Next, let’s briefly review how we can parameterize lines in 2D and 3D:
• 2D: (2 Degrees of Freedom)
– y = mx + c
– ax + by + c = 0
– sin θx − cos θy + ρ = 0
∆
– If n̂ = (− sin θ, cos θ)T , then n̂ · r = ρ.
Like the other problems we’ve looked at, this problem can be solved by finding the intersection of 3 spheres. Let’s begin
with:
Next, let’s square both sides of this equation and rewrite the left-hand side with dot products:
Recall from Bezout’s Theorem that this means that are 8 possible solutions here, since we have three equations of second-order
polynomials. To get rid of the 2nd order terms, we simply subtract the equations:
r · r − 2r · ri + ri · ri = ρ2i
− r · r − 2r · rj + rj · rj = ρ2j ∀ i, j ∈ {1, 2, 3}, i 6= j
∆
(Where the scalar Rj2 = rj · rj .)
Putting these equations together, this is equivalent to finding the intersection of three different spheres:
(r2 − r1 )T
2
(ρ2 − ρ21 ) − (R22 − R12 )
1
r3 − r2 )T r = (ρ23 − ρ22 ) − (R32 − R22 )
2
r1 − r3 )T (ρ21 − ρ23 ) − (R12 − R32 )
However, even though we’ve eliminated the second-order terms from these three equations, we still have two solutions. Recall
from linear algebra equations don’t have a unique solution when there is redundancy or linear dependence between the equations.
If we add up the rows on the right-hand side of the previous equation, we get 0, which indicates that the matrix on the left-hand
side is singular:
(r2 − r1 )T
∆
A = r3 − r2 )T ∈ R3×3
r1 − r3 )T
28
To solve this linear dependence problem, we again use Bezout’s Theorem and keep one of the second-order equations:
(r − r1 ) · (r − r2 ) = 0
(r − r2 ) · (r − r3 ) = 0
(r − r2 ) · (r − r2 ) = 0 → (r − r2 ) ⊥ (r3 − r1 )
I.e. the plane passes through r2 - this intersecting point is the solution and is known as the orthocenter or the principal
point. Now, all we need is to find the quantity f to find the Center of Projection.
Next, note the following relations between the vanishing points in the inverse plane and ẑ, which lies perpendicular to the
image plane:
r1 · ẑ = 0
r2 · ẑ = 0
r3 · ẑ = 0
What else is this useful for? Here are some other applications:
• Camera calibration (this was the example above)
• Detecting image cropping - e.g. if you have been cropped out of an image
• Photogrammetry - e.g. by verifying if the explorer who claimed he was the first to make it “all the way” to the North Pole
actually did (fun fact: he didn’t).
Next, now that we’ve determined the principal point, what can we say about f (the “focal length”)?
For this, let us consider the 3D simplex, which is triangular surface in R3 given by the unit vectors:
∆ T
e1 = 1 0 0
∆ T
e2 = 0 1 0
∆ T
e3 = 0 0 1
√
Using this 3D simplex, let us suppose that each side of the triangles formed
√ by this√ simplex take length v = 2, which is
consistent with the l2 norm of the triangles spanned by the unit simplex ( 12 + 12 = 2).
Next, we solve for f by finding the value of a such that the dot product between the unit vector perpendicular to the sim-
plex and a vector of all a is equal to 1:
T h 1 iT
√1 √1
a a a √ =1
3 3 3
3a
√ =1
3
√
3 1
a= =√
3 3
This value of a = √13 = f . Then we can relate the lengths of the sides v (which correspond to the magnitudes of the vectors
between the principal point and the vanishing points (||r − ri ||2 )) and f :
√ 1 v
v= 2, f = √ =⇒ f = √
3 6
With this, we’ve now computed both the principal point and the “focal length” f for camera calibration.
29
(Note that the c superscript refers to the camera coordinate system.) If the location of the Center of Projection (COP) is given
above the image plane as the vector pc and the vanishing points are given as vectors rc1 , rc2 , and rc3 (note that these vanishing
points must be in the same frame of reference in order for this computation to carry out), then we can derive expressions for the
unit vectors through the following relations:
pc − rc1
(pc − r1 )// x̂c =⇒ x̂c =
||pc − rc1 ||2
pc − rc2
(pc − r2 )// ŷc =⇒ ŷc =
||pc − rc2 ||2
pc − rc3
(pc − r3 )//ẑc =⇒ ẑc =
||pc − rc3 ||2
Then, after deriving the relative transformation between the world frame (typically denoted w in robotics) and the camera frame
(typically denoted c in robotics), we can express the principal point/Center of Projection in the camera coordinate frame:
Where (α, β, γ) are the coordinates in the object coordinate system (since x̂c , ŷc , and ẑc comprise the orthogonal basis of this
coordinate system). Then we can express the relative coordinates of objects in this coordinate system:
T
r0 = α
β γ
4.3 Brightness
We’ll now switch to the topic of brightness, and how we can use it for surface estimation. Recall that for a Lambertian surface
(which we’ll assume we use here for now), the power received by photoreceptors (such as a human eye or an array of photodiodes)
depends both on the power emitted by the source, but also the angle between the light source and the object.
This is relevant for foreshortening (the visual effect or optical illusion that causes an object or distance to appear shorter
than it actually is because it is angled toward the viewer [6]): the perceived area of an object is the true area times the cosine
of the angle between the light source and the object/viewer. Definition: Lambertian Object: An object that appears equally
bright from all viewing directions and reflects all incident light, absorbing none [5].
Let’s look at a simple case: a Lambertian surface. If we have the brightness observed and we have this modeled as:
E = n̂ · s,
Where s is the vector between the light source and the object, then can we use this information to recover the surface orientation
given by n̂. This unit vector surface orientation has degrees of freedom, since it is a vector in the plane.
30
It is hard to estimate this when just getting a single measurement for brightness. But what if we test different lighting conditions?:
E1 = n̂ · s1
E2 = n̂ · s2
n̂ · n̂ = ||n̂||2 = 1
This is equivalent, intuitively, to finding the intersection between two cones for which we have different angles, which forms a
planar curve. We then intersect this planar curve with the unit sphere corresponding to the constraint ||n̂||2 = 1. By Bezout’s
Theorem, this will produce two solutions.
One problem with this approach, however, is that these equations are not linear. We can use the presence of reflectance
to help us solve this problem. Let us define the following:
Definition: Albedo: This is the ratio of power out of an object divided by power into an object:
∆ ∆ power in
“albedo” = ρ = ∈ [0, 1]
power out
Though the definition varies across different domains, in this class, we define albedo to be for a specific orientation, i.e. a specific
si .
Fun fact: Although an albedo greater than 1 technically violates the 2nd Law of Thermodynamics, superluminous sur-
faces such as those sprayed with flourescent paint can exhibit ρ > 1.
Using this albedo term, we can now solve our problem of having nonlinearity in our equations. Note that below we use
three different measurements this time, rather than just two:
E1 = ρn̂ · s1
E2 = ρn̂ · s2
E3 = ρn̂ · s3
This creates a system of 3 unknowns and 3 Degrees of Freedom. We also add the following constraints:
n = ρn̂
n
n̂ =
||n||2
Combining these equations and constraints, we can rewrite the above in matrix/vector form:
−ŝT1 −ŝT1
E1 E1
T −1 ∆
−ŝ2 n = E2 =⇒ n = S E2 (Where S = −ŝT2 .)
T T
−ŝ3 E3 E3 −ŝ3
A quick note on the solution above: like other linear algebra-based approaches we’ve investigated so far, the matrix S above
isn’t necessarily invertible. This matrix is not invertible when light sources are in a coplanar orientation relative to the object.
If this is the case, then the matrix S becomes singular/linearly dependent, and therefore non-invertible.
31
• The object may not be uniformly-colored (which, practically, is quite often the case).
However, despite the drawbacks, this approach enables us to recover the surface orientation of an object from a single RGB
monocular camera.
A final note: we originally assumed this object was Lambertian, and because of this used the relation that an object’s per-
ceived area is its true area scaled by the cosine of the angle between the viewer/light source and the object, Does this apply for
real surfaces? No, because many are not ideal Lambertian surfaces. However, in practice, we can just use a lookup table that
can be calibrated using real images.
4.4 References
1. RANSAC Algorithm: http://www.cse.yorku.ca/ kosta/CompVis Notes/ransac.pdf
For the sake of brevity, the notes from the linear algebra review will not be given here, but they are in the handwritten
notes for this lecture, as well as in the linear algebra handout posted above. These notes cover:
• Definitions of eigenvalues/eigenvectors
• Characteristic Polynomials
• Rewriting vectors using an eigenbasis
32
Recall that for such a system, suppose we have three brightness measurements of the form:
1. E1 = ρn̂ · ŝ1 - Intuitively, look at pixel with light source located at s1 and take measurement E1 .
2. E2 = ρn̂ · ŝ2 - Intuitively, look at pixel with light source located at s2 and take measurement E2 .
3. E3 = ρn̂ · ŝ3 - Intuitively, look at pixel with light source located at s3 and take measurement E3 .
Let’s review the linear algebraic system of equations by combining the equations above into a matrix-vector product.
−sT1
E1
T
−s2 n = E2
−sT3 E3
• n = ρn̂, i.e. we do not need to deal with the second order constraint n̂T n̂ = 1. This eliminates the second-order constraint
from our set of equations and ensures we are able to derive a unique solutions by solving a system of only linear equations.
−sT1
∆
• Define S = −sT2
T
−s3
Like many other linear system of equations we encounter of the form Ax = b, we typically want to solve for x. In this case, we
want to solve for n, which provides information about the surface orientation of our object of interest:
Sn = E −→ n = S−1 E
Like the other “inverse” problems we have solved so far in this class, we need to determine when this problem is ill-posed, i.e.
when S is not invertible. Recall from linear algebra that this occurs when the columns of S are not linearly independent/the
rank of S is not full.
An example of when this problem is ill-posed is when the light source vectors are coplanar with one another, i.e.
s1 + s2 + s3 = 0 (Note that this can be verified by simply computing the vector sum of any three coplanar vectors.)
Therefore, for optimal performance and to guarantee that S is invertible:
• We make the vectors from the light sources to the objects (si ) as non-coplanar as possible.
• The best configuration for this problem is to have the vectors from the light sources to the surface be orthogonal to one
another. Intuitively, consider that this configuration captures the most variation in the angle between a given surface
normal and three light sources.
Example: Determining Depth of Craters on the Moon:
As it turns out, we cannot simply image the moon directly, since the plane between the earth/moon and the sun makes it
such that all our vectors si will lie in the same plane/are coplanar. Thus, we cannot observe the moon’s tomography from the
surface of the earth.
“Lambert’s Law”:
Ei ∝ cos θi = n̂ · ŝi
To better understand this law, let us talk more about surface orientation.
33
5.2.1 Surface Orientation
• For an extended surface, we can take small patches of the surface that are “approximately planar”, which results in us
constraining the surface to be a 2-manifold [1] (technically, the language here is “locally planar”).
• Using these “approximately planar” patches of the surface results in many unit normal vectors n̂i pointing in a variety of
different directions for different patches in the surface.
• We can also understand surface orientation from a calculus/Taylor Series point of view. Consider z(x + δx, y + δy), i.e. an
estimate for the surface height at (x, y) perturbed slightly by a small value δ. The Taylor Series expansion of this is given
by:
∂z ∂z
z(x + δx, y + δy) = z(x, y) + δx + δy + · · · = z(x, y) + pδx + qδy
∂x ∂y
∆ ∂z ∆ ∂z
Where p = ,q =
∂x ∂y
∆ ∂z ∆ ∂z
• The surface orientation (p = ∂x , q = ∂y ) captures the gradient of the height of the surface z.
• Note that the surface unit normal n̂ is perpendicular to any two lines in the surface, e.g. tangent lines.
We can use the cross product of the two following tangent lines of this surface to compute the the unit surface normal, which
describes the orientation of the surface:
1. (δx, 0, pδx)T = δx(1, 0, p)T
2. (0, δy, qδx)T = δy(0, 1, q)T
Since we seek to find the unit surface normal, we can remove the scaling factors out front (δx and δy). Then the cross product
of these two vectors is:
x̂ ŷ ẑ −p
(1, 0, p)T × (0, 1, p)T = det 1 0 p = n = −q
0 1 q 1
We can now compute the unit surface normal by normalizing the vector n:
T
n −p −q 1
n̂ = = p
||n||2 p2 + q 2 + 1
With this surface normal computed, we can also go the opposite way to extract the individual components of the surface
orientation (p and q):
−n · x̂ −n · ŷ
p= , q=
n · ẑ n · ẑ
Note: Do the above equations look similar to something we have encountered before? Do they look at all like the coordinate-wise
perspective projection equations?
We will leverage these (p, q)-space plots a lot, using reflectance maps (which we will discuss later). Now, instead of plot-
ting in velocity space, we are plotting in surface orientation/spatial derivative space. Individual points in this plane correspond
to different surface orientations of the surface.
If we expand our light source vector ŝ, we get the following for this dot product:
T T
−p −q 1 −ps −qs 1
n̂ · ŝ = p · p = E1
p2 + q 2 + 1 p2s + qs2 + 1
34
Carrying out this dot product and squaring each side, we can derive a parametric form of these isophotes in (p, q) space (note
that the light source orientation (ps , qs ) is fixed (at least for a single measurement), and thus we can treat it as constant:
If we plot these isophotes in (p, q) space, we will see they become conic sections. Different isophotes will generate different
curves. When we have multiple light sources, we can have intersections between these isophotes, which indicate solutions.
5.3 References
[1] Manifolds, https://en.wikipedia.org/wiki/Manifold.
We can generate plots of these isophotes in surface orientation/(p, q) space. This is helpful in computer graphics, for
instance, because it allows us to find not only surface heights, but also the local spatial orientation of the surface. In
practice, we can use lookup tables for these reflectance maps to find different orientations based off of a measured brightness.
In most settings, the second problem is of stronger interest, namely because it is oftentimes much easier to measure brightness
than to directly measure surface orientation. To measure the surface orientation, we need many pieces of information about the
local geometries of the solid, at perhaps many scales, and in practice this is hard to implement. But it is straightforward to
simply find different ways to illuminate the object!
• We can use the reflectance maps to find intersections (either curves or points) between isophotes from different light source
orientations. The orientations are therefore defined by the interesections of these isophote curves.
• Intuitively, the reflectance map answers the question “How bright will a given orientation be?”
35
• We can use an automated lookup table to help us solve this inverse problem, but depending on the discretization level of
the reflectance map, this can become tedious. Instead, wwe can use a calibration object, such as a sphere, instead.
• For a given pixel, we can take pictures with different lighting conditions and repeatedly take images to infer the surface
orientation from the sets of brightness measurements.
• This problem is known as photometric stereo: Photometric stereo is a technique in computer vision for estimating the
surface normals of objects by observing that object under different lighting conditions [1].
y−y0
• a= z−z0
(Do the above equations carry a form similar to that of perspective projection?)
While this approach has many advantages, it also has some drawbacks that we need to be mindful of:
• Discretization levels often need to be made coarse, since the number of (p, q) voxel cells scales as a cubic in 3D, i.e.
f (d) ∈ Ospace (d3 ), where d is the discretization level.
• Not all voxels may be filled in - i.e. mapping (let us denote this as φ) from 3D to 2D means that some kind of surface will
be created, and therefore the (E1 , E2 , E3 ) space may be sparse.
One way we can solve this latter problem is to reincorporate albedo ρ! Recall that we leveraged albedo to transform a system
with a quadratic constraint into a system of all linear equations. Now, we have a mapping from 3D space to 3D space:
It is worth noting, however, that even with this approach, that there will still exist brightness triples (E1 , E2 , E3 ) that do not
map to any (p, q, ρ) triples. This could happen, for instance, when we have extreme brightness from measurement noise.
Alternatively, an easy way to ensure that this map φ is injective is to use the reverse map (we can denote this by ψ):
36
• Irradiance: This quantity concerns with power emitted/reflected by the object when light from a light source is incident
upon it. Note that for many objects/mediums, the radiation reflection is not isotropic - i.e. it is not emitted equally in
different directions.
δP
Irradiance is given by: E = δA (Power/unit area, W/m2 )
• Intensity: Intensity can be thought of as the power per unit angle. In this case, we make use of solid angles, which have
units of Steradians. Solid angles are a measure of the amount of the field of view from some particular point that a given
object covers [4]. We can compute a solid angle Ω using: Ω = Asurface
R2 .
δP
Intensity is given by: Intensity = δΩ .
Radiance: Oftentimes, we are only interested in the power reflected from an object that actually reaches the light
sensor. For this, we have radiance, which we contrast with irradiance defined above. We can use radiance and irradiance
to contrast the “brightness” of a scene and the “brightness” of the scene perceived in the image plane.
δ2 P
Radiance is given by: L = δAδΩ
These quantities provide motivation for our next section: lenses. Until now, we have considered only pinhole cameras, which are
infinitisimally small, but to achieve nonzero irradiance in the image plane, we need to have lenses of finite size.
6.3 Lenses
Lenses are used to achieve the same photometric properties of perspective projection as pinhole cameras, but do so using a finite
“pinhole” size and thus by encountering a finite number of photons from light sources in scenes. One caveat with lenses that we
will cover in greater detail: lenses only work for a finite focal length f . From Gauss, we have that it is impossible to make a
perfect lens, but we can come close.
– Barrel distortion
– Mustache distortion
– Pincushion distortion
• Lens Defects: These occur frequently when manufacturing lenses, and can originate from a multitude of different issues.
37
6.3.2 Putting it All Together: Image Irradiance from Object Irradiance
Here we show how we can relate object irradiance (amount of radiation emitted from an object) to image plane irradiance of
that object (the amount of radiation emitted from an object that is incident on an image plane). To understand how these two
quantities relate to each other, it is best to do so by thinking of this as a ratio of “Power in / Power out”. We can compute this
ratio by matching solid angles:
δI cos α δO cos θ δI cos α δO δO cos α z 2
= =⇒ = =⇒ =
(f sec α)2 (z sec α)2 f2 z2 δI cos θ f
The equality on the far right-hand side can be interpreted as a unit ratio of “power out” to “power in”. Next, we’ll compute the
subtended solid angle Ω:
π 2
4d cos α π d 2
Ω= = cos3 α
(z sec α)2 4 z
πd 2
δP = LδOΩ cos θ = LδO ) cos3 α cos θ
4 z
We can relate this quantity to the irradiance brightness in the image:
δP δO π d 2
E= =L cos3 α cos θ
δI δI 4 z
π d 2
= L cos4 α
4 f
A few notes about this above expression:
• The quantity on the left, E is the irradiance brightness in the image.
• The quantity L on the righthand side is the radiance brightness of the world.
• The ratio fd is the inverse of the so-called “f-stop”, which describes how open the aperture is. Since these terms are squared,
√
they typically come in multiples of 2.
• The reason why we can be sloppy about the word brightness (i.e. radiance vs. irradiance) is because the two quantities
are proportional to one another: E ∝ L.
• When does the cos4 α term matter in the above equation? This term becomes non-negligible when we go off-axis from
the optical axis - e.g. when we have a wide-angle axis. Part of the magic of DSLR lenses is to compensate for this effect.
• Radiance in the world is determined by illumination, orientation, and reflectance.
Let’s think about this ratio for a second. Please take note of the following as a mnemonic to remember which angles go where:
• The emitted angles (between the object and the camera/viewer) are parameters for radiance (L), which makes sense
because radiance is measured in the image plane.
• The incident angles (between the light source and the object) are parameters for irradiance (E), which makes sense because
irradiance is measured in the scene with the object.
38
Practically, how do we measure this?
f (θi , θe , φi , φe ) = f (θe , θi , φe , φi ) ∀ θi , θe , φi , φe
In other words, if the ray were reversed, and the incident and emitted angles were consequently switched, we would obtain the
same BRDF value. A few concluding notes/comments on this topic:
• If there is not Helmholtz reciprocity/symmetry between a ray and its inverse in a BRDF, then there must be energy
transfer. This is consistent with the 2nd Law of Thermodynamics.
• Helmholtz reciprocity has good computational implications - since you have symmetry in your (possibly) 4D table across
incident and emitted angles, you only need to collect and populate half of the entries in this table. Effectively, this is a
tensor that is symmetric across some of its axes.
6.4 References
1. Photometric Stereo, https://en.wikipedia.org/wiki/Photometric stereo
2. Voxels, https://whatis.techtarget.com/definition/voxel
3. Photometry, https://en.wikipedia.org/wiki/Photometry (optics)
39
7.1 Review of Photometric and Radiometric Concepts
Let us begin by reviewing a few key concepts from photometry and radiometry:
• Photometry: Photometry is the science of measuring visible radiation, light, in units that are weighted according to the
sensitivity of the human eye. It is a quantitative science based on a statistical model of the human visual perception of
light (eye sensitivity curve) under carefully controlled conditions [3].
• Radiometry: Radiometry is the science of measuring radiation energy in any portion of the electromagnetic spectrum.
In practice, the term is usually limited to the measurement of ultraviolet (UV), visible (VIS), and infrared (IR) radiation
using optical instruments [3].
∆
• Irradiance: E = δP 2
δA (W/m ). This corresponds to light falling on a surface. When imaging an object, irradiance is
converted to a grey level.
∆ δP
• Intensity: I = δW (W/ster). This quantity applied to a point source and is often directionally-dependent.
∆ δ2 P
• Radiance: L = δAδΩ (W/m2 × ster). This photometric quantity is a measure of how bright a surface appears in an image.
π d 2
E= L cos4 α
4 f
Where the irradiance of the image E is on the lefthand side and the radiance of the object/scene L is on the right. The
BRDF must also satisfy Helmholtz reciprocity, otherwise we would be violating the 2nd Law of Thermodynamics.
Mathematically, recall that Helmholtz reciprocity is given by:
f (θi , θe , φi , φe ) = f (θe , θi , φe , φi ) ∀ θi , θe , φi , φe
With this, we are now ready to discuss our first type of surfaces: ideal Lambertian surfaces.
f (θi , θe , φi , φe ) = f (θe , θi , φe , φi ) ∀ θi , θe , φi , φe
AND
∂f (θi , θe , φi , φe ) ∂f (θi , θe , φi , φe )
f (θi , θe , φi , φe ) = K ∈ R with respect to θe , φe , i.e. = =0
∂θe ∂φe
• If the surface is ideal, the Lambertian surface reflects all incident light. We can use this to compute f . Suppose we take
a small slice of the light reflected by a Lambertian surface, i.e. a hemisphere for all positive z. The area of this surface is
given by: sin θe δθe δφe , which is the determinant of the coordinate transformation from euclidean to spherical coordinates.
40
Then, we have that:
π
Z π Z 2
(f E cos θi sin θe dθe )dφe = E cos θi (Integrate all reflected light)
−π 0
π
Z π Z 2
E cos θi (f sin θe dθe )dφe = E cos θi (Factor out E cos θi )
−π 0
Z π Z π
2
(f sin θe dθe )dφe = 1 (Cancel out on both sides)
−π 0
Z π
2
2π f sin θe cos θe dθe = 1 (No dependence on φ)
0
Z π
2
πf 2 sin θe cos θe dθe = 1 (Rearranging terms)
0
Z π
2
πf sin(2θe )dθe = 1 (Using identity 2 sin θ cos θ = sin(2θ))
0
1 θ =π
πf [− cos(2θe )]θee =02 = 1
2
1
πf = 1 =⇒ f =
π
Suppose we have a point source radiating isotropically over the positive hemisphere in a 3-dimensional spherical coordinate
system. Then, the solid angle spanned by this hemisphere is:
Asurface 2πr2
Ω= = = 2π Steradians
r2 r2
1
If the point source is radiating isotropically in all directions, then f = 2π . But we saw above that f = π1 for an ideal Lambertian
surface. Why do we have this discrepancy? Even if brightness is the same/radiation emitted is the same in all directions, this
does not mean that the power radiated in all directions is hte same. This is due to foreshortening, since the angle between
the object’s surface normal and the camera/viewer changes.
A = A0 cos θi
Therefore, our expression for the the radiance (how bright the object appears in the image) is given by:
1 1
L= Es = E0 cos θi
π π
41
We can see that the Hapke BRDF also satisfies Helmholtz Reciprocity:
1 1
f (θi , φi , θe , φe ) = √ =√ = f (θe , φe , θi , φi )
cos θe cos θi cos θi cos θe
Using the Hapke BRDF, we can use this to compute illumination (same as radiance, which is given by:
L = E0 cos θi f (θi , φi , θe , φe )
1
= E0 cos θi √
cos θe cos θi
r
cos θi
= E0
cos θe
Where L is the radiance/illumination of the object in the image, E0 cos θi is the irradiance of the object, accounting for fore-
shortening effects, and √cos θ1 cos θ is the BRDF of our Hapke surface.
i e
Where n̂ is the surface normal vector of the object being imaged, ŝ is the unit vector describing the position of the light
source, and v̂ is the unit vector describing the position of the vertical. We can derive a relationship between n̂ and ŝ:
Next, we will use spherical coordinates to show how the isophotes of this surface will be longitudinal:
sin θ cos φ
n̂ = sin θ sin φ ∈ R3
cos θ
Since normal vectors n̂ ∈ R3 only have two degrees of freedom (since ||n̂||2 = 1), we can fully parameterize these two degrees
of freedom by the azimuth and colatitude/polar angles φ and θ, respectively. Then we can express our coordinate system
given by the orthogonal basis vectors n̂, ŝ, v̂ as:
T
n̂ = sin θ cos φ sin θ sin φ cos θ
T
ŝ = cos θs sin θs 0
T
v̂ = cos θv sin θv 0
We can use our orthogonal derivations from above to now show our longitudinal isophotes:
r
cos θi n̂ · ŝ sin θ cos φ cos φs + sin θ sin φ sin φs cos(φ − φs )
= = = = c for some c ∈ R
cos θe n̂ · v̂ sin θ cos φ cos φv + sin θ sin φ sin φv cos(φ − φv )
From the righthand side above, we can deduce that isophotes for Hapke surfaces correspond to points with the same
azimuth having the same brightness, i.e. the isophotes of the imaged object (such as the moon) are along lines of constant
longitude.
42
7.3.2 Surface Orientation and Reflectance Maps of Hapke Surfaces
Next, we will take another look at our “shape from shading” problem, but this time using the surface normals of Hapke surfaces
and relating this back to reflectance maps and our (p, q) space. From our previous derivations, we already have that the BRDF
of a Hapke surface is nothing more than the square root of the dot product of the surface normal with the light source vector
divided by the dot product of the surface normal with the vertical vector, i.e.
r r
cos θi n̂ · ŝ
=
cos θe n̂ · v̂
For one thing, let us use our linear isophotes in gradient space to solve our photometric stereo problem, in this case with
two measurements under different lighting conditions. Photometric stereo is substantially easier with Hapke surfaces than with
Lambertian, because there is no ambiguity in where the solutions lie. Unlike Lambertian surfaces, because of the linearity in
(p, q) space we are guaranteed by Bezout’s Theorem to have only one unique solution.
We can prove (see the synchronous lecture notes for this part of the course) that the transforming the points in (x, y) via
43
a rotation matrix R is equivalent to rotating the gradient space (p, q) by the same matrix R. I.e.
0
x x
R : (x, y) =⇒ (x0 , y 0 ), 0 = R
y y
0
p p
R : (p, q) =⇒ (p0 , q 0 ), 0 = R
q q
(Where SO(2) is the Special Orthogonal Group [4].) Note that R−1 = RT since R is orthogonal and symmetric.
By rotating our coordinate system from (p, q) −→ (p0 , q 0 ), we are able to uniquely specify p0 , but not q 0 , since our isophotes lie
along a multitude/many q 0 values. Note that this rotation system is specified by having p0 lying along the gradient of our isophotes.
Returning to our Lunar Surface application with Hapke surfaces, we can use this surface orientation estimation framework
to take an iterative, incremental approach to get a profile of the lunar surface. This enables us to do depth profile shaping
of the moon simply from brightness estimates! There are a few caveats to be mindful of for this methodology, however:
• We can only recover absolute depth values provided we are given initial conditions (this makes sense, since we are effectively
solving a differential equation in estimating the depth z from the surface gradient (p, q)T ). Even without initial conditions,
however, we can still recover the general shape of profiling, simply without the absolute depth.
• Additionally, we can get profiles for each cross-section of the moon/object we are imaging, but it is important to recognize
that these profiles are effectively independent of one another, i.e. we do not recover any information about the relative
surface orientation changes between/at the interfaces of different profiles we image. We can use heuristic-based approaches
to get a topographic mapping estimate. We can then combine these stitched profiles together into a 3D surface.
Motivation for these types of lenses: When we have perspective projection, the degree of magnification depends on dis-
tance. How can we remove this unwanted distance-dependent magnification due to perspective projection? We can do so by
44
effectively “moving” the Center of Projection (COP) to “−∞”. This requires using a lens with a lot of glass.
lim α=0
COP→+∞
• We can generalize this idea even more with “lenselets”: These each concentrate light into a confined area, and can be used
in an array fashion as we would with arrays of photodiodes. A useful term for these types of arrays is fill factor: The
fill factor of an image sensor array is the ratio of a pixel’s light sensitive area to its total area [5]. Lenselet arrays can be
useful because it helps us to avoid aliasing effects. We need to make sure that if we are sampling an object discretely,
that we do not have extremely high-frequency signals without some low pass filtering. This sort of low-pass filtering can be
achieved with using large pixels and averaging (a crude form of lowpass filtering). However, this averaging scheme does not
work well when light comes in at off-90 degree angles. We want light to come into the sensors at near-90 degrees (DSLRs
find a way to get a around this).
• Recall that for an object space telecentric device, we no longer have a distance-based dependence. Effectively, we are taking
perspective projection and making the focal length larger, resulting in approximately orthographic projection.
Recall that for orthographic projection, projection becomes nearly independent in position, i.e. (x, y) in image space has
a linear relationship with (X, Y ) in the world. This means that we can measure sizes of objects independently of how far
away they are!
f f
Having |∆Z| << |Z|, where∆Z is the variation in Z of an object =⇒ x = X, y = Y
Z0 Z0
f
Approximating = 1 =⇒ x = X, y = Y
Z0
7.5 References
1. Foreshortening, https://en.wikipedia.org/wiki/Perspective_(graphical)
2. Hapke Parameters, https://en.wikipedia.org/wiki/Hapke_parameters
_______________________________________
3. Understanding Radiance (Brightness), Irradiance and Radiant Flux
4. Orthogonal Group, https://en.wikipedia.org/wiki/Orthogonal group
5. https://en.wikipedia.org/wiki/Fill factor (image sensor)
8 Lecture 9: Shape from Shading, General Case - From First Order Nonlinear
PDE to Five ODEs
In this lecture, we will begin by exploring some applications of magnification, shape recovery, and optics through Transmission
and Scanning Electron Microscopes (TEMs and SEMs, respectively). Then, we will discuss how we can derive shape from shading
using needle diagrams, which capture surface orientations at each (x, y) pixel in the image. This procedure will motivate the use
of Green’s Theorem, “computational molecules”, and a discrete approach to our standard unconstrained optimization problem.
We will conclude by discussing more about recovering shape from shading for Hapke surfaces using initial curves and rotated
coordinate systems.
45
8.1 Example Applications: Transmission and Scanning Electron Microscopes (TEMs and
SEMs, respectively)
We will begin with a few motivating questions/observations:
• How do TEMs achieve amazing magnification? They are able to do so due to the fact that these machines are not
restricted by the wavelength of the light they use for imaging (since they are active sensors, they image using their own
“light”, in this case electrons.
• What are SEM images more enjoyable to look at than TEM images? This is because SEM images reflect shading,
i.e. differences in brightness based off of surface orientation. TEM images do not do this.
• How do SEMs work? Rely on an electron source/beam, magnetic-based scanning mechanisms, photodiode sensors to
measure secondary electron current. Specifically:
– Many electrons lose energy and create secondary electrons. Secondary electrons are what allow us to make measure-
ments.
– Secondary electron currents vary with surface orientation.
– Objects can be scanned in a raster-like format.
– Electron current is used to modulate a light ray. Magnification is determined by the degree of deflection.
– Gold plating is typically used to ensure object is conductive in a vacuum.
– Inclines/angles can be used for perturbing/measuring different brightness values.
– From a reflectance map perspective, measuring brightness gives is the slope (a scalar), but it does not give us the
gradient (a vector). This is akin to knowing speed, but not the velocity.
A needle diagram is a 2D representation of the surface orientation of an object for every pixel in an image, i.e. for ev-
∆ ∆ dz
ery (x, y) pair, we have a surface orientation (p, q), where p = dz
dt , q = dy . Recall from photometric stereo that we cannot simply
parameterize Z(x, y); we can only parameterize the surface gradients p(x, y) and q(x, y).
In this problem, our goal is that given (p, q) for each pixel (i.e. given the needle diagram), recover z for each pixel. Note
that this leads to an overdetermined problem (more constraints/equations than unknowns) [1]. This actually will allow us to
reduce noise and achieve better results.
Let us define δx0 = pdx0 + qdy. Next, we construct a contour in the (x, y) plane of our (p, q) measurements, where the contour
starts and ends at the origin, and passes through a measurement. Our goal is to have the integrals of p and q be zero over these
contours, i.e.
I
(pdx0 + qdy 0 ) = 0
But note that these measurements are noisy, and since we estimate p and q to obtain estimates for z, this is not necessar-
ily true.
46
Note that an easy way to break this problem down from one large problem into many smaller problems (e.g. for computa-
tional parallelization, greater accuracy, etc.) is to decompose larger contours into smaller ones - if z is conserved for a series of
smaller loops, then this implies z is conserved for the large loop as well.
∂p(x, y) ∂q(x, y)
Simplifying : δyδx = δxδy
∂y ∂x
∂p(x, y) ∂q(x, y)
Solution : =
∂y ∂x
∂z ∂z
This is consistent with theory, because since our parameters p ≈ ∂x and q ≈ ∂y , then the condition approximately becomes
(under perfect measurements):
∂p(x, y) ∂ ∂z ∂2z
= ( )=
∂y ∂y ∂x ∂y∂x
∂q(x, y) ∂ ∂z ∂2z
= ( )=
∂x ∂x ∂y ∂x∂y
Green’s Theorem is highly applicable in machine vision because we can reduce two-dimensional computations to one-dimensional
computations. For instance, Green’s Theorem can be helpful for:
• Computing the area of a contoured object/shape
• Computing the centroid of a blob or object in two-dimensional space, or more generally, geometric moments of a surface.
Moments can generally be computed just by going around the boundary of a contour.
47
Let us now apply Green’s Theorem to our problem:
I ZZ
∂q(x, y) ∂p(x, y)
(pdx + qdy) = − dxdy = 0
L D ∂x ∂y
∂q(x,y) ∂p(x,y) ∂q(x,y) ∂p(x,y)
This requires ∂x − ∂y = 0 =⇒ ∂x = ∂y ∀ x, y ∈ D.
We could solve for estimates of our unknowns of interest, p and q, using unconstrained optimization, but this will be more
difficult than before. Let us try using a different tactic, which we will call “Brute Force Least Squares”:
ZZ 2 ∂z 2
∂z
min −p + − q dxdy
z(x,y) D ∂x ∂y
I.e. we are minimizing the squared distance between the partial derivatives of z with respect to x and y and our respective
parameters over the entire image domain D.
However, this minimization approach requires having a finite number of variables, but here we are optimizing over a continuous
function (which has an infinite number of variables). Therefore, we have infinite degrees of freedom. We can use calculus of
variations here to help us with this. Let us try solving this as a discrete problem first.
Note that these discrete derivatives of z with respect to x and y present in the equation above use finite forward differences.
Even though we are solving this discretely, we can still think of this as solving our other unconstrained optimization prob-
lems, and therefore can do so by taking the first-order conditions of each of our unknowns, i.e. ∀ (k, l) ∈ D. The FOCs are
given by |D| equations (these will actually be linear!):
∂
(J({zi,j }(i,j)∈D ) = 0 ∀ (k, l) ∈ D
∂zk,l
Let us take two specific FOCs and use them to write a partial differential equation:
• (k, l) = (i, j):
∂ 2 zk,l+1 − zk,l 2z
k+1,l − zk,l
(J({z,j }(i,j)∈D ) = − pk,l + − qk,l = 0
∂zk,l
48
8.3.1 “Computational Molecules”
These are computational molecules that use finite differences [3] to estimate first and higher-order derivatives. They can be
thought of as filters, functions, and operators that can be applied to images or other multidimensional arrays capturing spatial
structural. Some of these are (please see the handwritten lecture notes for what these look like graphically):
1. zx = 1 (z(x, y) − z(x − 1, y)) (Backward Difference), 1 (z(x + 1, y) − z(x, y)) (Forward Difference)
2. zy = 1 (z(x, y) − z(x, y − 1)) (Backward Difference), 1 (z(x, y + 1) − z(x, y)) (Forward Difference)
1
3. ∆z = ∇2 z = 2 (4z(x, y) − (z(x − 1, y) + z(x + 1, y) + z(x, y − 1) + z(x, y + 1)))
∂2z 1
4. zxx = ∂x2 = 2 (z(x − 1, y) − 2(x, y) + z(x + 1, y))
∂2z 1
5. zyy = ∂y 2 = 2 (z(x, y − 1) − 2(x, y) + z(x, y + 1))
These computational molecules extend to much higher powers as well. Let us visit the Laplacian operator ∆(·). This operator
comes up a lot in computer vision:
∂z ∂z T ∂z ∂z ∂2z ∂2z
• Definition: ∆z = ∇2 z = ( ∂x , ∂y ) ( ∂x , ∂y ) = ∂x2 + ∂y 2
• The Laplacian is the lowest dimensional rotationally-invariant linear operator, i.e. for a rotated coordinate system
(x0 , y 0 ) rotated from (x, y) by some rotation matrix R ∈ SO(2), we have:
I.e. the result of the Laplacian is the same in both coordinate systems.
As we can see, the Laplacian is quite useful in our derived solution above.
Iterative Approach: The sparse structure in our First-Order Equations allows us to use an iterative approach for shape
estimation. Our “update equation” updates the current depth/shape estimate zk,l using its neighboring indices in two dimen-
sions:
(n+1) 1 (n) (n) (n) (n)
zk,l = (z + zk+1,l + zk,l−1 + zk−1,l ) − (pk,l − pk,l−1 ) − (qk,l − qk−1,l )
4 k,l+1
A few terminology/phenomenological notes about this update equation:
• The superscripts n and n + 1 denote the number of times a given indexed estimate has been updated (i.e. the number of
times this update equation has been invoked). It is essentially the iteration number.
• The subscripts k and l refer to the indices.
• The first term on the righthand side 14 (·) is the local average of zk,l using its neighbors.
• This iterative approach converges to the solution much more quickly than Gaussian elimination.
• This iterative approach is also used in similar ways for solving problems in the Heat and Diffusion Equations (also
PDEs).
• This procedure can be parallelized so long as the computational molecules do not overlap/touch each other. For instance,
we could divide this into blocks of size 3 x 3 in order to achieve this.
• From this approach, we can develop robust surface estimates!
49
8.3.3 Reconstructing a Surface From a Single Image
Recall this other shape from brightness problem we solved for Hapke surfaces (Lecture 8). For Hapke surfaces, we have that our
brightness in the image (radiance) L is given by:
r r
cos θi n̂ · ŝ
L= =
cos θe n̂ · v̂
Recall from our last lecture that this gives us a simple reflectance map of straight lines in gradient (p, q) space. By rotating this
gradient space coordinate system from (p, q) → (p0 , q 0 ), we can simplify our estimates for shape.
With this rotation, we also claimed that rotating the system in gradient space is equivalent to using the same rotation ma-
trix R in our image space (x, y). Here we prove this:
Then, in our rotated coordinate system where p0 is along the brightness gradient, we have that:
p′ = (ps p + qs q)/√(ps² + qs²) = (rs E² − 1)/√(ps² + qs²)
(Where p′ ≜ ∂z/∂x′ is the slope of the surface of interest in a particular direction.) This phenomenon only holds for Hapke surfaces
with linear isophotes. We can integrate this expression out for surface estimation:
z(x) = z(x0) + ∫_{x0}^{x} p′(x) dx
Integrating out as above allows us to build a surface height profile of our object of interest. Can do this for the y-direction as
well:
z(x, y) = z(x0, y) + ∫_{x0}^{x} p′(x, y) dx
2. δy = (qs/√(ps² + qs²)) δξ
3. δz = ((rs E²(x, y) − 1)/√(ps² + qs²)) δξ
Note that we can adjust the speed of motion here by adjusting the step size δξ.
Next time, we will generalize this from Hapke reflectance maps to arbitrary reflectance maps!
8.4 References
1. Overdetermined System, https://en.wikipedia.org/wiki/Overdetermined_system
2. Fubini’s Theorem, https://en.wikipedia.org/wiki/Fubini%27s_theorem
3. Finite Differences, https://en.wikipedia.org/wiki/Finite_difference
For the Hapke example, we have a rotated coordinate system governed by the following ODEs (note that ξ will be the variable
we parameterize our profiles along):
1. X-direction: dx/dξ = ps
2. Y-direction: dy/dξ = qs
3. By the chain rule, Z-direction: dz/dξ = (∂z/∂x)(dx/dξ) + (∂z/∂y)(dy/dξ) = p ps + q qs
Intuition: Infinitesimal steps in the image, parameterized by ξ, give dx/dξ and dy/dξ, and we are interested in finding the change
in height dz/dξ, which can be used for recovering surface orientation.
Note: When dealing with brightness problems, e.g. SfS, we have implicitly shifted to orthographic projection (x = (f/Z0)X, y =
(f/Z0)Y). These methods can be applied to perspective projection as well, but the mathematics makes the intuition less clear. We
can model orthographic projection by having a telecentric lens, which effectively places the object really far away from the
image plane.
We can solve the 3 ODEs above using a forward Euler method with a given step size. For small steps, this approach will be
accurate enough, and accuracy here is not too important anyway since we will have noise in our brightness measurements.
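As a rough sketch of what such a forward Euler integration could look like for the Hapke profile ODEs above (assuming ps, qs, rs are known constants and E is some callable returning interpolated image brightness at a point; the function and variable names are illustrative, not from the notes):

```python
import numpy as np

def hapke_profile(E, x0, y0, z0, ps, qs, rs, dxi=1.0, n_steps=200):
    """Forward-Euler integration of dx/dxi = ps, dy/dxi = qs,
    dz/dxi = ps*p + qs*q = rs*E^2 - 1 along one profile."""
    x, y, z = float(x0), float(y0), float(z0)
    profile = [(x, y, z)]
    for _ in range(n_steps):
        dz = (rs * E(x, y) ** 2 - 1.0) * dxi   # dz/dxi evaluated at the current point
        x, y, z = x + ps * dxi, y + qs * dxi, z + dz
        profile.append((x, y, z))
    return np.array(profile)
```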
Some cancellation and rearranging yields:
ps p + qs q = rs E² − 1
Therefore, we have a direct relationship between our measured brightness E and our unknowns of interest:
ps ∂z/∂x + qs ∂z/∂y = p ps + q qs = rs E² − 1
Note here, however, that we do not know surface orientation based on this, since again for Hapke surfaces, we only know slope
in one of our two directions in our rotated gradient space (p′, q′). A forward Euler approach will generate a set of independent
profiles, and we do not have any information about the surface orientation at the interfaces of these independent profiles. This
necessitates an initial curve containing initial conditions for each of these independent profiles. In 3D, this initial curve is
parameterized by η, and is given by: (x(η), y(η), z(η)). Our goal is to find z(η, ξ), where:
E(x, y) = R(p, q)
Let us now consider taking a small step δx, δy in the image, and our goal is to determine how z changes from this small step.
We can do so using differentials and surface orientation:
δz = (∂z/∂x) δx + (∂z/∂y) δy = p δx + q δy
Therefore, if we know p and q, we can compute δz for a given δx and δy. But in addition to updating z using the equation
above, we will also need to update p and q (intuitively, since we are still moving, the surface orientation can change):
δp = (∂p/∂x) δx + (∂p/∂y) δy
δq = (∂q/∂x) δx + (∂q/∂y) δy
This notion of updating p and q provides motivation for keeping track/updating not only (x, y, z), but also p and q. We can
think of solving our problem as constructing a characteristic strip of the ODEs above, composed of (x, y, z, p, q) ∈ R5 . In
vector-matrix form, our updates to p and q become:
( δp ; δq ) = [ px py ; qx qy ] ( δx ; δy ) = [ r s ; s t ] ( δx ; δy ) = H ( δx ; δy ),   H = [ ∂²z/∂x²  ∂²z/∂y∂x ; ∂²z/∂x∂y  ∂²z/∂y² ]
Where H is the Hessian of second spatial derivatives (x and y) of z. Note that from Fubini’s theorem and from our Taylor
expansion from last lecture, py = qx.
Intuition:
• First spatial derivatives ∂z/∂x and ∂z/∂y describe the surface orientation of an object in the image.
One issue with using this Hessian approach to update p and q: how do we update the second derivatives r, s, and t? Can we use
3rd order derivatives? It turns out that seeking to update lower-order derivatives with higher-order derivatives will just generate
increasingly more unknowns, so we will seek alternative routes. Let’s try integrating our brightness measurements into what we
already have so far. Differentiating our image irradiance equation E(x, y) = R(p, q) with respect to x and y (using the chain rule) gives:
( Ex ; Ey ) = [ px qx ; py qy ] ( Rp ; Rq ) = H ( Rp ; Rq )
Notice that we have the same Hessian matrix that we had derived for our surface orientation update equation before!
Intuition: These equations make sense intuitively - brightness will be constant for constant surface orientation in a model
where brightness depends only on surface orientation. Therefore, changes in brightness correspond to changes in surface orien-
tation.
We now choose our step in the image to lie along the gradient of the reflectance map:
( δx ; δy ) = ( Rp ; Rq ) δξ
Where the vector on the lefthand side is our step in x and y, the vector on the righthand side is the gradient of the reflectance
map in gradient space (p, q), and δξ is the step size. Intuitively, this is the direction in which we can “make progress”. Substituting
this equality into our update equation for p and q, we have:
( δp ; δq ) = H ( Rp ; Rq ) δξ = ( Ex ; Ey ) δξ
Therefore, we can formally write out a system of 5 first-order ordinary differential equations (ODEs) that generate our charac-
teristic strip as desired:
1. dx/dξ = Rp
2. dy/dξ = Rq
3. dp/dξ = Ex
4. dq/dξ = Ey
5. dz/dξ = p Rp + q Rq (“Output Rule”)
Though we take partial derivatives on many of the righthand sides, we can think of these quantities as measurements or derived
variations of our measurements, and therefore they do not correspond to partial derivatives that we actually need to solve for.
Thus, this is why we claim this is a system of ODEs, and not PDEs.
• This system of 5 ODEs explores the surface along the characteristic strip generated by these equations.
• Algorithmically, we (a) Look at/compute the brightness gradient, which helps us (b) Compute p and q, which (c) Informs
us of Rp and Rq for computing the change in height z.
• ODEs 1 & 2 and ODEs 3 & 4 are two systems of equations that each determine the update for the other system (i.e. there
is dynamic behavior between these two systems). The 5th ODE is in turn updated from the results of updating the other
4 ODE quantities of interest.
• The step in the image (x, y) depends on the gradient of the reflectance map in (p, q).
• Analogously, the step in the reflectance map (p, q) depends on the gradient of the image in (x, y).
• This system of equations necessitates that we cannot simply optimize with block gradient ascent or descent, but rather a
process in which we dynamically update our variables of interest using our other updated variables of interest.
• This approach holds generally for any reflectance map R(p, q).
• We can express our image irradiance equation as a first order, nonlinear PDE:
E(x, y) = R(∂z/∂x, ∂z/∂y)
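As an illustrative sketch (not a definitive implementation), the characteristic strip could be grown with forward Euler steps roughly as follows, assuming callables Ex, Ey for the measured brightness gradient and Rp, Rq for the reflectance-map gradient, and an initial (x, y, z, p, q) taken from an initial curve/strip; all names are placeholders:

```python
import numpy as np

def grow_strip(Ex, Ey, Rp, Rq, x, y, z, p, q, dxi=0.05, n_steps=400):
    """Forward-Euler integration of the five characteristic-strip ODEs."""
    strip = [(x, y, z, p, q)]
    for _ in range(n_steps):
        rp, rq = Rp(p, q), Rq(p, q)
        dx, dy = rp * dxi, rq * dxi               # dx/dxi = R_p, dy/dxi = R_q
        dz = (p * rp + q * rq) * dxi              # dz/dxi = p R_p + q R_q
        dp, dq = Ex(x, y) * dxi, Ey(x, y) * dxi   # dp/dxi = E_x, dq/dxi = E_y
        x, y, z, p, q = x + dx, y + dy, z + dz, p + dp, q + dq
        strip.append((x, y, z, p, q))
    return np.array(strip)
```

Note how the (x, y) step uses the reflectance-map gradient while the (p, q) step uses the image gradient, exactly the coupling described above.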
Next, we will show how this general approach applied to a Hapke surface reduces to our previous SfS results for this problem.
Taking derivatives using the system of 5 ODEs defined above, we have (from the multivariate chain rule):
1. dx/dξ : Rp ≜ ∂R/∂p = ps / (2√rs √(1 + ps p + qs q))
2. dy/dξ : Rq ≜ ∂R/∂q = qs / (2√rs √(1 + ps p + qs q))
3. dz/dξ : p Rp + q Rq = (ps p + qs q) / (2√rs √(1 + ps p + qs q))
Since the denominator is common in all three of these derivative equations, we can just conceptualize this factor as a speed/step
size factor. Therefore, we can simply omit this factor when we update these variables after taking a step. With this, our updates
become:
1. δx ← ps
2. δy ← qs
3. δz ← (ps p + qs q) = rs E² − 1
Which are consistent with our prior results using our Hapke surfaces. Next, we will apply this generalized approach to Scanning
Electron Microscopes (SEMs):
9.2.2 Applying General Form SfS to SEMs
For SEMs, the reflectance map is rotationally symmetric about the origin, i.e. R(p, q) = f(p² + q²) for some function f. Taking the
derivatives needed for the characteristic strip:
1. dx/dξ : Rp ≜ ∂R/∂p = 2 f′(p² + q²) p
2. dy/dξ : Rq ≜ ∂R/∂q = 2 f′(p² + q²) q
3. dz/dξ : p Rp + q Rq = 2 f′(p² + q²)(p² + q²)
Again, here we can also simplify these updates by noting that the term 2f′(p² + q²) is common to all three derivatives, and
therefore this factor can also be interpreted as a speed factor that only affects the step size. Our updates then become:
1. δx ← p
2. δy ← q
3. δz ← p² + q²
This tells us we will be taking steps along the brightness gradient. Our solution generates characteristic strips that contain
information about the surface orientation. To continue our analysis of this SfS problem, in addition to defining characteristic
strips, it will also be necessary to define the base characteristic.
Another important component when discussing our solution to this SfS problem is the base characteristic, which is the
projection of the characteristic strip onto the x, y image plane:
(x(ξ), y(ξ))ᵀ = projection_{x,y}{characteristic strip} = projection_{x,y}{(x, y, z, p, q)ᵀ}
2. Constant step size in image: A few issues with this approach. First, curves may run at different rates. Second, the
methodology fails when √(Rp² + Rq²) = 0.
Achieved by: Dividing by √(Rp² + Rq²) =⇒ d/dξ (√(δx² + δy²)) = 1.
3. Constant Step Size in 3D/Object: Runs into issues when Rp = Rq = 0.
Achieved by: Dividing by √(Rp² + Rq² + (p Rp + q Rq)²) =⇒ √((δx)² + (δy)² + (δz)²) = 1.
4. Constant Step Size in Isophotes: Here, we are effectively taking constant steps in brightness. We will end up dividing
by the dot product of the brightness gradient and the gradient of the reflectance map in (p, q) space.
Achieved by: Dividing by ((∂E/∂x)(∂R/∂p) + (∂E/∂y)(∂R/∂q)) δξ = ((Ex, Ey) · (Rp, Rq)) δξ =⇒ δE = 1.
E(x, y) = R(p, q)
∂z/∂η = (∂z/∂x)(∂x/∂η) + (∂z/∂y)(∂y/∂η) = p (∂x/∂η) + q (∂y/∂η)
Where p and q in this case are our unknowns. Therefore, in practice, we can get by with just an initial curve, and do not need the
full initial characteristic strip - i.e. if we have x(η), y(η), and z(η), then we can compute the orientation from the reflectance
map using our brightness measurements.
It turns out, unfortunately, that we cannot. This is due to the fact that as we infinitesimally approach the edge of the boundary,
we have that ∂z/∂x → ∞, ∂z/∂y → ∞. Even though the slope (the ratio of p and q) is known, we cannot use it numerically/iteratively
with our step updates above.
This in turn implies that p and q cannot be stepped:
dp/dξ = ∂E/∂x = 0
dq/dξ = ∂E/∂y = 0
Intuition: Intuitively, what is happening with these two coupled systems (each of 2 ODEs, (x, y) and (p, q), as we saw in our 5-ODE
system above) is that a stationary point in one domain makes the updates in the other domain go to zero, which in turn
prevents the other system’s quantities of interest from stepping. Since δz depends on δx, δy, δp, δq, and since δx, δy, δp, δq = 0
when we use stationary points as starting points, this means we cannot ascertain any changes in δz at these points.
Note: The extremum corresponding to a stationary point can be a maximum, as it is for Lambertian surfaces, or a minimum,
as it is for Scanning Electron Microscopes (SEMs).
However, even though we cannot use stationary points themselves as starting points, could we use the area around them?
If we stay close enough to the stationary point, we can approximate that these neighboring points have nearly the same surface
orientation.
• Approach 1: Construct a local tangent plane by extruding a circle in the plane centered at the stationary point with some
small radius - this means all points in this plane will have the same surface orientation as the stationary point. Note that
mathematically, a local 2D plane on a 3D surface is equivalent to a 2-manifold [1]. This is good in the sense that we know
the surface orientation of all these points already, but not so great in that we have a degenerate system - since all points
have the same surface orientation, under this model they will all have the same brightness as well. This prevents us from
obtaining a unique solution.
• Approach 2: Rather than constructing a local planar surface, let us take a curved surface with non-constant surface
orientation and therefore, under this model, non-constant brightness.
9.5.3 Example
Suppose we have a reflectance map and surface function given by:
Can we use the brightness gradient to estimate local shape? It turns out the answer is no, again because of stationary points.
But if we look at the second derivatives of brightness:
Exx = ∂²E/∂x² = ∂/∂x (8x) = 8
Eyy = ∂²E/∂y² = ∂/∂y (32y) = 32
Exy = ∂²E/∂x∂y = ∂/∂x (32y) = ∂/∂y (8x) = 0
These second derivatives, as we will discuss more in the next lecture, will tell us some information about the object’s shape.
As we have seen in previous lectures, these second derivatives can be computed by applying computational molecules to our
brightness measurements.
High-Level Algorithm
(NOTE: This will be covered in greater detail next lecture) Let’s walk through the steps to ensure we can autogenerate an
initial condition curve without the need to measure it:
1. Compute stationary points of brightness.
2. Use 2nd derivatives of brightness to estimate local shape.
3. Construct small cap (non-planar to avoid degeneracy) around the stationary point.
4. Begin solutions from these points on the edge of the cap surrounding the stationary point.
9.6 References
1. Manifold, https://en.wikipedia.org/wiki/Manifold
10.2 Patent Case Study: Detecting Sub-Pixel Location of Edges in a Digital Image
To put this problem into context, consider the following:
• Recall that images typically have large regions of uniform/homogeneous intensity
• Image arrays are very memory-dense. A more sparse way to transfer/convey information about an image containing edges
is to use the locations of edges as region boundaries of the image. This is one application of edge finding.
• Robert’s Cross: This approximates derivatives in a coordinate system rotated 45 degrees (x′, y′). The derivatives can
be approximated using the Kx′ and Ky′ kernels:
∂E/∂x′ → Kx′ = [ 0 −1 ; 1 0 ]
∂E/∂y′ → Ky′ = [ 1 0 ; 0 −1 ]
• Sobel Operator: This computational molecule requires more computation and is not as high-resolution. It is, however, more
robust to noise than the computational molecules used above:
∂E/∂x → Kx = [ −1 0 1 ; −2 0 2 ; −1 0 1 ]
∂E/∂y → Ky = [ −1 −2 −1 ; 0 0 0 ; 1 2 1 ]
• Silver Operators: This computational molecule is designed for a hexagonal grid. Though these filters have some advan-
tages, unfortunately, they are not compatible with most cameras as very few cameras have a hexagonal pixel structure.
For this specific application, we can compute approximate brightness gradients using the filters/operators above, and then we can
convert these brightness gradients from Cartesian to polar coordinates to extract brightness gradient magnitude and direction
(which are all we really need for this system). In the system, this is done using the CORDIC algorithm [1].
10.2.1 High-Level Overview of Edge Detection System
At a high level, we can divide the system into the following chronological set of processes/components:
1. Estimate Brightness Gradient: Given an image, we can estimate the brightness gradient using some of the filters
defined above.
2. Compute Brightness Gradient Magnitude and Direction: Using the CORDIC algorithm, we can estimate the
brightness gradient magnitude and direction. The CORDIC algorithm does this iteratively through a corrective feedback
mechanism (see reference), but computationally, only uses SHIFT, ADD, SUBTRACT, and ABS operations.
3. Choose Neighbors and Detect Peaks: This is achieved using brightness gradient magnitude and direction and a pro-
cedure called non-maximum suppression [2].
First, using gradient magnitude and direction, we can find edges by looking across the 1D edge (we can search for this edge
using the gradient direction Gθ), which invokes Non-Maximum Suppression (NMS). We need to quantize the gradient direction into 8 (Cartesian)
or 6 (polar) regions - this is known as coarse direction quantization.
Finally, we can find a peak by fitting three points with a parabola (note this has three DOF). This approach will end up
giving us accuracy up to 1/10th of a pixel. To go further, we must look at the assumptions of gradient variation with
position, as well as:
• Camera optics
• Fill factor of the chip sensor
• How in-focus the image is
• How smooth the edge transition is
The authors of this patent claim that edge detection performance is improved using an optimal value of “s” (achieved through
interpolation and bias correction), which we will see later. For clarity, the full system diagram is here:
Figure 3: Aggregate edge detection system. The steps listed in the boxes correspond to the steps outlined in the procedure
above.
• Gradient estimator
• Peak detector
• Sub-pixel interpolator
Next, let us dive in more to the general edge detection problem.
(Plot: an ideal unit step edge profile u(x), equal to 0 for x < 0 and 1 for x > 0.)
Using this cross-section across the edge to model the edge actually causes problems arising from aliasing: since we seek to find
the location of an edge in a discrete, and therefore, sampled image, and since the edge in the case of u(x) is infinitely thin, we
will not be able to find it due to sampling. In Fourier terms, if we use a perfect step function, we introduce artificially high
(infinite) frequencies that prevent us from sampling without aliasing effects. Let us instead try a “soft” step function, i.e. a
“sigmoid” function: σ(x) = 1/(1 + e^(−x)). Then our u(x) takes the form:
(Plot: a “soft” unit step function u(x) = σ(x), rising smoothly from 0 to 1 around x = 0.)
The gradient of this brightness across the edge, given by ∇u(x) (or du/dx in one dimension), is then given by the following. Notice
that the location of the maximum matches the inflection point in the graph above:
(Plot: gradient of the “soft” unit step function, ∇u(x), a single peak centered at x = 0.)
As we mentioned above, we can find the location of this edge by looking at where the second derivative of brightness crosses
zero, a.k.a. where ∇(∇u(x)) = ∇2 u(x) = 0. Notice that the location of this zero is given by the same location as the inflection
point of u(x) and the maximum of ∇u(x):
(Plot: second derivative of the “soft” unit step function, ∇²u(x), which crosses zero at x = 0.)
For those curious, here is the math behind this specific function, assuming a sigmoid for u(x):
1. u(x) = 1/(1 + exp(−x))
2. ∇u(x) = du/dx = d/dx (1/(1 + exp(−x))) = exp(−x)/(1 + exp(−x))²
Building on top of this framework above, let us now move on to brightness gradient estimation.
Robert’s Cross Gradient: Since this estimates derivatives at 45 degree angles, the pixels are effectively further apart, and
this means there will be a constant of proportionality difference between the magnitude of the gradient estimated here and with
a normal (x, y) coordinate system:
√(Ex′² + Ey′²) ∝ √(Ex² + Ey²)
Next, we will look at the Sobel operator. For this analysis, it will be helpful to recall the following result from Taylor Series:
f(x + δx) = f(x) + δx f′(x) + (δx)²/2! f′′(x) + (δx)³/3! f′′′(x) + (δx)⁴/4! f⁽⁴⁾(x) + ... = Σ_{i=0}^{∞} (δx)^i f^(i)(x)/i!, where 0! ≜ 1
Let us first consider the simple two-pixel difference operator (in the x-direction/in the one-dimensional case),
i.e. dE/dx → Kx = (1/δ)(−1 1). Let us look at the forward difference and backward difference when this operator is applied:
Forward difference: (f(x + δx) − f(x))/δx = f′(x) + (δx/2) f′′(x) + ((δx)²/6) f′′′(x) + ...
Backward difference: (f(x) − f(x − δx))/δx = f′(x) − (δx/2) f′′(x) + ((δx)²/6) f′′′(x) − ...
Notice that for both of these, if f′′(x) is large, i.e. if f(x) is nonlinear, then we will have second-order error terms that appear in
our estimates. In general, we want to aim for removing these lower-order error terms. If we average the forward and backward
differences, however, we can see that these second-order error terms disappear:
(1/2) ((f(x + δx) − f(x))/δx + (f(x) − f(x − δx))/δx) = f′(x) + ((δx)²/6) f′′′(x) + ...
Now we have increased the error term to 3rd order, rather than 2nd order! As a computational molecule, this higher-order filter
looks like dE/dx → Kx = (1/(2δ))(−1 0 1). But we can do even better! So long as we do not need to have a pixel at our
proposed edge, we can use a two-pixel filter spanning (x − δ/2, x + δ/2). There is no pixel at x, but we can still compute
the derivative there. This yields an error that is 0.25 times the error above, due to the fact that the two samples used are δ apart,
as opposed to 2δ apart:
error = ((δ/2)²/6) f′′′(x)
This makes sense intuitively, because the closer together a set of gradient estimates are, the more accurate they will be. We
can incorporate y into the picture, making this amenable for two-dimensional methods as desired, by simply taking the center
of four pixels, given for each dimension as:
∂E/∂x ≈ Kx = (1/(2δx)) [ −1 1 ; −1 1 ]
∂E/∂y ≈ Ky = (1/(2δy)) [ −1 −1 ; 1 1 ]
The proposed edge is in the middle of both of these kernels, as shown below:
Figure 4: We can estimate the brightness gradient with minimal error by estimating it at the point at the center of these 2D
filters.
Estimating these individually in each dimension requires 3 operations each for a total of 6 operations, but if we take the
operations common to both and combine them by addition or subtraction, this only requires 4 operations. This is especially
helpful for images with lots of pixels.
Next, we will discuss the 3-by-3 Sobel operator. We can think of this Sobel operator (in each dimension) as being the discrete
convolution of a 2-by-2 horizontal or vertical highpass/edge filter with a 2-by-2 smoothing or averaging filter:
1. x-direction: (1/(2δx)) [ −1 1 ; −1 1 ] ⊗ [ 1 1 ; 1 1 ] = [ −1 0 1 ; −2 0 2 ; −1 0 1 ]
2. y-direction: (1/(2δy)) [ −1 −1 ; 1 1 ] ⊗ [ 1 1 ; 1 1 ] = [ −1 −2 −1 ; 0 0 0 ; 1 2 1 ]
A few notes about the derivation above:
• The convolution used is a “padded convolution” [3], in which, when implemented, when the elements of the filter/kernel
(in this case, the averaging kernel) are not aligned with the image, they are simply multiplied by zero. Zero padding is the
most common padding technique, but there are other techniques as well, such as wraparound padding.
• This approach avoids the half-pixel (in which we estimate an edge that is not on a pixel) that was cited above.
• Smoothing/averaging is a double-edged sword, because while it can reduce/remove high-frequency noise by filtering, it can
also introduce undesirable blurring.
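A quick numerical check of the factorization above (scale factors such as 1/(2δx) omitted; just a sketch using SciPy's full 2D convolution):

```python
import numpy as np
from scipy.signal import convolve2d

edge_x = np.array([[-1, 1],
                   [-1, 1]])   # 2x2 horizontal difference molecule
smooth = np.array([[1, 1],
                   [1, 1]])    # 2x2 averaging molecule

print(convolve2d(edge_x, smooth))   # -> [[-1 0 1], [-2 0 2], [-1 0 1]], the 3x3 Sobel Kx
```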
Next, we will look at how the brightness gradient is converted from Cartesian to Polar coordinates:
(Ex, Ey) → (E0, Eθ)
E0 = √(Ex² + Ey²)
Eθ = tan⁻¹(Ey/Ex)
Finally, we conclude this lecture by looking at appropriate values of s for quadratic and triangular functions. This assumes we
have three gradient measurements centered on G0: (1) G−, (2) G0, and (3) G+. Let us look at the results for these two types
of functions (a small numerical sketch follows below):
1. Quadratic: s = (G+ − G−)/(4(G0 − (1/2)(G+ + G−))), which results in s ∈ [−1/2, 1/2].
2. Triangular: s = (G+ − G−)/(2(G0 − min(G+, G−)))
10.4 References
1. CORDIC Algorithm, https://www.allaboutcircuits.com/technical-articles/an-introduction-to-the-cordic-algorithm/
2. Non-Maximum Suppression, http://justin-liang.com/tutorials/canny/#suppression
3. Padded Convolution, https://medium.com/@ayeshmanthaperera/what-is-padding-in-cnns-71b21fb0dd7
11 Lecture 12: Blob Analysis, Binary Image Processing, Use of Green's Theorem, Derivative and Integral as Convolutions
In this lecture, we will continue our discussion of intellectual property, and how it is relevant for all scientists and engineers. We
will then elaborate on some of the specific machine vision techniques that were used in this patent, as well as introduce some
possible extensions that could be applicable for this patent as well.
Some “rules” of patents:
• Copyright:
– Books, song recordings, choreography
– Exceptions: presenting (fractional pieces of) information from another author
• Trademarks:
– Must be unique for your field (e.g. Apple vs. Apple).
– Cannot use common words - this is actually one reason why many companies have slightly misspelled combinations
of common words.
– Can use pictures, character distortions, and color as part of the trademark.
– No issues if in different fields.
• Trade Secret:
– No formal protection, but as long as it is not disclosed, it lasts forever.
– Can enforce legal recourse with a Non-Disclosure Agreement (NDA).
Recall some of the computational molecules we can use to estimate the brightness gradient:
1. Ex = (−1 1)
2. Ex = (1/2)(−1 0 1)
3. Ex = (1/2) [ −1 1 ; −1 1 ]
Where for molecule 2, the best point for estimating derivatives lies directly in the center pixel, and for molecules 1 and 3, the
best point for estimating derivatives lies halfway between the two pixels.
1. Taylor Series: From previous lectures we saw that we could use averaging to reduce the error terms from 2nd order
derivatives to third order derivatives. This is useful for analytically determining the error.
2. Test functions: We will touch more on these later, but these are helpful for testing your derivative estimates using
analytical expressions, such as polynomial functions.
3. Fourier domain: This type of analysis is helpful for understanding how these “stencils”/molecules affect higher (spatial)
frequency image content.
Note that derivative estimators can become quite complicated for high-precision estimates of the derivative, even for low-order
derivatives. We can use large estimators over many pixels, but we should be mindful of the following tradeoffs:
We can also look at some derivative estimators for higher-order derivatives. For 2nd-order derivatives, we just apply another
derivative operator, which is equivalent to convolution of another derivative estimator “molecule”:
∂²/∂x² (·) = ∂/∂x (∂(·)/∂x) ⇐⇒ (1/ε)(−1 1) ⊗ (1/ε)(−1 1) = (1/ε²)(1 −2 1)
For deriving the sign here and understanding why we have symmetry, remember that convolution “flips” one of the two filters/-
operators!
Sanity Check: Let us apply this to some functions we already know the 2nd derivative of (a quick numerical check follows below):
• f(x) = x²: f′(x) = 2x, f′′(x) = 2
• f(x) = x: f′(x) = 1, f′′(x) = 0
• f(x) = 1: f′(x) = 0, f′′(x) = 0
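Here is that quick numerical check, assuming unit pixel spacing (ε = 1) so the molecule is simply (1 −2 1):

```python
import numpy as np

x = np.arange(-5.0, 6.0)                 # unit-spaced sample points
molecule = np.array([1.0, -2.0, 1.0])    # second-derivative estimator for eps = 1

for f, name in [(x**2, "x^2"), (x, "x"), (np.ones_like(x), "1")]:
    est = np.convolve(f, molecule, mode="valid")   # estimates at interior samples
    print(name, est)   # x^2 -> all 2.0; x -> all 0.0; 1 -> all 0.0
```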
11.2.3 Mixed Partial Derivatives in 2D
First, it is important to look at the linear, shift-invariant property of these operators, which we can express for each quality:
• Shift-Invariant:
d/dx (f(x + δ)) = f′(x + δ), for some δ ∈ R
Derivative of shifted function = Derivative equivalently shifted by same amount
• Linear:
d/dx (a f1(x) + b f2(x)) = a f1′(x) + b f2′(x), for some a, b ∈ R
Derivative of scaled sum of two functions = Scaled sum of derivatives of both functions
We will exploit this linear, shift-invariant property frequently in machine vision. Because of this joint property, we can treat
derivative operators as convolutions in 2D:
∂²/∂x∂y (·) = ∂/∂x (∂/∂y (·)) ⇐⇒ (1/ε)(−1 1) ⊗ (1/ε)( −1 ; +1 ) = (1/ε²)[ +1 −1 ; −1 +1 ]
A few notes here:
• The second operator corresponding to Ey has been flipped in accordance with the convolution operator.
• If we project this derivative onto a “diagonal view”, we find that it is simply the second derivative along x′, where x′ is x
rotated 45 degrees counterclockwise in the 2D plane: x′ = x cos 45° + y sin 45° = (√2/2)x + (√2/2)y. In other words, in this
45-degree rotated coordinate system, Ex′x′ = Exy.
• Intuition for convolution: If convolution is a new concept for you, check out reference [2] here. Visually, convolution
is equivalent to “flipping and sliding” one operator across all possible (complete and partial) overlapping configurations of
the filters with one another.
Recall that the Laplacian is the lowest-order rotationally-symmetric derivative operator. Therefore, our finite difference/computational molecule estimates
should reflect this property if they are to be accurate. Two candidate estimators of this operator are:
1. “Direct Edge”: (1/ε²) [ 0 1 0 ; 1 −4 1 ; 0 1 0 ]
2. “Indirect Edge”: (1/(2ε²)) [ 1 0 1 ; 0 −4 0 ; 1 0 1 ]
Note that the second operator has a factor of 1/(2ε²) in front of it because the distance between pixels is √2 ε rather than ε; therefore,
we effectively have 1/ε′², where ε′ = √2 ε.
How do we know which of these approximations is better? We can go back to our analysis tools:
• Taylor Series
• Test functions
• Fourier analysis
Intuitively, we know that neither of these estimators will be optimal, because neither of these estimators is rotationally
symmetric. Let us combine these intelligently to achieve (approximate) rotational symmetry. Adding four times the first stencil
to one times the second, and normalizing by 1/(6ε²):
(1/(6ε²)) ( 4 [ 0 1 0 ; 1 −4 1 ; 0 1 0 ] + [ 1 0 1 ; 0 −4 0 ; 1 0 1 ] ) = (1/(6ε²)) [ 1 4 1 ; 4 −20 4 ; 1 4 1 ]
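A small numerical sketch of this combination (assuming ε = 1), checked against a function whose Laplacian we know, z = x² + y² with ∇²z = 4:

```python
import numpy as np
from scipy.signal import convolve2d

direct   = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
indirect = np.array([[1, 0, 1], [0, -4, 0], [1, 0, 1]], dtype=float)

numer = 4.0 * direct + indirect          # -> [[1, 4, 1], [4, -20, 4], [1, 4, 1]]
combined = numer / 6.0                   # overall factor 1/(6 eps^2) with eps = 1
print(numer)

xx, yy = np.meshgrid(np.arange(-5.0, 6.0), np.arange(-5.0, 6.0))
z = xx**2 + yy**2
print(np.allclose(convolve2d(z, combined, mode="valid"), 4.0))   # True
```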
Using Taylor Series, we can show that this estimator derived from this linear combination of estimators above results in an error
term that is one order higher than using either of the individual estimators above, at the cost of more computation. Note
that the sum of all the entries here is zero, as we expect for derivative estimators.
For a hexagonal grid, the analogous molecule is scaled by 1/(2ε²) and has entries of all 1s on the outer ring and an entry of −6 in the center. An
example application of a hexagonal grid - imaging black holes! This leads to π/4 greater efficiency.
It turns out the authors discourage thresholding, and in their work they remove all but the maximum estimated gradient
(note that this is quantized at the octant level). Note that the quantized gradient direction is perpendicular to the edge. In this
case, for a candidate gradient point G0 and the adjacent pixels G− and G+ , we must have:
G0 > G−, G0 ≥ G+
This forces −1/2 ≤ s ≤ 1/2. Note that we have the asymmetric inequality signs to break ties arbitrarily. Next we plot the quantized
profile that has been interpolated parabolically - i.e. sub-pixel interpolation.
To find this point above (please take a look at the handwritten lecture notes for this lecture), we project from the quantized
gradient direction to the actual gradient direction. This is the “plane position” component.
In addition to cubic interpolation, we can also consider piecewise linear interpolation with “triangle” functions. For some
different values of b:
• b = 0 → s0 = s
• b = 1 → s0 = 2sign(s)s2
• b = 2 → s0 = 4sign(s)s3
Where different interpolation methods give us different values of b.
R = (d/(2f)) δ   (Point Spread Function (PSF))
This pillbox image is given mathematically by:
(1/(πR²)) (1 − u(r − R))
Where u(·) is the unit step function, f is the focal length of the lens, d is the diameter of the lens (assumed to be conic),
and δ is the distance along the optical axis between the actual image plane and the “in focus” plane.
11.2.9 Multiscale
Note: We will discuss this in greater detail next lecture.
Multiscale is quite important in edge detection, because we can have edges at different scales. To draw contrasting exam-
ples, we could have an image such that:
• We have very sharp edges that transition over ≈ only 1 pixel
• We have blurry edges that transition over many pixels
We can slide a circle across a binary image - the overlap inside the circle between the two sides of the 1-0 edge controls how
bright things appear. We can use this technique to see how accurately the algorithm plots the edge position - this allows for
error calculation, since we have ground truth results that we can compute using the area of the circle. Our area of interest is
given by the area enclosed by the chord whose endpoints intersect the binary edge:
A = R²θ − x√(R² − x²)
θ = arctan(√(R² − x²)/x)
Another way to analyze this is to compute the analytical derivatives of this brightness function:
1. ∂E/∂x = 2√(R² − x²)
2. ∂²E/∂x² = −2x/√(R² − x²)
What can we do with this? We can use this as input into our algorithm to compute the error and compensate for the degree
of defocusing of the lens. In practice, there are other factors that lead to fuzzy edge profiles aside from defocusing, but this
defocusing compensation helps.
Linear 1D Interpolation:
f̃(x) = (f(a)(b − x) + f(b)(x − a))/(b − a)
We can also leverage more sophisticated interpolation methods, such as cubic spline.
11.2.12 CORDIC
As we discussed in the previous lecture, CORDIC is an algorithm used to estimate vector direction by iteratively rotating a
vector into a correct angle. For this patent, we are interested in using CORDIC to perform a change of coordinates from cartesian
to polar:
(Ex , Ey ) → (E0 , Eθ )
Idea: Rotate a coordinate system to make estimates using test angles iteratively. Note that we can simply compute these with
square roots and arc tangents, but these can be prohibitively computationally-expensive:
E0 = √(Ex² + Ey²)
Eθ = arctan(Ey/Ex)
Rather than computing these directly, it is faster to iteratively solve for the desired rotation θ by taking a sequence of iterative
rotations {θi}. The iterative updates we have for this are, in matrix-vector form:
( Ex^(i+1) ; Ey^(i+1) ) = [ cos θi  sin θi ; −sin θi  cos θi ] ( Ex^(i) ; Ey^(i) )
Gradients at next step = Rotation R by θi × Gradients at current step
How do we select {θi}? We can select progressively smaller angles. We accept a candidate angle and invoke the
iterative update above if the candidate angle reduces |Ey| and increases |Ex|.
The aggregate rotation θ is simply the sum of all these accepted angles: θ = Σi θi
One potential practical issue with this approach is that it involves a significant number of multiplications. How can we avoid
this? We can pick the angles carefully - i.e. if our successively smaller angles are chosen such that their tangents are powers of two:
tan θi = sin θi / cos θi = 2⁻ⁱ   →   rotation matrix becomes: cos θi [ 1  2⁻ⁱ ; −2⁻ⁱ  1 ]
Note that this reduces the computation to 2 additions per iteration. The angle we turn through becomes successively smaller:
cos θi = 1/√(1 + 2⁻²ⁱ)   →   R = Πi (1/cos θi) = Πi √(1 + 2⁻²ⁱ) ≈ 1.16 (precomputed)
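For illustration, here is a generic vectoring-mode CORDIC sketch in this spirit (not the patent's exact implementation); in hardware the multiplications by 2⁻ⁱ become shifts and the arctangent values come from a small precomputed table:

```python
import math

def cordic_polar(ex, ey, n_iters=16):
    """Iteratively rotate (Ex, Ey) onto the x-axis, accumulating the applied
    angle; returns (gradient magnitude, gradient direction)."""
    angle = 0.0
    if ex < 0.0:                                     # fold into the right half-plane
        angle = math.pi if ey >= 0.0 else -math.pi
        ex, ey = -ex, -ey
    gain = 1.0
    for i in range(n_iters):
        d = 1.0 if ey > 0.0 else -1.0                # sign chosen to shrink |Ey|
        ex, ey = ex + d * ey * 2.0**-i, ey - d * ex * 2.0**-i
        angle += d * math.atan(2.0**-i)              # table lookup in hardware
        gain *= math.sqrt(1.0 + 2.0**(-2 * i))       # aggregate scale factor R
    return ex / gain, angle

print(cordic_polar(3.0, 4.0))   # approximately (5.0, 0.927)
```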
11.3 References
1. Finite Differences, https://en.wikipedia.org/wiki/Finite_difference
2. Convolution, https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
• Recognize object
• Determine pose of detected/recognized object
• Inspect object
Motivation for these approaches: In machine vision problems, we often manipulate objects in the world, and we want to
know what and where these objects are in the world. In the case of these specific problems, we assume prior knowledge of the
precise edge points of these objects (which, as we discussed in the two previous lectures, we know how to compute!)
• Note that these methods are oftentimes applied to processed, not raw images.
Idea: Try all possible positions/configurations of the pose space to create a match between the template and runtime image
of the object. If we are interested in the squared distance between the displaced template and the other image (for
computational and analytic simplicity, let us only consider translation for now), then we have the following optimization
problem:
min_{δx, δy} ∬_D (E1(x − δx, y − δy) − E2(x, y))² dx dy
Where we have denoted the two images separately as E1 and E2 .
In addition to framing this optimization mathematically as minimizing the squared distance between the two images, we can
also conceptualize this as maximizing the correlation between the displaced image and the other image:
max_{δx, δy} ∬_D E1(x − δx, y − δy) E2(x, y) dx dy
We can prove mathematically that the two are equivalent. Writing out the first objective as J(δx, δy) and expanding it:
J(δx, δy) = ∬_D (E1(x − δx, y − δy) − E2(x, y))² dx dy
          = ∬_D E1²(x − δx, y − δy) dx dy − 2 ∬_D E1(x − δx, y − δy) E2(x, y) dx dy + ∬_D E2²(x, y) dx dy
=⇒ arg min_{δx, δy} J(δx, δy) = arg max_{δx, δy} ∬_D E1(x − δx, y − δy) E2(x, y) dx dy
Since the first and third terms are constant, and since we are minimizing the negative of a scaled correlation objective, this is
equivalent to maximizing the correlation of the second objective.
We can also relate this to some of the other gradient-based optimization methods we have seen, using Taylor Series. Suppose
δx, δy are small. Then the Taylor Series expansion of the first objective gives:
∬_D (E1(x − δx, y − δy) − E2(x, y))² dx dy = ∬_D (E1(x, y) − δx ∂E1/∂x − δy ∂E1/∂y + · · · − E2(x, y))² dx dy
If we now consider that we are looking between consecutive frames with time period δt, then the optimization problem becomes
(after substituting E1(x, y) − E2(x, y) = −δt ∂E/∂t):
min_{δx, δy} ∬_D (−δx Ex − δy Ey − δt Et)² dx dy
A few notes about the methods here and the ones above as well:
• Note that the term under the square directly above looks similar to our BCCE constraint from optical flow!
• Gradient-based methods are cheaper to compute but only function well for small deviations δx , δy .
• Correlation methods are advantageous over least-squares methods when we have scaling between the images (e.g. due to
optical setting differences): E1 (x, y) = kE2 (x, y) for some k ∈ R .
Another question that comes up from this: How can we match at different contrast levels? We can do so with normalized
correlation. Below, we discuss each of the elements we account for and the associated mathematical transformations:
1. Offset: We account for this by subtracting the mean from each brightness function:
E1′(x, y) = E1(x, y) − Ē1,   Ē1 = (∬_D E1(x, y) dx dy) / (∬_D dx dy)
E2′(x, y) = E2(x, y) − Ē2,   Ē2 = (∬_D E2(x, y) dx dy) / (∬_D dx dy)
This removes offset from images that could be caused by changes to the optical setup.
2. Contrast: We account for this by computing normalized correlation, which in this case is the Pearson correlation coefficient:
(∬_D E1′(x − δx, y − δy) E2′(x, y) dx dy) / (√(∬_D E1′²(x − δx, y − δy) dx dy) √(∬_D E2′²(x, y) dx dy)) ∈ [−1, 1]
Where a correlation coefficient of 1 denotes a perfect match, and a correlation coefficient of −1 denotes a perfectly anti-correlated
match.
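A minimal sketch of this normalized (Pearson) correlation score for one candidate displacement, assuming the displaced template window has already been cropped to the same size as the image patch:

```python
import numpy as np

def normalized_correlation(template, window):
    """Offset is removed by subtracting each mean; contrast is removed by
    dividing by both norms. Result lies in [-1, 1]."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t ** 2).sum() * (w ** 2).sum())
    return float((t * w).sum() / denom) if denom > 0 else 0.0

patch = np.random.rand(8, 8)
print(normalized_correlation(patch, 3.0 * patch + 7.0))   # ~1.0: gain/offset invariant
```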
Are there any issues with this approach? If parts/whole images of objects are obscured, this will greatly affect correlation
computations at these points, even with proper normalization and offsetting.
With these preliminaries set up, we are now ready to move into a case study: a patent for object detection and pose esti-
mation using probe points and template images.
12.2 Patent 7,016,539: Method for Fast, Robust, Multidimensional Pattern Recognition
This patent aims to extend beyond our current framework since the described methodology can account for more than just
translation, e.g. can account for:
• Rotation
• Scaling
• Shearing
• We can also see in the detailed block diagram from this patent document that we greatly leverage gradient estimation
techniques from the previous patent on fast and accurate edge detection.
• For generalizability, we can run this at multiple scales/levels of resolution.
5. Remove short or weak chains.
6. Divide chains into segments of low curvature separated by corners of high curvature.
7. Create evenly-spaced probe points along segments and store them in the model.
8. Determine pattern contrast and store in model.
12.2.4 Other Considerations for this Framework
We should also consider how to run our translational search. This search should be algorithmically conducted:
• Efficiently
• At different levels of resolution
• Hexagonally, rather than on a square grid - there is a π/4 advantage of work done vs. resolution. Here, hexagonal peak
detection is used, and to break ties, we arbitrarily set 3 of the 6 inequalities as ≥, and the other 3 as >.
What is pose?
Pose is short for position and orientation, and is usually determined with respect to a reference coordinate system. In the
patent’s definition, it is the “mapping from pattern to image coordinates that represents a specific transformation
and superposition of a pattern onto an image.”
Next, let us look into addressing “noise”, which can cause random matches to occur. Area under S(θ) curve captures the
probability of random matches, and we can compensate by calculating error and subtracting it out of the results. However, even
with this compensation, we are still faced with additional noise in the result.
Instead, we can try to assign scoring weights by taking the dot product between gradient vectors: v̂1 · v̂2 = cos θ. But one
disadvantage of this approach is that we end up quantizing pose space.
Finally, let us look at how we score the matches between template and runtime image configurations: scoring functions.
Our options are:
• Normalized correlation (above)
• Simple peak finding
• Removal of random matches (this was our “N” factor introduced above)
3. Running this for all/all sampled pose configurations from the pose space produces a multidimensional scoring surface. We
can find matches by looking for peak values in this surface.
A few more notes on this framework, before diving into the math:
• Training is beneficial here, because it allows for some degree of automated learning.
• Evidence collected from the probe points is cumulative and computed using many local operations.
• Accuracy is limited by the quantization level of the pose spanned. The non-redundant components of this pose space are:
– 2D Translation, 2 DOF
– Rotation, 1 DOF
– Scaling, 1 DOF,
– Skew, 1 DOF,
– Aspect Ratio, 1 DOF
Together, the space of all these components compose a general linear transformation, or an affine transformation:
x0 = a11 x + a12 y + a13
y 0 = a21 x + a22 y + a23
While having all of these options leads to a high degree of generality, it also leads to a huge number of pose configurations,
even for coarse quantization. This is due to the fact that the number of configurations grows exponentially with the number
of DOF.
Quick note: the function max(0, wi ) is known as the Rectified Linear Unit (ReLU), and is written as ReLU(wi ). This
function comes up frequently in machine learning.
• Works with “compiled probes”. With these “compiled probes”, we only vary translation - we have already mapped
pose according to the other DOFs above.
• Used in a “coarse step”.
2. Binary Weighting with Direction and Normalization:
S1a(a) = (Σi (wi > 0) Rdir(‖D(a + pi) − di‖₂)) / (Σi (wi > 0))
Where the predicate (wi > 0) returns 1 if this is true, and 0 otherwise.
3. “Preferred Embodiment”:
S(a) = (Σi (wi > 0)(Rdir(‖D(a + pi) − di‖₂) − N)) / ((1 − N) Σi (wi > 0))
Note that this scoring function is not normalized, and is used in the fine scanning step of the algorithm.
5. Raw Weights with Gradient Magnitude Scaling and Normalization:
S3(a) = (Σi wi M(a + pi) Rdir(‖D(a + pi) − di‖₂)) / (Σi wi)
3. Transform the isophote according to the generalized linear transformation above with the degrees of freedom we
consider for our pose space.
4. After computing this transformed isophote, we can find the transformed gradient by finding the direction orthogonal
to the transformed isophote, i.e. by rotating back 90 degrees using the rotation matrix given by:
R_{I→G} = [ 0 −1 ; 1 0 ]
13.1.3 Another Application of “PatQuick”: Machine Inspection
Let us consider some elements of this framework that make it amenable and applicable for industrial machine part inspection:
• How do we distinguish between multiple objects (a task more generally known as multiclass object detection and classifi-
cation)? We can achieve this by using multiple models/template images, i.e. one model/template for each type
of object we want to detect and find the relative pose of.
• With this framework, we can also compute fractional matches - i.e. how well does one template match another object in
the runtime image.
• We can also take an edge-based similarity perspective - we can look at the runtime image’s edge and compare to edge
matches achieved with the model.
Therefore, in the general case, to find the perspective projection from world coordinates onto our image, we can combine the
two previous equations, carrying out the matrix multiplication along the way:
x/f = Xc/Zc = (r11 XW + r12 YW + r13 ZW + X0) / (r31 XW + r32 YW + r33 ZW + Z0)
In this case, we can fold translation and rotation into a single matrix! We call this matrix T, and it is called a Homog-
raphy Matrix that encodes both rotation and translation. We will revisit this concept when we begin our discussion of 3D
transformations. Note that while our rotation matrix R is orthogonal, this homography matrix T is not necessarily.
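A small sketch of this projection, assuming a known rotation R, translation t = (X0, Y0, Z0), and focal length f (the numerical values below are purely illustrative):

```python
import numpy as np

def project(point_world, R, t, f):
    """Perspective projection: camera coordinates are R @ X_w + t, and the
    image coordinates are f * Xc / Zc and f * Yc / Zc."""
    Xc, Yc, Zc = R @ np.asarray(point_world, dtype=float) + t
    return f * Xc / Zc, f * Yc / Zc

R = np.eye(3)                        # no rotation, for illustration
t = np.array([0.0, 0.0, 5.0])        # world origin sits 5 units in front of the camera
print(project([1.0, 2.0, 0.0], R, t, f=1.0))   # (0.2, 0.4)
```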
13.2.1 How many degrees of freedom?
For determining the relative pose between camera and world frames, let us consider the number of degrees of freedom:
• 3 for translation, since we can shift in x, y, and z
• 3 for rotation, since we can rotate within the xy, yz, and xz planes
If we have 9 entries in the rotation matrix and 3 in the translation vector (12 unknowns total), and only 6 degrees of freedom, then
how do we solve for these entries? There is redundancy - the rotation matrix has 6 constraints from orthonormality
(3 from constraining the rows to have unit length, and 3 from requiring each pair of rows to be orthogonal).
Motivation: Edge and line detection for industrial machine vision. This was one of the first machine vision patents (sub-
mitted in 1960, approved in 1962). We are looking for lines in images, but our gradient-based methods may not necessarily work,
e.g. due to non-contiguous lines that have “bubbles” or other discontinuities. These discontinuities can show up especially for
smaller resolution levels.
Idea: The main idea of the Hough Transform is to intelligently map from image/surface space to parameter space for
that surface. Let us walk through the mechanics of how parameter estimation works for some geometric objects.
To estimate the parameters of a line/accomplish edge detection, we utilize the following high-level procedure:
1. Map the points in the image to lines in Hough parameter space and compute intersections of lines.
2. Accumulate points and treat them as “evidence” using accumulator arrays.
3. Take peaks of these intersections and determine what lines they correspond to, since points in Hough parameter space
define parameterizations of lines in image space. See the example below:
Figure 6: Example of finding parameters in Hough Space via the Hough Transform.
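As a sketch of this evidence-accumulation idea for lines (using the (ρ, θ) line parameterization that also appears in the line-fitting section later; the bin counts and ranges below are arbitrary illustrative choices):

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=200, rho_max=100.0):
    """Each edge point votes for every (rho, theta) consistent with it under
    x*sin(theta) - y*cos(theta) + rho = 0; peaks in the accumulator are lines."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for (x, y) in points:
        rho = y * np.cos(thetas) - x * np.sin(thetas)            # one rho per theta
        idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (idx >= 0) & (idx < n_rho)
        acc[idx[ok], np.arange(n_theta)[ok]] += 1
    return acc, thetas

pts = [(i, i) for i in range(20)]                 # points along the line y = x
acc, thetas = hough_lines(pts)
print(np.unravel_index(acc.argmax(), acc.shape))  # peak near rho = 0, theta = pi/4
```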
Motivating example: Localization with Long Term Evolution (LTE) Network. Some context to motivate this appli-
cation further:
• LTE uses Time Division Multiplexing to send signals, a.k.a “everyone gets a slot”.
• CDMA network does not use this.
• You can triangulate/localize your location based off of how long it takes to send signals to surrounding cellular towers.
We can see from the diagram below that we map our circles into Hough parameter space to compute the estimate of parameters.
Figure 7: Example of using Hough Transforms to find the parameters of circles for LTE.
As we have seen in other problems we have studied in this class, we need to take more than one measurement. We cannot
solve these problems with just one measurement, but a single measurement constrains the solution. Note that this problem
assumes the radius is known.
13.3.3 Hough Transforms with Searching for Center Position and Radius
Another problem of interest is finding both a circle’s radius R and its center position (x, y), which together comprise the 3 dimensions
of the Hough parameter space. In Hough Transform space, this forms a cone that expands upward from R = 0,
where each cross-section at height R is the circle (x² + y² = R²) for the given values of x, y, and R.
Every time we find a point on the circle, we update the corresponding set of points on the cone that satisfy this equation.
The above results in many cone intersections with one another - as before, we collect evidence from these intersections, build a
score surface, and compute the peak of this surface for our parameter estimates.
13.4 Sampling/Subsampling/Multiscale
Sampling is another important aspect for machine vision tasks, particularly for problems involving multiple scales, such as edge
and line detection. Sampling is equivalent to working at different scales.
What does the total work look like for some of these values?
• rn = rm = r = 1/2:
work = 1/(1 − r²) = 1/(1 − 1/4) = 4/3
But downsampling by 2 each time is quite aggressive, and can lead to aliasing. Let us also look at a less aggressive sampling
ratio.
• rn = rm = r = 1/√2:
work = 1/(1 − r²) = 1/(1 − 1/2) = 2
How do we sample in this case? This is equivalent to taking every other sample in an image when we downsample. We
can do this using a checkerboard/chess board pattern. We can even see the selected result as a square grid if we rotate
our coordinate system by 45 degrees.
The SIFT (Scale-Invariant Feature Transform) algorithm uses this less aggressive sampling technique. SIFT is a descriptor-
based feature matching algorithm for object detection using a template image.
14.1 PatMAx
Another patent we will look at for object inspection is PatMAx.
14.1.1 Overview
Some introductory notes on this:
• This framework builds off of the previous PatQuick patent.
• This framework, unlike PatQuick, does not perform quantization of the pose space, which is one key factor in enabling
sub-pixel accuracy.
• PatMAx assumes we already have an approximate initial estimate of the pose.
• PatMAx relies on an iterative process for optimizing energy, and each attraction step improves the fit of the configuration.
• Another motivation for the name of this patent is based off of electrostatic components, namely dipoles, from Maxwell. As
it turns out, however, this analogy works better with mechanical springs than with electrostatic dipoles.
• PatMAx performs an iterative attraction process to obtain an estimate of the pose.
• An iterative approach (e.g. gradient descent, Gauss-Newton, Levenberg-Marquardt) is taken because we likely will not
have a closed-form solution in the real world. Rather than solving for a closed-form solution, we will run this iterative
optimization procedure until we reach convergence.
• Relating this framework back to PatQuick, PatMAx can be run after PatQuick computes an initial pose estimate, which
we can then refine using PatMAx. In fact, we can view our patent workflow as:
Figure 8: An overview of how the patents we have looked at for object inspection fit together.
Now that we have a high-level overview, we are now ready to dive more into the specifics of the system.
3. We map the feature-detected runtime image’s features back to the field (this is more computationally-efficient than mapping
the field to the runtime image).
*For field generation, we can in turn discuss the steps needed to generate such a field:
1. Initialize
2. Seed
3. Connect
4. Chain
5. Filter
6. Segment
7. Propagate
Many of the steps outlined in this field generation process were also leveraged in the PatQuick method.
Another important aspect of training is computing field dipoles. A few notes on this:
• Field dipoles correspond to edge fragments.
• Field dipoles are created as a data structure of flags that provide information about proximity to other components, such
as the edge.
Some other notes on this framework:
• Edge detection is largely the same procedure that we have seen in the previous patents (e.g. PatQuick). However, note
that because this framework seeks to obtain highly-precise estimates accurate to the sub-pixel level, PatMAx does not use
CORDIC or quantized gradient directions.
• Field dipoles are computed during training.
• The chaining procedure used in PatMAx is similar to the process we saw before: (i) Link chains, and then (ii) Remove
short (weak) chains.
• For initialization, the array contains a vector field, but the vectors do not cover the entire array.
We will now explore some specific elements of this framework:
Figure 10: The Attraction module for the PatMAx system. Note that this produces a refined estimate of the pose at the output,
which is one of the main goals of the PatMAx system.
Intuition with Mechanical Springs: Scaling adjustments via scaled transformations can be conceptualized as a set of
mechanical springs (rather than electrostatic dipoles) that are adjusted until an optimal configuration of the degrees of freedom
is found.
14.1.6 Comparing PatMAx to PatQuick
To better understand these frameworks (and how they potentially fit together for cascaded object inspection), let us draw some
comparisons between PatQuick and PatMAx:
• PatQuick searched all pose space and does not require an initial guess - PatMAx does require an initial estimate/guess of
the pose in order to produce a refined estimate.
• For PatMAx, there is repeated emphasis on avoiding techniques such as thresholding and quantization of gradient directions
that have been used in the previous patents we have looked at. This makes sense, since PatMAx aims to output a more
refined estimate of the pose than these other frameworks (i.e. reach sub-pixel accuracy).
• Using “dipoles” for PatMAx is misguided - using physical springs as an analog is much more physically consistent.
• For PatMAx, we use evidence collected for determining the quality of alignment, which in turn determines the quality of
our refined pose estimate.
• PatMAx and PatQuick each have different methods for assigning weights.
• PatMAx is a nonlinear optimization problem, and therefore does not have a closed-form solution. PatMAx is also iterative
- alignment quality and matched edges get closer with more iterations of optimization.
Together, these three elements composed of weighted evidence from the dipoles compose our 4 DOF.
14.2 Finding Distance to Lines
One application of performing this task is to improve the performance of edge detection systems by combining shorter edge
fragments of objects into longer edge fragments.
x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ
I.e. ( x′ ; y′ ) = [ cos θ  sin θ ; −sin θ  cos θ ] ( x ; y )
x″ = x′
y″ = y′ − ρ = −x sin θ + y cos θ − ρ = −(x sin θ − y cos θ + ρ)
So the (signed) distance of a point (x, y) from the line is x sin θ − y cos θ + ρ, and points on the line satisfy x sin θ − y cos θ + ρ = 0.
This problem can be solved through our standard calculus approaches of finding the first-order conditions of our objective J(ρ, θ)
on our degrees of freedom ρ and θ. Since we have two degrees of freedom, we have two First-Order Conditions:
1. ∂J(ρ, θ)/∂ρ = 0:
∂/∂ρ (J(ρ, θ)) = ∂/∂ρ Σ_{i=1}^{N} (xi sin θ − yi cos θ + ρ)²
             = 2 Σ_{i=1}^{N} (xi sin θ − yi cos θ + ρ) = 0
             =⇒ sin θ Σ_{i=1}^{N} xi − cos θ Σ_{i=1}^{N} yi + N ρ = 0
             =⇒ N x̄ sin θ − N ȳ cos θ + N ρ = 0
             =⇒ x̄ sin θ − ȳ cos θ + ρ = 0
(Where x̄ ≜ (1/N) Σ_{i=1}^{N} xi and ȳ ≜ (1/N) Σ_{i=1}^{N} yi.)
Though this does not give us the final answer, it does provide information on how our solution is constrained, i.e. the line
must pass through the centroid given by the mean (x̄, ȳ). Let us now look at the second FOC to combine insights from
that FOC with this FOC in order to obtain our solution.
2. ∂J(ρ, θ)/∂θ = 0:
Before computing this derivative, let us move our coordinates to the centroid, i.e. subtract the mean:
xi′ = xi − x̄  →  xi = x̄ + xi′
yi′ = yi − ȳ  →  yi = ȳ + yi′
Plugging this substituted definition into our equations renders them such that the centroid (and, using the first FOC, ρ) cancels out.
Let us now compute the second FOC:
∂/∂θ (J(ρ, θ)) = ∂/∂θ Σ_{i=1}^{N} (xi′ sin θ − yi′ cos θ)²
             = 2 Σ_{i=1}^{N} (xi′ sin θ − yi′ cos θ)(xi′ cos θ + yi′ sin θ) = 0
             =⇒ Σ_{i=1}^{N} (xi′² sin θ cos θ + xi′ yi′ sin² θ − xi′ yi′ cos² θ − yi′² cos θ sin θ) = 0
             =⇒ Σ_{i=1}^{N} (xi′² − yi′²) sin θ cos θ = Σ_{i=1}^{N} xi′ yi′ (cos² θ − sin² θ)
             =⇒ (1/2) Σ_{i=1}^{N} (xi′² − yi′²) sin(2θ) = Σ_{i=1}^{N} xi′ yi′ cos(2θ)
             =⇒ sin(2θ)/cos(2θ) = tan(2θ) = (2 Σ_{i=1}^{N} xi′ yi′) / (Σ_{i=1}^{N} (xi′² − yi′²))
Therefore, solving the FOCs gives us a closed-form least squares estimate of this line parameterized by (ρ, θ). This solution,
unlike the Cartesian y = mx + b fitting of a line, is independent of the chosen coordinate system, allowing for further flexibility
and generalizability.
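A compact sketch of this closed-form fit (the arctan2 form resolves the quadrant of 2θ; strictly, the FOC admits two solutions 90° apart, and one could compare J at both to keep the minimizer):

```python
import numpy as np

def fit_line_rho_theta(x, y):
    """Least-squares line x*sin(theta) - y*cos(theta) + rho = 0 from the two
    first-order conditions: the line passes through the centroid, and
    tan(2*theta) = 2*sum(x'y') / sum(x'^2 - y'^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    theta = 0.5 * np.arctan2(2.0 * np.sum(xc * yc), np.sum(xc**2 - yc**2))
    rho = y.mean() * np.cos(theta) - x.mean() * np.sin(theta)
    return rho, theta

xs = np.linspace(0.0, 10.0, 50)
ys = 2.0 * xs + 1.0                                   # noiseless points on y = 2x + 1
rho, theta = fit_line_rho_theta(xs, ys)
print(np.allclose(xs * np.sin(theta) - ys * np.cos(theta) + rho, 0.0))   # True
```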
The goal of this system is to efficiently compute filters for multiscale. For this, we assume the form of an Nth-order piecewise
polynomial, i.e. an Nth-order spline.
14.3.1 System Overview
The block diagram of this system can be found below:
Figure 12: Block diagram of this sparse/fast convolution framework for digital filtering. Note that this can be viewed as a
compression problem, in which differencing compresses the signal, and summing decompresses the signal.
This sparse structure makes convolutions much easier and more efficient to compute by reducing the size/cardinality of the
support (we will discuss what a support is in greater detail in the next lecture, as well as how the size of a support affects
computational efficiency, but effectively the support is the subset of the domain of a function that is not mapped to zero).
• Why do we apply an order-(N+1) summing operator? We apply this because we need to "invert" the effects of the order-(N+1) difference. Intuitively, the order-(N+1) difference and the order-(N+1) sum commute - we are simply performing iterative rounds of subtraction and addition (respectively), which undo one another. I.e., representing differencing and summing as linear operators of the same order, we have (a small numerical sketch follows below):
$$\text{First Order}: \; DS = I$$
$$\text{Second Order}: \; DDSS = DSDS = (DS)(DS) = II = I$$
$$\vdots$$
$$\text{Order } K: \; D^K S^K = (DS)^K = I^K = I$$
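A small numerical check of this inversion property, assuming a simple 1D signal; the helper names D and S and the bookkeeping of boundary values (the discrete analogue of constants of integration) are my own:

```python
import numpy as np

def D(f):
    return np.diff(f)                                   # d[n] = f[n+1] - f[n]

def S(d, f0):
    return np.concatenate(([f0], f0 + np.cumsum(d)))    # undoes D, given the first sample f0

rng = np.random.default_rng(1)
f = rng.standard_normal(16)

# First order: S(D(f)) recovers f exactly.
assert np.allclose(S(D(f), f[0]), f)

# Order K = 3: apply D three times, then S three times, keeping the first
# sample of each intermediate signal as the "constant of integration".
K, stack, d = 3, [], f
for _ in range(K):
    stack.append(d[0])
    d = D(d)
for f0 in reversed(stack):
    d = S(d, f0)
assert np.allclose(d, f)
print("D^K followed by S^K recovers the signal.")
```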
14.3.2 Integration and Differentiation as Convolutions
Conceptualizing these differencing/differentiation and summing/integration as linear operators that are commutative and asso-
ciative, we can then extend this framework to conceptualizing these operators as convolutions:
• Integration: This corresponds to the convolution of our piecewise polynomial f (x) with a unit step function u(x).
• Differentiation: This corresponds to the convolution of our piecewise polynomial $f(x)$ with two scaled impulses of opposite sign, offset in opposite directions: $\frac{1}{\epsilon}\left(\delta\left(x + \frac{\epsilon}{2}\right) - \delta\left(x - \frac{\epsilon}{2}\right)\right)$ (in the limit $\epsilon \to 0$).
This motivates the discussion of some of the properties of convolution this system relies on in order to achieve high performance.
For operators A, B, and C, we have that:
1. Commutativity: A ⊗ B = B ⊗ A
2. Associativity: A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C
These properties stem from the fact that in the Fourier domain, convolution is simply multiplication. Therefore, convolution
obeys all the algebraic properties of multiplication.
Figure 13: Comparison of standard filtering and efficient/sparse filtering procedures, where the sparse filtering approach is
illustrated as a compression problem. Here, H represents the filter, X and Y represent the uncompressed inputs and outputs,
respectively, and x and y represent the compressed inputs and outputs.
14.3.5 Filtering (For Multiscale): Anti-Aliasing
We have now reduced the amount of computation needed for efficient digital filtering. We now need only one final ingredient for multiscale, motivated by Shannon and Nyquist: anti-aliasing methods (filters).
Recall from Shannon/Nyquist (the Sampling Theorem) that in order to sample (for instance, when we subsample in multi-
scale problems) without aliasing and high-frequency artifacts, it is critical that we first remove high-frequency components from
the signal we are sampling. This high-frequency component removal can be achieved with approximate low pass filtering (which
we will cover in greater detail during the next lecture).
We will see in the next lecture that one way we can achieve approximate low pass filtering is by approximating a spatial
sinc function (which transforms into an ideal low pass filter in the frequency domain) as a spline.
One open research problem: how can we extend this sparse convolution structure to 2D?
Finally, we will finish this lecture with a fun fact on calcite crystals. Calcite crystals are a type of birefringent material, which means that they have two indices of refraction corresponding to two polarizations (one in the x-direction and one in the y-direction), and therefore refract light in two different ways. As we will see in the next lecture, adding birefringent lenses in the analog domain can prevent aliasing effects that would otherwise be unavoidable. DSLRs have birefringent lenses affixed to them for this specific anti-aliasing purpose.
• If we can sample a signal at a high enough frequency, we can recover the signal exactly through reconstruction.
• How is this reconstruction performed? We will convolve samples from the signal with sinc functions, and then superimpose
these convolved results with one another.
• It is hard to sample from a signal with infinite support.
• What frequency do we need for this? Intuitively, to capture how fast the signal is varying, we certainly need to sample at least as quickly as the signal's fastest-varying component itself. But do we need to sample even faster? It turns out the answer is yes. As we will see below:
$$f_{\max} < \frac{f_{\text{sample}}}{2} \implies f_{\text{sample}} > 2 f_{\max}$$
I.e. we will need to sample at more than twice the frequency of the highest-frequency component of the signal.
Let us look at this graphically. What happens if we sample at the frequency of the signal?
[Plot: the cosine function cos(x), showing the true function and the interpolated function obtained from the samples.]
Figure 14: Sampling only once per period provides us with a constant interpolated function, from which we cannot recover the
original. Therefore, we must sample at a higher frequency.
Note that this holds at points not on the peaks as well:
Figure 15: Sampling only once per period provides us with a constant interpolated function, from which we cannot recover the
original. Therefore, we must sample at a higher frequency.
Figure 16: Sampling at twice the rate as the highest-varying component almost gets us there! This is known as the Nyquist
Rate. It turns out we need to sample at frequencies that are strictly greater than this frequency to guarantee no aliasing - we
will see why in the example below.
Is this good enough? As it turns out, the inequality for Nyquist’s Sampling Theorem is there for a reason: we need to sample
at greater than twice the frequency of the original signal in order to uniquely recover it:
Figure 17: It turns out we need to sample at frequencies that are strictly greater than this frequency to guarantee no aliasing -
we will see why in the example below.
Therefore, any rate greater than twice the highest-frequency component of the signal is sufficient to completely avoid aliasing. As a review, let us next discuss aliasing.
15.1.2 Aliasing
Aliasing occurs when higher frequencies become indistinguishable from lower frequencies, and as a result they add interference
and artifacts to the signal that are caused by sampling at too low of a frequency.
Now let us consider what happens when we add multiples of $2\pi$ to this:
$$s_k = \cos\left(2\pi \frac{f_0}{f_s} k - 2\pi k\right) = \cos\left(2\pi \left(\frac{f_0}{f_s} - 1\right) k\right) = \cos\left(2\pi \frac{f_0 - f_s}{f_s} k\right) = \cos\left(2\pi \frac{f_s - f_0}{f_s} k\right), \quad \text{since } \cos(x) = \cos(-x) \;\; \forall x \in \mathbb{R}$$
Another way to put this: if you sample at too low a frequency, i.e. below the Nyquist rate, frequencies that differ by multiples of the sampling frequency become indistinguishable from one another.
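A quick numerical illustration of this identity, with assumed values for $f_0$ and $f_s$: sampling a component at $f_0$ and one at $f_s - f_0$ produces exactly the same samples.

```python
import numpy as np

fs = 8.0            # sampling frequency (assumed value for the demo)
f0 = 1.0            # a low-frequency component
k = np.arange(16)   # sample indices

s_low  = np.cos(2 * np.pi * (f0 / fs) * k)
s_high = np.cos(2 * np.pi * ((fs - f0) / fs) * k)

print(np.allclose(s_low, s_high))   # True: the two frequencies alias onto each other
```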
It turns out this computationally-simpler solution is through integral images. An integral image is essentially the sum of
values from the first value to the ith value, i.e if gi defines the ith value in 1D, then:
$$G_i \triangleq \sum_{k=1}^{i} g_k \quad \forall \; i \in \{1, \cdots, K\}$$
Why is this useful? Well, rather than compute averages (normalized sums) by adding up all the pixels and then dividing, we
simply need to perform a single subtraction between the integral image values (followed by a division by the number of elements
we are averaging). For instance, if we wanted to calculate the average of values between i and j, then:
$$\bar{g}_{[i,j]} = \frac{1}{j-i}\sum_{k=i+1}^{j} g_k = \frac{1}{j-i}\left(G_j - G_i\right)$$
This greatly reduces the amortized amount of computation, because these sums only need to be computed once, when we
calculate the initial values for the integral image.
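A minimal sketch of the 1D integral image and the O(1) range average it enables (helper names and the example array are my own; a leading zero is prepended so the subtraction formula above needs no special cases):

```python
import numpy as np

g = np.array([3., 1., 4., 1., 5., 9., 2., 6.])
G = np.concatenate(([0.0], np.cumsum(g)))   # G[i] = g_1 + ... + g_i, with G[0] = 0

def range_average(G, i, j):
    """Average of g_{i+1}, ..., g_j using only two lookups (1-based indices)."""
    return (G[j] - G[i]) / (j - i)

print(range_average(G, 2, 6))   # average of g_3..g_6
print(g[2:6].mean())            # same value, computed directly
```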
Let us now see how block averaging looks in 2D - in the diagram below, we can obtain a block average for a group of pix-
els in the 2D range (i, j) in x and (k, l) in y using the following formula:
$$\bar{g}_{([i,j],[k,l])} = \frac{1}{(j-i)(l-k)}\sum_{x=i}^{j}\sum_{y=k}^{l} g_{x,y}$$
But can we implement this more efficiently? We can use integral images again:
$$G_{i,j} = \sum_{k=1}^{i}\sum_{l=1}^{j} g_{k,l}$$
Figure 18: Block averaging using integral images in 2D. As pointed out above, block averaging also extends beyond pixels! This
can be computed for other measures such as gradients (e.g. Histogram of Gradients).
Using the integral image values, the block average in the 2D range (i, j) in x and (k, l) in y becomes:
$$\bar{g}_{([i,j],[k,l])} = \frac{1}{(j-i)(l-k)}\left[\left(G_{j,l} + G_{i,k}\right) - \left(G_{i,l} + G_{j,k}\right)\right]$$
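A corresponding 2D sketch, assuming a zero-padded summed-area table so that the four-corner identity above needs no boundary special cases (the indexing convention here is my own choice):

```python
import numpy as np

g = np.arange(1, 26, dtype=float).reshape(5, 5)
G = np.zeros((6, 6))
G[1:, 1:] = np.cumsum(np.cumsum(g, axis=0), axis=1)   # summed-area table

def block_average(G, i, j, k, l):
    """Average of g[i:j, k:l] using the four-corner identity."""
    total = (G[j, l] + G[i, k]) - (G[i, l] + G[j, k])
    return total / ((j - i) * (l - k))

print(block_average(G, 1, 4, 2, 5))   # average of g[1:4, 2:5]
print(g[1:4, 2:5].mean())             # same value, computed directly
```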
Visually:
[Plot: the block averaging filter h(x) for δ = 2, plotted against x.]
Where jω corresponds to complex frequency. Substituting our expression into this transform:
$$H(j\omega) = \int_{-\infty}^{\infty} h(x) e^{-j\omega x}\, dx = \int_{-\frac{\delta}{2}}^{\frac{\delta}{2}} \frac{1}{\delta} e^{-j\omega x}\, dx = \frac{1}{\delta}\cdot\frac{1}{-j\omega}\left[e^{-j\omega x}\right]_{x=-\frac{\delta}{2}}^{x=\frac{\delta}{2}} = \frac{e^{-\frac{j\omega\delta}{2}} - e^{\frac{j\omega\delta}{2}}}{-j\omega\delta} = \frac{\sin\left(\frac{\delta\omega}{2}\right)}{\frac{\delta\omega}{2}} \quad \text{(Sinc function)}$$
Where in the last equality we use the identity:
$$\sin(x) = \frac{e^{jx} - e^{-jx}}{2j}$$
[Plot: the sinc function $H(j\omega) = \frac{\sin(\delta\omega/2)}{\delta\omega/2}$ as a function of $\omega$.]
Figure 20: Example H(jω) for δ = 2. This is the Fourier Transform of our block averaging “filter”.
Although sinc functions in the frequency domain help to attenuate higher frequencies, they do not make the best lowpass filters.
This is the case because:
• Higher frequencies are not completely attenuated.
• The first zero is not reached quickly enough. The first zero is given by:
$$\frac{\omega_0 \delta}{2} = \pi \implies \omega_0 = \frac{2\pi}{\delta}$$
Intuitively, the best lowpass filters perfectly preserve all frequencies up to the cutoff frequencies, and perfectly attenuate every-
thing outside of the passband. Visually:
[Plot: the sinc function $H(j\omega)$ overlaid with an ideal lowpass filter.]
Figure 21: Frequency response comparison between our block averaging filter and an ideal lowpass filter. We also note that the
“boxcar” function and the sinc function are Fourier Transform pairs!
Where else might we see this? It turns out cameras perform block average filtering because pixels have finite width over which
to detect incident photons. But is this a sufficient approximate lowpass filtering technique? Unfortunately, oftentimes it is not.
We will see below that we can improve with repeated block averaging.
[Block diagram: $f(x) \rightarrow b(x) \rightarrow y_1(x)$, i.e. $y_1(x) = f(x) \otimes b(x)$.]
What happens if we add another filter? Then, we simply add another element to our convolution:
y2 (x) = (f (x) ⊗ b(x)) ⊗ b(x) = y1 (x) ⊗ b(x)
Adding this second filter is equivalent to convolving our signal with the convolution of two “boxcar” filters, which is a triangular
filter:
[Plot: triangular filter h(x) for δ = 2.]
Figure 22: Example of a triangular filter resulting from the convolution of two “boxcar” filters.
Additionally, note that since convolution is associative, for the "two-stage" approximate lowpass filtering approach above, we do not need to convolve our input $f(x)$ with two "boxcar" filters - rather, we can convolve it directly with our triangular filter $b_2(x) = b(x) \otimes b(x)$:
y2 (x) = (f (x) ⊗ b(x)) ⊗ b(x)
= f (x) ⊗ (b(x) ⊗ b(x))
= f (x) ⊗ b2 (x)
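A quick numerical sketch of this point (names are mine, not from the notes): convolving a discrete boxcar with itself yields a triangular filter, and by associativity filtering twice with b(x) matches filtering once with b2(x).

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(64)

b = np.ones(5) / 5.0                     # discrete boxcar (block average)
b2 = np.convolve(b, b)                   # triangular filter

y_twice = np.convolve(np.convolve(f, b), b)
y_once  = np.convolve(f, b2)

print(b2)                                # triangle: 0.04, 0.08, ..., 0.20, ..., 0.04
print(np.allclose(y_twice, y_once))      # True, by associativity of convolution
```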
Let us now take a brief aside to list how discontinuities affect Fourier Transforms in the frequency domain:

• Delta Function: $\delta(x) \overset{\mathcal{F}}{\longleftrightarrow} 1$
Intuition: Convolving a function with a delta function does not affect the transform, since this convolution simply produces the function.

• Unit Step Function: $u(x) \overset{\mathcal{F}}{\longleftrightarrow} \frac{1}{j\omega}$
Intuition: Convolving a function with a step function produces a degree of averaging, reducing the high frequency components and therefore weighting them less heavily in the transform domain.

• Ramp Function: $r(x) \overset{\mathcal{F}}{\longleftrightarrow} -\frac{1}{\omega^2}$
Intuition: Convolving a function with a ramp function produces a degree of averaging, reducing the high frequency components and therefore weighting them less heavily in the transform domain.

• Derivative: $\frac{d}{dx}f(x) \overset{\mathcal{F}}{\longleftrightarrow} j\omega F(j\omega)$
Intuition: Since taking derivatives will increase the sharpness of our functions, and perhaps even create discontinuities, a derivative in the spatial domain corresponds to multiplying by $j\omega$ in the frequency domain.

As we can see from above, the more "averaging" effects we have, the more the high-frequency components of the signal will be filtered out. Conversely, when we take derivatives and create discontinuities in our spatial domain signal, this increases the high frequency components of the signal because it introduces more variation.
To understand how we can use repeated block averaging in the Fourier domain, please recall the following special properties of
Fourier Transforms:
1. Convolution in the spatial domain corresponds to multiplication in the frequency domain, i.e. for all f(x), g(x), h(x) with corresponding Fourier Transforms F(jω), G(jω), H(jω), we have:
$$h(x) = f(x) \otimes g(x) \overset{\mathcal{F}}{\longleftrightarrow} H(j\omega) = F(j\omega)G(j\omega)$$
2. Multiplication in the spatial domain corresponds to convolution in the frequency domain, i.e. for all f(x), g(x), h(x) with corresponding Fourier Transforms F(jω), G(jω), H(jω), we have:
$$h(x) = f(x)g(x) \overset{\mathcal{F}}{\longleftrightarrow} H(j\omega) = F(j\omega) \otimes G(j\omega)$$
For block averaging, we can use the first of these properties to understand what is happening in the frequency domain:
$$y_2(x) = f(x) \otimes (b(x) \otimes b(x)) \overset{\mathcal{F}}{\longleftrightarrow} Y_2(j\omega) = F(j\omega)\left(B(j\omega)\right)^2$$
[Plot: $H^2(j\omega)$ as a function of $\omega$.]
Figure 23: Example H 2 (jω) for δ = 2. This is the Fourier Transform of our block averaging “filter” convolved with itself in the
spatial domain.
This is not perfect, but it is an improvement. In fact, with this filter the frequencies drop off with magnitude $\left(\frac{1}{\omega}\right)^2$. What happens if we continue to repeat this process with more block averaging filters? It turns out that for $N$ "boxcar" filters, the magnitude will drop off as $\left(\frac{1}{\omega}\right)^N$. Note, too, that we do not want to go "too far" in this direction, because this repeated block averaging process will also begin to attenuate frequencies in the passband of the signal.
15.4.1 Warping Effects and Numerical Fourier Transforms: FFT and DFT
Two main types of numerical transforms we briefly discuss are the Discrete Fourier Transform (DFT) and Fast Fourier Transform
(FFT). The FFT is an extension of the DFT that relies on using a “divide and conquer” approach to reduce the computational
runtime from f (N ) ∈ O(N 2 ) to f (N ) ∈ O(N log N ) [3].
Mathematically, the DFT is a transform that maps a sequence of $N$ complex numbers $\{x_n\}_{n=1}^{N}$ into another sequence of $N$ complex numbers $\{X_k\}_{k=1}^{N}$ [4]. The transform for the $k$-th value of this output sequence is given in closed form as:
$$X_k = \sum_{n=1}^{N} x_n e^{-j\frac{2\pi}{N}kn}$$
And the inverse transform for the $n$-th value of the input sequence is given as:
$$x_n = \frac{1}{N}\sum_{k=1}^{N} X_k e^{j\frac{2\pi}{N}kn}$$
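For concreteness, here is a direct transcription of the DFT sum above as a sketch (using numpy's 0-based indexing rather than the 1-to-N indexing of the notes). It is O(N²) by construction, which is exactly the cost the FFT's divide-and-conquer approach avoids.

```python
import numpy as np

def dft(x):
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # W[k, n] = e^{-j 2*pi*k*n/N}
    return W @ x

rng = np.random.default_rng(3)
x = rng.standard_normal(8)
print(np.allclose(dft(x), np.fft.fft(x)))   # True: matches the library FFT
```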
One aspect of these transforms to be especially mindful of is that they introduce a wrapping effect, since transform values are
spread out over 2π intervals. This means that the waveforms produced by these transforms, in both the spatial (if we take the
inverse transform) and frequency domains may be repeated - this repeating can introduce undesirable discontinuities, such as
those seen in the graph below:
[Plot: the function $x^2$ repeated periodically, producing discontinuities at the period boundaries.]
Figure 24: Example of a repeated waveform that we encounter when looking at DFTs and FFTs.
Fun fact: it used to be thought that natural images have a power spectrum (power in the frequency domain) that falls off as $\frac{1}{\omega}$. It turns out that this was actually caused by warping effects introduced by discrete transforms.
This begs the question - how can we mitigate these warping effects? Some methods include:
• Apodizing: This corresponds to multiplying your signal by a window waveform, e.g. a Hamming window, which has a form akin to a Gaussian or an inverted cosine.
• Mirroring: Another method to mitigate these warping effects is waveform mirroring - this ensures continuity at the points where discontinuities occurred:
[Plot: the function $x^2$ mirrored at the boundary, so the repeated waveform remains continuous.]
Figure 25: Example of a mirrored waveform that we can use to counter and mitigate the discontinuity effects of warping
from transforms such as the DFT and FFT.
With this approach, the power spectrum of these signals falls off as $\frac{1}{\omega^2}$, rather than $\frac{1}{\omega}$.
• Infinitely Wide Signal: Finally, a less practical, but conceptually helpful, method is simply to take an "infinitely wide signal".
Let us now switch gears to talk more about the unit impulse and convolution.
An impulse can be conceptualized as the limit in which the variance $\sigma^2$ of a Gaussian distribution goes to 0, which corresponds to a Fourier Transform of 1 for all frequencies (which is the Fourier Transform of a delta function).
Another way to consider impulses is that they are the limit of “boxcar” functions as their width goes to zero.
Let us next generalize from a single impulse function to combinations of these functions.
Correlating (*note that this is not convolution - if we were to use convolution, this derivative would be flipped) this combination
of impulse “filter” with an arbitrary function f (x), we compute a first-order approximation of the derivative:
$$f'(x) \approx \int_{-\infty}^{\infty} f(x) h(x)\, dx = \int_{-\infty}^{\infty} f(x)\,\frac{1}{\epsilon}\left(\delta\left(x + \frac{\epsilon}{2}\right) - \delta\left(x - \frac{\epsilon}{2}\right)\right) dx$$
Therefore, combinations of impulses can be used to represent the same behavior as the “computational molecules” we identified
before. It turns out that there is a close connection between linear, shift-invariant operators and derivative operators.
One way to achieve this analog filtering is through Birefringent Lenses. Here, we essentially take two “shifted” images
by convolving the image with a symmetric combination of offset delta functions, given mathematically by:
$$h(x) = \frac{1}{2}\delta\left(x + \frac{\epsilon}{2}\right) + \frac{1}{2}\delta\left(x - \frac{\epsilon}{2}\right) \quad \text{for some } \epsilon > 0$$
Let us look at the Fourier Transform of this filter, noting the following Fourier Transform pair:
$$\delta(x - x_0) \overset{\mathcal{F}}{\longleftrightarrow} e^{-j\omega x_0}$$
With this we can then express the Fourier Transform of this filter as:
$$F(j\omega) = \int_{-\infty}^{\infty} h(x)e^{-j\omega x}\, dx = \frac{1}{2}\left(e^{-\frac{j\omega\epsilon}{2}} + e^{\frac{j\omega\epsilon}{2}}\right) = \cos\left(\frac{\omega\epsilon}{2}\right)$$
With this framework, the first zero occurs at $\omega_0 = \frac{\pi}{\epsilon}$. A few notes about these filters, and how they relate to high-frequency noise suppression:
• When these birefringent lenses are cascaded with a block averaging filter, this results in a combined filtering scheme in
which the zeros of the frequency responses of these filters cancel out most of the high-frequency noise.
• In the 2D case, we will have 2 birefringent filters, one for the x-direction and one for the y-direction. Physically, these are
rotated 90 degrees off from one another, just as they are for a 2D cartesian coordinate system.
• High-performance lowpass filtering requires a large support (see the definition below if needed) - the computational costs grow linearly with the size of the support in 1D, and quadratically with the size of the support in 2D. The support of a function is defined as the set where $f(\cdot)$ is nonzero [5]:
$$\text{supp}(f) = \{x : f(x) \neq 0, x \in \mathbb{R}\}$$
• Therefore, one way to reduce the computational costs of a filtering system is to reduce the size/cardinality of the support $|\text{supp}(f)|$ - in some sense, to encourage sparsity. Fortunately, this does not necessarily mean looking over a narrower range, but instead just considering fewer points overall.
Therefore, we can represent integral and derivative operators as Fourier Transform pairs too, denoted $S$ for integration and $D$ for differentiation:
• $S \overset{\mathcal{F}}{\longleftrightarrow} \frac{1}{j\omega}$
• $D \overset{\mathcal{F}}{\longleftrightarrow} j\omega$
Note that we can verify this by showing that convolving these filter operators corresponds to multiplying their transforms in frequency space, which results in no net effect when cascaded together:
$$(f(x) \otimes D) \otimes S = f(x) \otimes (D \otimes S) \overset{\mathcal{F}}{\longleftrightarrow} F(j\omega)\, j\omega\, \frac{1}{j\omega} = F(j\omega) \overset{\mathcal{F}}{\longleftrightarrow} f(x)$$
[Block diagram: $f(x) \rightarrow S \rightarrow \int f(\xi)\, d\xi \rightarrow D \rightarrow f(x)$, and $f(x) \rightarrow D \rightarrow \frac{d}{dx}f(x) \rightarrow S \rightarrow f(x)$.]
Can we extend this to higher-order derivatives? It turns out we can. One example is the convolution of two derivative operators, which becomes:
$$h(x) = \delta\left(x + \frac{\epsilon}{2}\right) - 2\delta(x) + \delta\left(x - \frac{\epsilon}{2}\right) = D \otimes D \overset{\mathcal{F}}{\longleftrightarrow} H(j\omega) = (j\omega)^2 = -\omega^2 \quad \text{(Recall that } j^2 = -1\text{)}$$
In general, this holds. Note that the number of integral operators $S$ must equal the number of derivative operators $D$, e.g. for order $K$:
$$\left(\bigotimes_{i=1}^{K} S\right) \otimes \left(\bigotimes_{i=1}^{K} D\right) \otimes f(x)$$
• Recall that one key element of computational efficiency we pursue is to use integral images for block averaging, which is
much more efficient than computing naive sums, especially if (1) This block averaging procedure is repeated many times
(the amortized cost of computing the integral image is lessened) and (2) This process is used in higher dimensions.
• Linear interpolation can be conceptualized as connecting points together using straight lines between points. This
corresponds to piecewise-linear segments, or, convolution with a triangle filter, which is simply the convolution of two
“boxcar filters”:
f (x) = f (1)x + f (0)(1 − x)
Unfortunately, one "not-so-great" property of convolving with triangular filters for interpolation is that the noise in the interpolated result varies depending on how far away we are from the sample points.
• Nearest Neighbor techniques can also be viewed through a convolutional lens - since this method produces piecewise-
constant interpolation, this is equivalent to convolving our sampled points with a “boxcar” filter!
The inverse transform of this can be thought of as a sinc function in polar coordinates:
$$f(\rho, \theta) = \frac{B^2}{2\pi}\frac{J_1(\rho B)}{\rho B}$$
A few notes about this inverse transform function:
• This is the point spread function of a microscope.
• J1 (·) is a 1st-order Bessel function.
• This relates to our defocusing problem that we encountered before.
• In the case of defocusing, we can use the “symmetry” property of the Fourier Transform to deduce that if we have a circular
point spread function resulting from defocusing of the lens, then we will have a Bessel function in the frequency/Fourier
domain.
• Though a point spread function is a "pillbox" in the ideal case, in practice this is not perfect due to artifacts such as lens aberrations.
15.6 References
1. Gibbs Phenomenon, https://en.wikipedia.org/wiki/Gibbs_phenomenon
2. Summed-area Table, https://en.wikipedia.org/wiki/Summed-area_table
3. Fast Fourier Transform, https://en.wikipedia.org/wiki/Fast_Fourier_transform
4. Discrete Fourier Transform, https://en.wikipedia.org/wiki/Discrete_Fourier_transform
5. Support, https://en.wikipedia.org/wiki/Support_(mathematics)
16.1 Photogrammetry Problems: An Overview
Four important problems in photogrammetry that we will cover are:
• Absolute Orientation 3D ←→ 3D
• Relative Orientation 2D ←→ 2D
• Exterior Orientation 2D ←→ 3D
• Intrinsic Orientation 3D ←→ 2D
Below we discuss each of these problems at a high level. We will be discussing these problems in greater depth later in this and
following lectures.
More generally, with localization, our goal is to find where we are and how we are oriented in space given a 2D image
and a 3D model of the world.
16.2.1 Binocular Stereopsis
To motivate binocular stereo, we will start with a fun fact: Humans have ≈ 12 depth cues. One of these is binocular stereopsis,
or binocular stereo (binocular ≈ two sensors).
Figure 26: The problem of binocular stereo. Having two 2D sensors enables us to recover 3D structure (specifically, depth) from
the scene we image. A few key terms/technicalities to note here: (i) The origin is set to be halfway between the two cameras,
(ii) The distance between the cameras is called the baseline, and (iii) binocular disparity refers to the phenomenon of having
each camera generate a different image. Finally, note that in practice, it is almost impossible to line up these cameras exactly.
Goal: Calculate $X$ and $Z$. This would not be possible with only monocular stereo (monocular ≈ one sensor). Using similar triangles, we have:
$$\frac{X - \frac{b}{2}}{Z} = \frac{x_l}{f}, \qquad \frac{X + \frac{b}{2}}{Z} = \frac{x_r}{f}$$
$$\longrightarrow \frac{x_r - x_l}{f} = \frac{b}{Z} \implies Z = \frac{bf}{x_r - x_l}$$
Where the binocular disparity is given by $(x_r - x_l)$. Solving this system of equations gives:
$$X = \frac{b}{2}\,\frac{x_r + x_l}{x_r - x_l}, \qquad Z = bf\,\frac{1}{x_r - x_l}$$
Note that, more generally, we can calculate $Y$ in the same way as well! From these equations, we can see that:
• Increasing the baseline increases X and Z (all things equal)
• Increasing the focal length increases Z.
But it is also important to be mindful of system constraints that determine the allowable range, or the exact values, of these variables. For instance, if we have a self-driving car, we cannot simply make the baseline distance between our two cameras 100 meters, because this would require mounting the cameras 100 meters apart.
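A small sketch of the similar-triangles result above, with made-up baseline and focal length, recovering $(X, Z)$ from the disparity:

```python
import numpy as np

def stereo_xz(x_l, x_r, b, f):
    disparity = x_r - x_l
    Z = b * f / disparity
    X = 0.5 * b * (x_r + x_l) / disparity
    return X, Z

b = 0.5        # baseline [m] (assumed)
f = 0.01       # focal length [m] (assumed)

# Ground-truth point, projected into each camera using the similar-triangle relations:
X_true, Z_true = 1.0, 4.0
x_l = f * (X_true - b / 2) / Z_true
x_r = f * (X_true + b / 2) / Z_true

print(stereo_xz(x_l, x_r, b, f))   # recovers (1.0, 4.0)
```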
Figure 27: General case of absolute orientation: Given the coordinate systems $(x_l, y_l, z_l) \in \mathbb{R}^{3\times 3}$ and $(x_r, y_r, z_r) \in \mathbb{R}^{3\times 3}$, our goal is to find the transformation, or pose, between them using points $\mathbf{p}_i$ measured in each frame of reference.
Here, we also note the following, as they are important for establishing that we are not limited just to finding the transfor-
mation between two camera sensors:
• “Two cameras” does not just mean having two distinct cameras (known as “two-view geometry”); this could also refer to
having a single camera with images taken at two distinct points in time (known as “Visual Odometry” or “VO”).
• Note also that we could have the same scenario described above when introducing this problem: either one camera and multiple objects, or multiple cameras and one object. Hence, there is a sense of duality in solving these problems:
1. One camera, two objects?
2. Two cameras, one object (Two-View Geometry)?
3. Camera moving (VO)?
4. Object moving?
To better understand this problem, it is first important to precisely define the transformation or pose (translation + rotation)
between the two cameras.
rr = R(rl ) + r0
Where:
• R describes the rotation. Note that this is not necessarily always an orthonormal rotation matrix, and we have more
generally parameterized it as a function.
• r0 ∈ R3 describes the translation.
• The translation vector and the rotation function comprise our unknowns.
When R is described by an orthonormal rotation matrix R, then we require this matrix to have the following properties:
2. R can be parameterized by a skew-symmetric matrix (e.g. via the matrix exponential), so we end up having 3 unknowns instead of 9. Skew-symmetric matrices have zero diagonal entries and take the form:
$$\begin{pmatrix} 0 & b & c \\ -b & 0 & e \\ -c & -e & 0 \end{pmatrix} \in \mathbb{R}^{3\times 3}$$
$$F = ke \quad (48)$$
$$E = \int F\, de = \frac{1}{2}ke^2 \quad (49)$$
Where $e$ (the "error") is the distance between the measured point in the two frames of reference. Therefore, the solution to this problem involves "minimizing the energy" of the system. Interestingly, energy minimization is analogous to least squares regression.
Using the definition of our transformation, our error for the ith point is given by:
ei = (R(rl,i ) + r0 ) − rr,i
Then our objective for energy minimization and least squares regression is given by:
$$\min_{R, \mathbf{r}_0} \sum_{i=1}^{N}\|\mathbf{e}_i\|_2^2$$
This is another instance of solving what is known as the "inverse" problem. The "forward" and "inverse" problems are given by:
• Forward Problem: $R, \mathbf{r}_0 \longrightarrow \{\mathbf{r}_{r,i}, \mathbf{r}_{l,i}\}_{i=1}^{N}$ (generate correspondences)
• Inverse Problem: $\{\mathbf{r}_{r,i}, \mathbf{r}_{l,i}\}_{i=1}^{N} \longrightarrow R, \mathbf{r}_0$ (find the transformation)
Now that we have framed our optimization problem, can we decouple the optimization over translation and rotation? It turns
out we can by setting an initial reference point. For this, we consider two methods.
2. Take a second point, look at the vector between the two points, and compute the corresponding unit vector. Take this unit vector as one axis of the coordinate system:
$$\mathbf{x}_l = \mathbf{r}_{l,2} - \mathbf{r}_{l,1} \longrightarrow \hat{\mathbf{x}}_l = \frac{\mathbf{x}_l}{\|\mathbf{x}_l\|_2}$$
3. Take a third point and compute the component of its vector (from point 1) that lies along the vector from point 1 to point 2; remove this component. Points 1, 2, and 3 form the (x, y) plane, and the remaining component from point 3 forms the y-direction of the coordinate system.
4. We can compute the y vector and normalize it:
$$\mathbf{y}_l = (\mathbf{r}_{l,3} - \mathbf{r}_{l,1}) - \left((\mathbf{r}_{l,3} - \mathbf{r}_{l,1})\cdot\hat{\mathbf{x}}_l\right)\hat{\mathbf{x}}_l \longrightarrow \hat{\mathbf{y}}_l = \frac{\mathbf{y}_l}{\|\mathbf{y}_l\|_2}$$
5. Obtain the z-axis by the cross product:
$$\hat{\mathbf{z}}_l = \hat{\mathbf{x}}_l \times \hat{\mathbf{y}}_l$$
(Then we also have that $\hat{\mathbf{z}}_l\cdot\hat{\mathbf{y}}_l = 0$ and $\hat{\mathbf{z}}_l\cdot\hat{\mathbf{x}}_l = 0$.) This then defines a coordinate system $(\hat{\mathbf{x}}_l, \hat{\mathbf{y}}_l, \hat{\mathbf{z}}_l)$ for the left camera/point of reference. Note that this only requires 3 points!
6. To calculate this for the righthand frame of reference, we can repeat steps 1-5 for the righthand side to obtain the coordinate
system (x̂r , ŷr , ẑr ).
From here, all we need to do is find the transformation (rotation, since we have artificially set the origin) between the coordinate
system (x̂r , ŷr , ẑr ) and (x̂l , ŷl , ẑl ). Mathematically, we have the following equations:
x̂r = R(x̂l )
ŷr = R(ŷl )
ẑr = R(ẑl )
We can condense these equations into a matrix equation, and subsequently a matrix inversion problem:
$$\begin{pmatrix} \hat{\mathbf{x}}_r & \hat{\mathbf{y}}_r & \hat{\mathbf{z}}_r \end{pmatrix} = R \begin{pmatrix} \hat{\mathbf{x}}_l & \hat{\mathbf{y}}_l & \hat{\mathbf{z}}_l \end{pmatrix}$$
How do we generalize this from 2D to 3D? How do we figure out these axes of inertia (see the example below)?
$$I = \iiint_O r^2\, dm$$
Figure 29: Computing the axes of inertia for a 3D blob - we can generalize the notion of inertia from 2D to 3D.
One trick we can use here is using the centroid as the origin.
r0 = (r · ω̂)ω̂
Then:
r − r0 = r − (r · ω̂)ω̂
r2 = (r − r0 ) · (r − r0 )
= r · r − 2(r · ω̂)2 + (r · ω̂)2 (ω̂ · ω̂)
= r · r − 2(r · ω̂)2 + (r · ω̂)2
= r · r − (r · ω̂)2
2. (r · r):
(r · r) = (r · r)(ω̂ · ω̂)
= (r · r)ω̂ T I3 ω̂
From this expression, we want to find the extrema. We can solve for the extrema by solving for the minimium and maximums
of this objective:
1. Minimum: minω̂ ω̂ T Aω̂
2. Maximum: maxω̂ ω̂ T Aω̂
Where $A \triangleq \iiint_O \left((\mathbf{r}\cdot\mathbf{r})I_3 - \mathbf{r}\mathbf{r}^T\right) dm$. This matrix is known as the "inertia matrix".
How can we solve this problem? We can do so by looking for the eigenvectors of the inertia matrix:
• For minimization, the eigenvector corresponding to the smallest eigenvalue of the inertia matrix corresponds to our
solution.
• For maximization, the eigenvector corresponding to the largest eigenvalue of the inertia matrix corresponds to our
solution.
• For finding the saddle point, our solution will be the eigenvector corresponding to the middle eigenvalue.
Since this is a polynomial system of degree 3, we have a closed-form solution! These three eigenvectors will form a coordinate
system for the lefthand system.
Taking a step back, let us look at what we have done so far. We have taken the cloud of points from the left frame of
reference/coordinate system and have estimated a coordinate system for it by finding an eigenbasis from solving these opti-
mization problems over the objective ω̂ T Aω̂. With this, we can then repeat the same process for the righthand system.
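A sketch of this procedure for a discrete, equally weighted point cloud (replacing the integral over dm with a sum over points, which is my simplification): build the inertia matrix A and take the eigenvectors of its smallest and largest eigenvalues as the extremal axes.

```python
import numpy as np

rng = np.random.default_rng(4)
pts = rng.standard_normal((100, 3)) * np.array([5.0, 2.0, 0.5])  # elongated blob
r = pts - pts.mean(axis=0)                                        # use the centroid as the origin

A = np.zeros((3, 3))
for ri in r:
    A += np.dot(ri, ri) * np.eye(3) - np.outer(ri, ri)            # (r.r) I - r r^T

eigvals, eigvecs = np.linalg.eigh(A)      # symmetric matrix -> real eigenvalues, ascending order
axis_min = eigvecs[:, 0]                  # axis of minimum inertia (smallest eigenvalue)
axis_max = eigvecs[:, -1]                 # axis of maximum inertia (largest eigenvalue)
print(eigvals)
print(axis_min, axis_max)
```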
2. Rotations preserve length: R(a) · R(a) = a · a
3. Rotations preserve angles: |R(a) × R(b)| = |a × b|
4. Rotations preserve triple products: [R(a) R(b) R(c)] = [a b c] (Where the triple product [a b c] = a · (b × c)).
Using these properties, we are now ready to set this up as least squares, using correspondences between points measured between
two coordinate systems:
Transform: rr = R(rl ) + r0
And we can write our optimization as one that minimizes this error term:
$$R^*, \mathbf{r}_0^* = \arg\min_{R, \mathbf{r}_0} \sum_{i=1}^{N}\|\mathbf{e}_i\|_2^2$$
Next, we can compute the centroids of the left and right systems:
$$\bar{\mathbf{r}}_l = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_{l,i}, \qquad \bar{\mathbf{r}}_r = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_{r,i}$$
We can subtract these computed centroids from the points so that we do not have to worry about translation. A new feature of this system is that the new centroid is at the origin. To prove this, let us "de-mean" (subtract the mean from) our coordinates in the left and righthand coordinate systems:
$$\mathbf{r}_{l,i}' = \mathbf{r}_{l,i} - \bar{\mathbf{r}}_l, \qquad \mathbf{r}_{r,i}' = \mathbf{r}_{r,i} - \bar{\mathbf{r}}_r$$
Because we subtract the mean, the mean of these new points now becomes zero:
$$\bar{\mathbf{r}}_l' = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_{l,i}' = \mathbf{0} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_{r,i}' = \bar{\mathbf{r}}_r'$$
Substituting this back into the objective, we can solve for an optimal rotation R:
$$R^*, \mathbf{r}_0'^* = \arg\min_{R, \mathbf{r}_0'} \sum_{i=1}^{N}\|\mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}') - \mathbf{r}_0'\|_2^2$$
$$= \arg\min_{R, \mathbf{r}_0'} \left\{\sum_{i=1}^{N}\|\mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}')\|_2^2 - 2\mathbf{r}_0'\cdot\sum_{i=1}^{N}\left(\mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}')\right) + N\|\mathbf{r}_0'\|_2^2\right\}$$
$$= \arg\min_{R, \mathbf{r}_0'} \left\{\sum_{i=1}^{N}\|\mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}')\|_2^2 + N\|\mathbf{r}_0'\|_2^2\right\}$$
(The middle term vanishes because the de-meaned points sum to zero.) Since only the last term depends on $\mathbf{r}_0'$, we can set $\mathbf{r}_0' = \mathbf{0}$ to minimize it. Moreover, we can solve for the true $\mathbf{r}_0$ by back-solving later:
$$\mathbf{r}_0 = \bar{\mathbf{r}}_r - R(\bar{\mathbf{r}}_l)$$
Intuitively, this makes sense: the translation vector between these two coordinate systems/point clouds is the difference between
the centroid of the right point cloud and the centroid of the left point cloud after it has been rotated.
Since we now have that $\mathbf{r}_0' = \mathbf{0} \in \mathbb{R}^3$, we can write our error term as:
$$\mathbf{e}_i = \mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}')$$
Which in turn allows us to write the objective as:
$$\min\sum_{i=1}^{N}\|\mathbf{e}_i\|_2^2 = \sum_{i=1}^{N}\left(\mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}')\right)\cdot\left(\mathbf{r}_{r,i}' - R(\mathbf{r}_{l,i}')\right) = \sum_{i=1}^{N}\|\mathbf{r}_{r,i}'\|_2^2 - 2\sum_{i=1}^{N}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}') + \sum_{i=1}^{N}\|\mathbf{r}_{l,i}'\|_2^2$$
(where the last term uses the fact that rotations preserve vector lengths).
Where the first of these terms is fixed, the second of these terms is to be maximized (since there is a negative sign in front of
this term, this thereby minimizes the objective), and the third of these terms is fixed. Therefore, our rotation problem can be
simplified to:
$$\min_R \sum_{i=1}^{N}\|\mathbf{e}_i\|_2^2 \iff \min_R\left(-2\sum_{i=1}^{N}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}')\right) \quad (50)$$
To solve this objective, we could take the derivative $\frac{d}{dR}(\cdot)$ of the objective, but constraints make this optimization problem difficult to solve (note that we are not optimizing over a Euclidean search space - rather, we are optimizing over a generalized transformation). We will look into a different representation of R in the next class to find solutions to this problem!
17 Lecture 18: Rotation and How to Represent it, Unit Quaternions, the
Space of Rotations
Today, we will focus on rotations. Note that unlike translations, rotations are not commutative, which makes fitting
estimates to data, amongst other machine vision tasks, more challenging. In this lecture, we will cover rotation in terms of:
• Properties
• Representations
• Hamilton’s Quaternions
• Rotation as Unit Quaternion
• Space of rotations
• Photogrammetry
• Closed-form solution of absolute quaternions
• Division algebras, quaternion analysis, space-time
We will start by looking at some motivations for why we might care about how we formulate and formalize rotations: What is
rotation used for?
• Machine vision
• Recognition/orientation
• Graphics/CAD
• Virtual Reality
• Protein Folding
• Vehicle Attitude
• Robotics
• Spatial Reasoning
• Path Planning - Collision Avoidance
17.1 Euclidean Motion and Rotation
Rotation and translation together form Euclidean motion. Some properties of Euclidean motion:
• Contains translation and rotation
• Preserves distances between points
• Finite rotations do not commute - this holds because rotations are defined by Lie groups in a non-Euclidean space.
• The degrees of freedom for rotation in dimension $n$ is given by $\frac{n(n-1)}{2}$, which coincidentally equals 3 in 3 dimensions.
• Intuition: Oftentimes, it is easier to think about rotation “in planes”, rather than “about axes”. Rotations preserve
points in certain planes.
17.2.1 Isomorphism Vectors and Skew-Symmetric Matrices
A technical note that is relevant when discussing cross products: Although a cross product produces a vector in 3D, in higher
dimensions the result of a cross product is a subspace, rather than a vector.
Specifically, this subspace that forms the result of a higher-dimensional cross product is the space that is perpendicu-
lar/orthogonal to the two vectors the cross product operator is applied between.
With this setup, we can think of cross products as producing the following isomorphism between vectors and skew-symmetric matrices. One representation:
$$\mathbf{a} \times \mathbf{b} = A\mathbf{b}, \qquad A = \begin{pmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{pmatrix}$$
An isomorphic representation:
$$\mathbf{a} \times \mathbf{b} = \bar{B}\mathbf{a}, \qquad \bar{B} = \begin{pmatrix} 0 & b_z & -b_y \\ -b_z & 0 & b_x \\ b_y & -b_x & 0 \end{pmatrix}$$
Note that while these skew-symmetric matrices have 9 elements as they are 3 × 3 matrices, they only have 3 DOF.
8. Unit Quaternions
Let us delve into each of these in a little more depth.
17.3.3 Orthonormal Matrices
We have studied these previously, but these are the matrices that have the following properties:
1. $R^T R = R R^T = I$, i.e. $R^T = R^{-1}$ (orthonormality)
2. det |R| = +1
3. R ∈ SO(3) (see notes on groups above) - being a member of this Special Orthogonal group is contingent on satisfying the
properties above.
$$R = e^{\theta\Omega}$$
Taking the Taylor expansion of this expression, we can write this matrix exponential as:
$$R = e^{\theta\Omega} = \sum_{i=0}^{\infty}\frac{1}{i!}(\theta\Omega)^i = \sum_{i=0}^{\infty}\frac{\theta^i}{i!}\left(V_\Omega \Lambda_\Omega^i V_\Omega^T\right) \quad \text{(taking an eigendecomposition } \Omega = V_\Omega \Lambda_\Omega V_\Omega^T\text{)}$$
(Optional) Let us look a little deeper into the mathematics behind this exponential cross product. We can write a rotation about
ω̂ through angle θ as:
$$\mathbf{r} = R(\theta)\mathbf{r}_0$$
$$\frac{d\mathbf{r}}{d\theta} = \frac{d}{d\theta}\left(R(\theta)\mathbf{r}_0\right)$$
$$\frac{d\mathbf{r}}{d\theta} = \hat{\omega}\times\mathbf{r} = \Omega\mathbf{r} = \Omega R(\theta)\mathbf{r}_0$$
$$\frac{d}{d\theta}R(\theta)\mathbf{r}_0 = \Omega R(\theta)\mathbf{r}_0$$
Then, since this holds for all $\mathbf{r}_0$:
$$\frac{d}{d\theta}R(\theta) = \Omega R(\theta) \implies R(\theta) = e^{\theta\Omega}$$
Figure 31: Mapping from a sphere to a complex plane, which we then apply a homogeneous transformation to and map back to
the sphere in order to induce a rotation.
The geometry of this problem can be understood through the following figure:
Figure 32: Geometric interpretation of the Rodrigues formula: Rotation about the vector ω̂ through an angle θ.
One disadvantage of this approach is that there is no way to have compositions of rotations.
Next, let us take an in-depth analysis of the Rodrigues Formula and the Exponential Cross Product:
$$\frac{dR}{d\theta} = \Omega R \implies R = e^{\Omega\theta}$$
$$e^{\Omega\theta} = I + \theta\Omega + \frac{1}{2!}(\theta\Omega)^2 + \cdots = \sum_{i=0}^{\infty}\frac{\theta^i}{i!}\Omega^i$$
Next, we have that:
$$\Omega^2 = \hat{\omega}\hat{\omega}^T - I, \qquad \Omega^3 = -\Omega$$
We can then write this matrix exponential as:
$$e^{\theta\Omega} = I + \Omega\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) + \Omega^2\left(\frac{\theta^2}{2!} - \frac{\theta^4}{4!} + \frac{\theta^6}{6!} - \cdots\right)$$
$$= I + \Omega\left(\sum_{i=0}^{\infty}(-1)^i\frac{\theta^{2i+1}}{(2i+1)!}\right) + \Omega^2\left(\sum_{i=0}^{\infty}(-1)^i\frac{\theta^{2i+2}}{(2i+2)!}\right)$$
$$= I + \Omega\sin\theta + \Omega^2(1-\cos\theta)$$
$$= I + (\sin\theta)\Omega + (\hat{\omega}\hat{\omega}^T - I)(1-\cos\theta)$$
$$= (\cos\theta)I + (\sin\theta)\Omega + (1-\cos\theta)\hat{\omega}\hat{\omega}^T$$
From this, we have:
$$\mathbf{r}' = e^{\theta\Omega}\mathbf{r}$$
$$\mathbf{r}' = (\cos\theta)\mathbf{r} + (1-\cos\theta)(\hat{\omega}\cdot\mathbf{r})\hat{\omega} + \sin\theta\,(\hat{\omega}\times\mathbf{r})$$
Where the last line is the result of the Rodrigues formula.
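A direct sketch of the Rodrigues result above (the function name and example values are mine):

```python
import numpy as np

def rodrigues_rotate(r, omega_hat, theta):
    """r' = cos(theta) r + (1 - cos(theta)) (w.r) w + sin(theta) (w x r)."""
    w = omega_hat / np.linalg.norm(omega_hat)   # ensure a unit axis
    return (np.cos(theta) * r
            + (1.0 - np.cos(theta)) * np.dot(w, r) * w
            + np.sin(theta) * np.cross(w, r))

# Example: rotate the x-axis by 90 degrees about the z-axis.
r = np.array([1.0, 0.0, 0.0])
print(rodrigues_rotate(r, np.array([0.0, 0.0, 1.0]), np.pi / 2))   # ~ [0, 1, 0]
```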
17.6 Quaternions
In this section, we will discuss another way to represent rotations: quaternions.
17.6.4 Representations for Quaternion Multiplication
With several ways to represent these quaternions, we also have several ways through which we can represent quaternion multi-
plication:
1. Real and 3 Imaginary Parts:
3. 4-Vector:
$$\begin{pmatrix} p_0 & -p_x & -p_y & -p_z \\ p_x & p_0 & -p_z & p_y \\ p_y & p_z & p_0 & -p_x \\ p_z & -p_y & p_x & p_0 \end{pmatrix}\begin{pmatrix} q_0 \\ q_x \\ q_y \\ q_z \end{pmatrix}$$
Note: Here we also have an isomorphism between the quaternion and the 4 x 4 orthogonal matrix (this matrix is or-
thonormal if we have unit quaternions). Here we can show the isomorphism and relate this back to the cross product we
saw before by considering the equivalence of the two following righthand-side expressions:
(a) $\mathring{p}\mathring{q} = P\mathring{q}$, where $P = \begin{pmatrix} p_0 & -p_x & -p_y & -p_z \\ p_x & p_0 & -p_z & p_y \\ p_y & p_z & p_0 & -p_x \\ p_z & -p_y & p_x & p_0 \end{pmatrix}$
(b) $\mathring{p}\mathring{q} = \bar{Q}\mathring{p}$, where $\bar{Q} = \begin{pmatrix} q_0 & -q_x & -q_y & -q_z \\ q_x & q_0 & q_z & -q_y \\ q_y & -q_z & q_0 & q_x \\ q_z & q_y & -q_x & q_0 \end{pmatrix}$
A few notes about these matrices $P$ and $\bar{Q}$:
• These matrices are orthonormal if the quaternions are unit quaternions.
• $P$ is normal if $\mathring{p}$ is a unit quaternion, and $\bar{Q}$ is normal if $\mathring{q}$ is a unit quaternion.
• $P$ is skew-symmetric if $\mathring{p}$ has zero scalar part, and $\bar{Q}$ is skew-symmetric if $\mathring{q}$ has zero scalar part.
• $P$ and $\bar{Q}$ have the same signs for the first row and column, and flipped signs for the off-diagonal entries in the bottom-right 3 x 3 blocks of their respective matrices.
6. Conjugate Multiplication: $\mathring{q}\mathring{q}^*$:
$$\mathring{q}\mathring{q}^* = (q_0, \mathbf{q})(q_0, -\mathbf{q}) = (q_0^2 + \mathbf{q}\cdot\mathbf{q}, \mathbf{0}) = (\mathring{q}\cdot\mathring{q})\,\mathring{e}$$
Where $\mathring{e} \triangleq (1, \mathbf{0})$, i.e. it is a quaternion with no vector component. Conversely, then, we have: $\mathring{q}^*\mathring{q} = (\mathring{q}\cdot\mathring{q})\,\mathring{e}$.
7. Multiplicative Inverse: $\mathring{q}^{-1} = \frac{\mathring{q}^*}{\mathring{q}\cdot\mathring{q}}$ (except for $\mathring{q} = (0, \mathbf{0})$, which is problematic for other representations anyway.)
1. Scalar Component: $r_0' = r_0(\mathring{q}\cdot\mathring{q})$
2. Vector Component: $\mathbf{r}' = (q_0^2 - \mathbf{q}\cdot\mathbf{q})\mathbf{r} + 2(\mathbf{q}\cdot\mathbf{r})\mathbf{q} + 2q_0(\mathbf{q}\times\mathbf{r})$
3. Operator Preserves Dot Products: $\mathring{r}'\cdot\mathring{s}' = \mathring{r}\cdot\mathring{s} \implies \mathbf{r}'\cdot\mathbf{s}' = \mathbf{r}\cdot\mathbf{s}$
4. Operator Preserves Triple Products: $(\mathbf{r}'\times\mathbf{s}')\cdot\mathbf{t}' = (\mathbf{r}\times\mathbf{s})\cdot\mathbf{t}$, i.e. $[\mathbf{r}'\;\mathbf{s}'\;\mathbf{t}'] = [\mathbf{r}\;\mathbf{s}\;\mathbf{t}]$
5. Composition (of rotations!): Recall before that we could not easily compose rotations with our other rotation representations. Because of associativity, however, we can compose rotations simply through quaternion multiplication:
$$\mathring{p}(\mathring{q}\mathring{r}\mathring{q}^*)\mathring{p}^* = (\mathring{p}\mathring{q})\mathring{r}(\mathring{q}^*\mathring{p}^*) = (\mathring{p}\mathring{q})\mathring{r}(\mathring{p}\mathring{q})^*$$
I.e. if we denote the product of quaternions $\mathring{z} \triangleq \mathring{p}\mathring{q}$, then we can write this rotation operator as a single rotation:
$$\mathring{p}(\mathring{q}\mathring{r}\mathring{q}^*)\mathring{p}^* = (\mathring{p}\mathring{q})\mathring{r}(\mathring{q}^*\mathring{p}^*) = (\mathring{p}\mathring{q})\mathring{r}(\mathring{p}\mathring{q})^* = \mathring{z}\mathring{r}\mathring{z}^*$$
This ability to compose rotations is quite advantageous relative to many of the other representations of rotations we have seen before (orthonormal rotation matrices can achieve this as well).
For a rotation about the unit axis $\hat{\omega}$ through angle $\theta$, the corresponding unit quaternion has:
• Scalar part: $q_0 = \cos\frac{\theta}{2}$
• Vector part magnitude: $\|\mathbf{q}\|_2 = \sin\frac{\theta}{2}$
$$\mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\omega}\sin\frac{\theta}{2}\right)$$
A few notes on this:
• We see that both the scalar and vector components of the quaternion $\mathring{q}$ depend on the axis of rotation $\hat{\omega}$ and the angle of rotation $\theta$.
• The vector component of this quaternion is parallel to ω̂.
• This representation is one way to represent a unit quaternion.
• Knowing the axis and angle of a rotation allows us to compute the quaternion.
• Note that $-\mathring{q}$ represents the same mapping as $\mathring{q}$ since:
$$(-\mathring{q})\mathring{r}(-\mathring{q}^*) = \mathring{q}\mathring{r}\mathring{q}^*$$
To build intuition with this quaternion rotation operator, one way we can conceptualize this is by considering that our space
of rotations is a 3D sphere in 4D, and opposite points on this sphere represent the same rotation.
17.8 Applying Quaternion Rotation Operator to Photogrammetry
Now that we have specified this operator and its properties, we are ready to apply this to photogrammetry, specifically for
absolute orientation. Let us briefly review our four main problems of photogrammetry:
1. Absolute Orientation (3D to 3D): Range Data
2. Relative Orientation (2D to 2D): Binocular Stereo
Where the first of these terms is fixed, the second of these terms is to be maximized (since there is a negative sign in front of
this term, this thereby minimizes the objective), and the third of these terms is fixed. Therefore, our rotation problem can be
simplified to:
$$R^* = \arg\min_R \sum_{i=1}^{n}\|\mathbf{e}_i\|_2^2 \quad (51)$$
$$= \arg\min_R\left(-2\sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}')\right) \quad (52)$$
$$= \arg\max_R \sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}') \quad (53)$$
But since we are optimizing over an orthonormal rotation matrix R, we cannot simply take the derivative and set it equal to zero as we usually do for these least squares optimization problems. Though we can solve this as a Lagrangian optimization problem, specifying these constraints is difficult and makes for a much more difficult optimization problem. It turns out this is a common problem in spacecraft attitude control. Let us see how we can use quaternions here!
Then we can solve for our optimal rotation by solving for quaternions instead:
$$R^* = \arg\max_R \sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}')$$
$$= \arg\max_{\mathring{q},\,\|\mathring{q}\|_2=1}\sum_{i=1}^{n}\left(\mathring{q}\,\mathring{r}_{l,i}'\,\mathring{q}^*\right)\cdot\mathring{r}_{r,i}'$$
$$= \arg\max_{\mathring{q},\,\|\mathring{q}\|_2=1}\sum_{i=1}^{n}\left(\mathring{q}\,\mathring{r}_{l,i}'\right)\cdot\left(\mathring{r}_{r,i}'\,\mathring{q}\right)$$
$$= \arg\max_{\mathring{q},\,\|\mathring{q}\|_2=1}\sum_{i=1}^{n}\left(\bar{R}_{l,i}\,\mathring{q}\right)\cdot\left(R_{r,i}\,\mathring{q}\right)$$
$$= \arg\max_{\mathring{q},\,\|\mathring{q}\|_2=1}\;\mathring{q}^T\left(\sum_{i=1}^{n}\bar{R}_{l,i}^T R_{r,i}\right)\mathring{q} \quad \text{(since $\mathring{q}$ does not depend on $i$)}$$
Where the term in the sum is a 4 × 4 matrix derived from point cloud measurements.
From here, we can solve for an optimal rotation quaternion through Lagrangian Optimization, with our objective given by:
$$\max_{\mathring{q}}\;\mathring{q}^T N \mathring{q}, \quad \text{subject to: } \mathring{q}\cdot\mathring{q} = 1, \qquad N \triangleq \sum_{i=1}^{n}\bar{R}_{l,i}^T R_{r,i}$$
Then, written with the Lagrangian constraint, this optimization problem becomes:
$$\max_{\mathring{q}}\;\mathring{q}^T N \mathring{q} + \lambda(1 - \mathring{q}\cdot\mathring{q})$$
Differentiating this expression w.r.t. $\mathring{q}$ and setting the result equal to zero yields the following first-order condition:
$$2N\mathring{q} - 2\lambda\mathring{q} = \mathbf{0}$$
It turns out that similarly to our inertia problem from the last lecture, the quaternion solution to this problem is the solution to
an eigenvalue/eigenvector problem. Specifically, our solution is the eigenvector corresponding to the largest eigenvalue of a 4 × 4
real symmetric matrix N constructed from elements of the matrix given by a dyadic product of point cloud measurements from
the left and righthand systems of coordinates:
$$M = \sum_{i=1}^{n}\mathbf{r}_{l,i}'\,\mathbf{r}_{r,i}'^{\,T}$$
This matrix M is an asymmetric 3 × 3 real matrix. A few other notes about this:
• The eigenvalues of this problem are the Lagrange multipliers of this objective, and the eigenvectors are derived from the eigenvectors of $N$, our matrix of observations.
• This analytic solution leads to/requires solving a quartic polynomial - fortunately, we have closed-form solutions of poly-
nomials up to a quartic degree! Therefore, a closed-form solution exists. Specifically, the characteristic equation in this
case takes the form:
λ4 + c3 λ3 + c2 λ2 + c1 λ + c0 = 0
Because the matrix of point cloud measurements N is symmetric, this characteristic equation simplifies and we get the
following coefficients:
1. c3 = tr(N ) = 0
2. c2 = −2tr(M T M )
3. c1 = −8 det |M |
4. c0 = det |N |
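A sketch of this closed-form rotation estimate in code. The explicit entries of the 4 x 4 symmetric matrix N in terms of the elements of M follow Horn's construction; the function names and the synthetic test below are my own.

```python
import numpy as np

def rotation_quaternion(left, right):
    """Eigenvector of the largest eigenvalue of N gives the unit quaternion (q0, qx, qy, qz)."""
    l = left - left.mean(axis=0)
    r = right - right.mean(axis=0)
    M = l.T @ r                                    # M[a, b] = sum_i l'_a * r'_b
    Sxx, Sxy, Sxz = M[0]
    Syx, Syy, Syz = M[1]
    Szx, Szy, Szz = M[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz]])
    eigvals, eigvecs = np.linalg.eigh(N)           # ascending eigenvalues
    return eigvecs[:, -1]

def quat_to_matrix(q):
    q0, qx, qy, qz = q
    return np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz),             2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz),             q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy),             2*(qz*qy + q0*qx),             q0*q0 - qx*qx - qy*qy + qz*qz]])

# Synthetic test: rotate a random point cloud by a known quaternion and recover it.
rng = np.random.default_rng(5)
left = rng.standard_normal((20, 3))
theta, w = 0.7, np.array([0.0, 1.0, 0.0])
q_true = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * w))
right = left @ quat_to_matrix(q_true).T

q_est = rotation_quaternion(left, right)
print(np.allclose(np.abs(q_est), np.abs(q_true), atol=1e-6))   # True (up to the sign ambiguity)
```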
In addition to solving absolute orientation problems with quaternions, this approach has applications to other problems as
well, such as:
125
• Relative Orientation (Binocular Stereo)
• Camera Calibration
• Manipulator Kinematics
• Manipulator Fingerprinting
• Spacecraft Dynamics
Compared with orthonormal matrices, composing quaternions is faster for this operation (orthonormal matrices require 27 multiplications and 18 additions).
• Operation: Rotating Vectors: $\mathring{q}\mathring{r}\mathring{q}^*$
This is given by:
$$\mathring{q}\mathring{r}\mathring{q}^* \rightarrow \mathbf{r}' = (q_0^2 - \mathbf{q}\cdot\mathbf{q})\mathbf{r} + 2(\mathbf{q}\cdot\mathbf{r})\mathbf{q} + 2q_0(\mathbf{q}\times\mathbf{r})$$
$$\mathbf{r}' = \mathbf{r} + 2q_0(\mathbf{q}\times\mathbf{r}) + 2\mathbf{q}\times(\mathbf{q}\times\mathbf{r}) \quad \text{(more efficient implementation, for unit quaternions)}$$
Carrying this out naively requires 15 multiplications and 12 additions. Compared with orthonormal matrices, rotating vectors with quaternions is slower for this operation (orthonormal matrices require 9 multiplications and 6 additions).
• Operation: Renormalization This operation is used when we compose many rotations, and the quaternion (if we are
using a quaternion) or the orthonormal matrix is not quite an orthonormal matrix due to floating-point arithmetic. Since
this operation requires matrix inversion (see below) for orthonormal matrices, it is much faster to carry out this operation
with quaternions.
Nearest Unit Quaternion: $\frac{\mathring{q}}{\sqrt{\mathring{q}\cdot\mathring{q}}}$
Nearest Orthonormal Matrix: $M(M^T M)^{-\frac{1}{2}}$
• Sampling: Sampling in this space can be done in regular and random intervals
• Finite rotation groups: These include the platonic solids with 12, 24, 60 elements - we can have rotation groups for (i)
tetrahedron, (ii) hexahedron/octahedron, and (iii) dodecahedron/icosahedron
• Finer-grade Sampling can be achieved by sub-dividing the simplex in the rotation space
• If $\{\mathring{q}_i\}_{i=1}^{N}$ is a group, then so is $\{\mathring{q}_i'\}_{i=1}^{N}$, where $\mathring{q}_i' = \mathring{q}_0\mathring{q}_i$.
• Relative Orientation 2D ←→ 2D
• Exterior Orientation 2D ←→ 3D
• Intrinsic Orientation 3D ←→ 2D
In the last lecture, we saw that when solving absolute orientation problems, we are mostly interested in finding transfor-
mations (translation + rotation) between two coordinate systems, where these coordinate systems can correspond to objects
or sensors moving in time (recall this is where we saw duality between objects and sensors).
Last time, we saw that one way we can find an optimal transformation between two coordinate systems in 3D is to de-
compose the optimal transformation into an optimal translation and an optimal rotation. We saw that we could solve for
optimal translation in terms of rotation, and that we can mitigate the constraint issues with solving for an orthonormal rotation
matrix by using quaternions to carry out rotation operations.
18.1.1 Rotation Operations
Relevant to our discussion of quaternions is identifying the critical operations that we will use for them (and for orthonormal
rotation matrices). Most notably, these are:
1. Composition of rotations: $\mathring{p}\mathring{q} = (p_0, \mathbf{p})(q_0, \mathbf{q}) = (p_0 q_0 - \mathbf{p}\cdot\mathbf{q},\; p_0\mathbf{q} + q_0\mathbf{p} + \mathbf{p}\times\mathbf{q})$
2. Rotating vectors: $\mathring{r}' = \mathring{q}\mathring{r}\mathring{q}^*$, whose vector part is $(q_0^2 - \mathbf{q}\cdot\mathbf{q})\mathbf{r} + 2(\mathbf{q}\cdot\mathbf{r})\mathbf{q} + 2q_0(\mathbf{q}\times\mathbf{r})$
Recall from the previous lecture that operation (1) was faster than using orthonormal rotation matrices, and operation (2)
was slower.
Combining these equations from above, we have the following axis-angle representation:
$$\mathring{q} \iff (\hat{\omega}, \theta), \qquad q_0 = \cos\frac{\theta}{2}, \quad \mathbf{q} = \hat{\omega}\sin\frac{\theta}{2} \implies \mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\omega}\sin\frac{\theta}{2}\right)$$
We also saw that we can convert these quaternions to orthonormal rotation matrices. Recall that we can write our vector rotation operation as:
$$\mathring{q}\mathring{r}\mathring{q}^* = (\bar{Q}^T Q)\mathring{r}, \quad \text{where}$$
$$\bar{Q}^T Q = \begin{pmatrix} \mathring{q}\cdot\mathring{q} & 0 & 0 & 0 \\ 0 & q_0^2 + q_x^2 - q_y^2 - q_z^2 & 2(q_x q_y - q_0 q_z) & 2(q_x q_z + q_0 q_y) \\ 0 & 2(q_y q_x + q_0 q_z) & q_0^2 - q_x^2 + q_y^2 - q_z^2 & 2(q_y q_z - q_0 q_x) \\ 0 & 2(q_z q_x - q_0 q_y) & 2(q_z q_y + q_0 q_x) & q_0^2 - q_x^2 - q_y^2 + q_z^2 \end{pmatrix}$$
The matrix Q̄T Q has skew-symmetric components and symmetric components. This is useful for conversions. Given a
quaternion, we can compute orthonormal rotations more easily. For instance, if we want an axis and angle representation, we
can look at the lower right 3 × 3 submatrix, specifically its trace:
18.2 Quaternion Transformations/Conversions
Next, let us focus on how we can convert between quaternions and orthonormal rotation matrices. Given a 3 × 3 orthonormal
rotation matrix r, we can compute sums and obtain the following system of equations:
This equation can be solved by taking square roots, but due to the number of solutions (8 by Bezout's theorem, allowing for the flipped signs of quaternions), we should not use this set of equations alone to find the solution.
Instead, we can compute these equations, evaluate them, take the largest for numerical accuracy, arbitrarily select to use
the positive version (since there is sign ambiguity with the signs of the quaternions), and solve for this. We will call this selected
righthand side qi .
For off-diagonals, which have symmetric and non-symmetric components, we derive the following equations:
Adding/subtracting off-diagonals give us 6 relations, of which we only need 3 (since we have 1 relation from the diagonals). For
instance, if we have qi = qy , then we pick off-diagonal relations involving qy , and we solve the four equations given by:
This system of four equations gives us a direct way of going from an orthonormal rotation matrix to a quaternion. Note that the matrix entries could be 9 noisy numbers, and we want to make sure we obtain a best fit.
Taking scaling into account, we can write the relationship between two point clouds corresponding to two different coordi-
nate systems as:
$$\mathbf{r}_r' = sR(\mathbf{r}_l')$$
Where rotation is again given by $R \in SO(3)$, and the scaling factor is given by $s \in \mathbb{R}_+$ (where $\mathbb{R}_+ \triangleq \{x : x \in \mathbb{R},\, x > 0\}$). Recall that $\mathbf{r}_r'$ and $\mathbf{r}_l'$ are the centroid-subtracted variants of the point clouds in both frames of reference.
As we did for translation and rotation, we can solve for an optimal scaling parameter:
$$s^* = \arg\min_s \sum_{i=1}^{n}\|\mathbf{r}_{r,i}' - sR(\mathbf{r}_{l,i}')\|_2^2$$
$$= \arg\min_s \left\{\sum_{i=1}^{n}\|\mathbf{r}_{r,i}'\|_2^2 - 2s\sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}') + s^2\sum_{i=1}^{n}\|R(\mathbf{r}_{l,i}')\|_2^2\right\}$$
$$= \arg\min_s \left\{\sum_{i=1}^{n}\|\mathbf{r}_{r,i}'\|_2^2 - 2s\sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}') + s^2\sum_{i=1}^{n}\|\mathbf{r}_{l,i}'\|_2^2\right\} \quad \text{(rotation preserves vector lengths)}$$
Defining:
1. $s_r \triangleq \sum_{i=1}^{n}\|\mathbf{r}_{r,i}'\|_2^2$
2. $D \triangleq \sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}')$
3. $s_l \triangleq \sum_{i=1}^{n}\|\mathbf{r}_{l,i}'\|_2^2$
Then we can write the objective for the optimal scaling factor $s^*$ as:
$$s^* = \arg\min_s \left\{J(s) \triangleq s_r - 2sD + s^2 s_l\right\}$$
Since this is an unconstrained optimization problem, we can solve it by taking the derivative w.r.t. $s$ and setting it equal to 0:
$$\frac{dJ(s)}{ds} = \frac{d}{ds}\left(s_r - 2sD + s^2 s_l\right) = -2D + 2s\,s_l = 0 \implies s = \frac{D}{s_l}$$
As we also saw with rotation, this does not give us an exact answer without finding the orthonormal matrix R, but now we are
able to remove scale factor and back-solve for it later using our optimal rotation.
Intuitively, this is the case because the version of OLS we used above "cheats" and tries to minimize error by shrinking the scale more than it should be shrunk. This occurs because shrinking brings the points closer together, thereby minimizing, on average, the error term. Let us look at an alternative formulation for our error term that accounts for this optimization phenomenon.
We then take the same definitions for these terms that we did above:
1. $s_r \triangleq \sum_{i=1}^{n}\|\mathbf{r}_{r,i}'\|_2^2$
2. $D \triangleq \sum_{i=1}^{n}\mathbf{r}_{r,i}'\cdot R(\mathbf{r}_{l,i}')$
3. $s_l \triangleq \sum_{i=1}^{n}\|\mathbf{r}_{l,i}'\|_2^2$
Then, as we did for the asymmetric OLS case, we can write the objective for the optimal scaling factor $s^*$ as:
$$s^* = \arg\min_s \left\{J(s) \triangleq \frac{1}{s}s_r - 2D + s\,s_l\right\}$$
Since this is an unconstrained optimization problem, we can solve it by taking the derivative w.r.t. $s$ and setting it equal to 0:
$$\frac{dJ(s)}{ds} = \frac{d}{ds}\left(\frac{1}{s}s_r - 2D + s\,s_l\right) = -\frac{1}{s^2}s_r + s_l = 0 \implies s^2 = \frac{s_r}{s_l}$$
Therefore, we can see that going in the reverse direction preserves this inverse relationship (you can verify this mathematically and intuitively by simply setting $\mathbf{r}_{r,i}' \leftrightarrow \mathbf{r}_{l,i}' \;\forall\, i \in \{1, ..., n\}$ and noting that you will get $s^2_{\text{inverse}} = \frac{s_l}{s_r} = \frac{1}{s^2}$). Since this method better preserves symmetry, it is preferred.
Intuition: Since $s$ no longer depends on correspondences (matches between points in the left and right point clouds), the scale simply becomes the ratio of the point cloud sizes in the two coordinate systems (note that $s_l$ and $s_r$ are the summed squared vector lengths of the centroid-subtracted point clouds, so they reflect the variance/spread/size of the point cloud in their respective coordinate systems).
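A small sketch of this symmetric, correspondence-free scale estimate, $s = \sqrt{s_r / s_l}$ (names and test data are mine):

```python
import numpy as np

def symmetric_scale(left, right):
    l = left - left.mean(axis=0)
    r = right - right.mean(axis=0)
    return np.sqrt(np.sum(r**2) / np.sum(l**2))   # sqrt(s_r / s_l)

rng = np.random.default_rng(6)
left = rng.standard_normal((30, 3))
right = 2.5 * left + np.array([1.0, -2.0, 0.5])   # scaled and translated copy
print(symmetric_scale(left, right))                # ~ 2.5
```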
We can deal with translation and rotation in a correspondence-free way, while also allowing for us to decouple rotation. Let us
also look at solving rotation, which is covered in the next section.
If this were an unconstrained optimization problem, we could solve it by taking the derivative of the objective w.r.t. our quaternion $\mathring{q}$ and setting it equal to zero. Note the following helpful identities from matrix and vector calculus:
1. $\frac{d}{d\mathbf{a}}(\mathbf{a}\cdot\mathbf{b}) = \mathbf{b}$
2. $\frac{d}{d\mathbf{a}}(\mathbf{a}^T M \mathbf{a}) = 2M\mathbf{a}$ (for symmetric $M$)
However, since we are working with quaternions, we must take this constraint into account. We saw in Lecture 18 that we did this using Lagrange multipliers - in this lecture it is also possible to take this specific kind of vector length constraint into account using Rayleigh Quotients.
What are Rayleigh Quotients? The intuitive idea behind them: how do I prevent my parameters from becoming too large (positive or negative) or too small (zero)? We can accomplish this by dividing our objective by our parameters, in this case our constraint. With the Rayleigh Quotient taken into account, our objective becomes:
$$\frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T\mathring{q}} \qquad \left(\text{Recall that } N \triangleq \sum_{i=1}^{n}\bar{R}_{l,i}^T R_{r,i}\right)$$
How do we solve this? Since this is now an unconstrained optimization problem, we can solve it simply using the rules of calculus:
$$J(\mathring{q}) \triangleq \frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T\mathring{q}}$$
$$\frac{dJ(\mathring{q})}{d\mathring{q}} = \frac{\frac{d}{d\mathring{q}}\left(\mathring{q}^T N \mathring{q}\right)\mathring{q}^T\mathring{q} - \mathring{q}^T N \mathring{q}\,\frac{d}{d\mathring{q}}\left(\mathring{q}^T\mathring{q}\right)}{\left(\mathring{q}^T\mathring{q}\right)^2} = \frac{2N\mathring{q}}{\mathring{q}^T\mathring{q}} - \frac{2\mathring{q}}{\left(\mathring{q}^T\mathring{q}\right)^2}\left(\mathring{q}^T N \mathring{q}\right) = \mathbf{0}$$
From here, we can write this first-order condition as:
$$N\mathring{q} = \frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T\mathring{q}}\,\mathring{q}$$
Note that $\frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T\mathring{q}} \in \mathbb{R}$ (this is our objective). Therefore, we are searching for a vector of quaternion coefficients such that applying the matrix $N$ to this vector simply produces a scalar multiple of it - i.e. an eigenvector of the matrix $N$. Letting $\lambda \triangleq \frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T\mathring{q}}$, this simply becomes $N\mathring{q} = \lambda\mathring{q}$. Since this optimization problem is a maximization problem, we can pick the eigenvector of $N$ corresponding to the largest eigenvalue (which in turn maximizes the objective, the Rayleigh quotient $\frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T\mathring{q}}$, which equals the eigenvalue).
Even though this quaternion-based optimization approach requires taking this Rayleigh Quotient into account, it is much easier
to do this optimization than to solve for orthonormal matrices, which either require a complex Lagrangian (if we solve with
Lagrange multipliers) or an SVD decomposition from Euclidean space to the SO(3) group (which also happens to be a manifold).
Let us start with two correspondences: if we have two points matched between the 3D point clouds, then rotating one point cloud about the axis joining those two points still satisfies the correspondences, i.e. we have one additional, unconstrained degree of freedom. Note that the distance between the correspondences is fixed.
Figure 33: Using two correspondences leads to only satisfying 5 of the 6 needed constraints to solve for translation and rotation
between two point clouds.
Because we have one more degree of freedom, this accounts for only 5 of the 6 needed constraints to solve for translation and
rotation, so we need to have at least 3 correspondences.
With 3 correspondences, we get 9 constraints, which leads to some redundancies. We can add more constraints by incor-
porating scaling and generalizing the allowable transformations between the two coordinate systems to be the generalized
linear transformation - this corresponds to allowing non-orthonormal rotation transformations. This approach gives us 9
unknowns!
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} a_{14} \\ a_{24} \\ a_{34} \end{pmatrix}$$
But we also have to account for translation, which gives us another 3 unknowns, for 12 in total, and therefore requires at least 4 non-redundant correspondences in order to compute the full general linear transformation. Note that this formulation has no constraints on the transformation, either!
On a practical note, this is often not needed, especially for finding the absolute orientation between two cameras, because
oftentimes the only transformations that need to be considered due to the design constraints of the system (e.g. an autonomous
car with two lidar systems, one on each side) are translation and rotation.
Recall that our matrix N composed of the data has some special properties:
1. c3 = tr(N ) = 0 (This is actually a great feature, since usually the first step in solving 4th-order polynomial systems is
eliminating the third-order term).
2. c2 = 2tr(M T M ), where M is defined as the sum of dyadic products between the points in the point clouds:
\[
M \triangleq \sum_{i=1}^{n} \mathbf{r}'_{l,i}\, \mathbf{r}'^{\,T}_{r,i} \;\in\; \mathbb{R}^{3\times 3}
\]
3. $c_1 = 8\,\det(M)$
4. $c_0 = \det(N)$
What happens if det |M | = 0, i.e. the matrix M is singular? Then using the formulas above we must have that the coefficient
c1 = 0. Then this problem reduces to:
\[
\lambda^4 + c_2 \lambda^2 + c_0 = 0
\]
This case corresponds to a special geometric case/configuration of the point clouds - specifically, when points are coplanar.
18.4.3 What Happens When Points are Coplanar?
When points are coplanar, the matrix M, composed of the sum of dyadic products between the correspondences in
the two point clouds, will be singular.
To describe this plane in space, we need only find a normal vector n̂ that is orthogonal to all points in the point cloud -
i.e. the component of each point in the point cloud in the n̂ direction is 0. Therefore, we can describe the plane by the equation $\mathbf{r}'_i \cdot \hat{n} = 0$:
Figure 34: A coplanar point cloud can be described entirely by a surface normal of the plane n̂.
Note: In the absence of measurement noise, if one point cloud is coplanar, then the other point cloud must be as well (assuming
that the transformation between the point clouds is a linear transformation). This does not necessarily hold when measurement
noise is introduced.
Recall that our matrix M , which we used above to compute the coefficients of the characteristic polynomial describing this
system, is given by:
\[
M \triangleq \sum_{i=1}^{n} \mathbf{r}'_{r,i}\, \mathbf{r}'^{\,T}_{l,i}
\]
Therefore, when a point cloud is coplanar, the null space of M is non-trivial (it is given by at least Span({n̂})), and therefore
M is singular. Recall that a matrix $M \in \mathbb{R}^{n\times d}$ is singular if $\exists\, x \in \mathbb{R}^d,\ x \neq 0$ such that $Mx = 0$, i.e. the matrix has a non-trivial
null space.
Figure 35: Two coplanar point clouds. This particular configuration allows us to estimate rotation in two simpler steps.
In this case, we can actually decompose finding the right rotation into two simpler steps!
1. Rotate one plane so it lies on top of the other plane. We can read off the axis and angle from the unit normal vectors of
these two planes describing the coplanarity of these point clouds, given respectively by n̂1 and n̂2 :
• Axis: We can find the axis by noting that the axis vector will be parallel to the cross product of n̂1 and n̂2, simply
scaled to a unit vector:
\[
\hat{\omega} = \frac{\hat{n}_1 \times \hat{n}_2}{\|\hat{n}_1 \times \hat{n}_2\|_2}
\]
• Angle: We can also solve for the angle using the two unit vectors n̂1 and n̂2, via $\cos\theta = \hat{n}_1 \cdot \hat{n}_2$ (equivalently, $\sin\theta = \|\hat{n}_1 \times \hat{n}_2\|_2$).
We now have an axis-angle representation for the rotation between these two planes and - since the plane normals are
determined by the points of the respective point clouds - a rotation between the two point clouds! We can convert this
axis-angle representation into a quaternion with the formula we have seen before:
\[
\mathring{q} = \left(\cos\frac{\theta}{2},\ \sin\frac{\theta}{2}\,\hat{\omega}\right)
\]
2. Perform an in-plane rotation. Now that we have the quaternion representing the rotation between these two planes, we can
orient the two planes on top of each other, and then solve a 2D least-squares problem for the remaining in-plane rotation.
With these steps, we have a rotation between the two point clouds!
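A minimal sketch of step 1, assuming the two plane normals n̂1 and n̂2 have already been estimated (e.g. from each centered point cloud); the helper name is illustrative:

import numpy as np

def plane_alignment_quaternion(n1: np.ndarray, n2: np.ndarray) -> np.ndarray:
    """Quaternion (w, x, y, z) rotating unit normal n1 onto unit normal n2."""
    cross = np.cross(n1, n2)
    sin_theta = np.linalg.norm(cross)
    cos_theta = float(np.dot(n1, n2))
    theta = np.arctan2(sin_theta, cos_theta)
    axis = cross / sin_theta              # assumes the normals are not parallel
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * axis))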
18.5 Robustness
In many methods in this course, we have looked at the use of Least Squares methods to solve for estimates in the presence
of noise and many data points. Least squares produces the minimum-variance unbiased linear estimate when the measurement
noise is zero-mean, uncorrelated, and of equal variance (Gauss-Markov Theorem) [1]; if the noise is also Gaussian, it is the
maximum-likelihood estimate. But what if the measurement noise is non-Gaussian? How do we deal with outliers in this case?
It turns out that Least Squares methods are not robust to outliers. One alternative approach is to use absolute error instead.
Unfortunately, however, using absolute error does not have a closed-form solution. What are our other options for dealing
with outliers? One particularly useful alternative is RANSAC.
RANSAC, or Random Sample Consensus, is an algorithm for robust estimation with least squares in the presence
of outliers in the measurements. The goal is to find a least squares estimate that agrees, within a certain threshold band, with the
inliers of the dataset, and to classify all points outside this threshold band as outliers. The
high-level steps of RANSAC are as follows:
1. Random Sample: Sample the minimum number of points needed to fix the transformation (e.g. 3 for absolute orientation;
some recommend taking more).
2. Fit random sample of points: Usually this involves running least squares on the sample selected. This fits a line (or
hyperplane, in higher dimensions), to the randomly-sampled points.
3. Check Fit: Evaluate the line fitted on the randomly-selected subsample on the rest of the data, and determine if the fit
produces an estimate that is consistent with the “inliers” of your dataset. If the fit is good enough accept it, and if it is
not, run another sample. Note that this step has different variations - rather than just immediately terminating once you
have a good fit, you can run this many times, and then take the best fit from that.
Furthermore, for step 3, we threshold the distance from the fitted line/hyperplane to determine which points of the dataset are
inliers, and which are outliers (see figure below). This corresponds to a fixed-width band centered on the fitted line/hyperplane.
Typically, the width parameter is determined by knowing some intrinsic structure about the dataset.
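To make the three steps concrete, here is a minimal, illustrative RANSAC sketch for robust 2D line fitting; the iteration count and threshold are hypothetical tuning parameters, not values from the notes:

import numpy as np

def ransac_line(points: np.ndarray, n_iters: int = 100, threshold: float = 0.1):
    """Fit y = m*x + c to points (N, 2), robust to outliers, via RANSAC.

    Sample minimal subsets, fit each by least squares, and keep the fit
    with the largest inlier consensus set.
    """
    best_inliers, best_params = None, None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # 1. Random sample: 2 points fix a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        sample = points[[i, j]]
        # 2. Fit the random sample of points (least squares on the minimal subset).
        A = np.column_stack([sample[:, 0], np.ones(2)])
        m, c = np.linalg.lstsq(A, sample[:, 1], rcond=None)[0]
        # 3. Check fit: count points that fall within the threshold band.
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = residuals < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_params = inliers, (m, c)
    return best_params, best_inliers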
Figure 36: To evaluate the goodness of fit of our sampled points, as well as to determine inliers and outliers from our dataset,
we use a fixed-width band centered around the fitted line.
Another interpretation of RANSAC: counting the “maximally-occupied” cell in Hough transform parameter space! Another
way to find the best fitting line that is robust to outliers:
1. Repeatedly sample subsets from the dataset/set of measurements, and fit these subsets of points using least squares
estimates.
2. For each fit, map the points to a discretized Hough transform parameter space, and have an accumulator array that keeps
track of how often a set of parameters falls into a discretized cell. Each time a set of parameters falls into a discretized
cell, increment it by one.
3. After N sets of random samples/least squares fits, pick the parameters corresponding to the cell that is “maximally-
occupied”, aka has been incremented the most number of times! Take this as your outlier-robust estimate.
Figure 37: Another way to perform RANSAC using Hough Transforms: map each fit from the subsamples of measurements to
a discretized Hough Transform (parameter) space, and look for the most common discretized cell in parameter space to use for
an outlier-robust least-squares estimate.
18.6 Sampling Space of Rotations
Next, we will shift gears to discuss the sampling space of rotations.
Why are we interested in this space? Many orientation problems we have studied so far do not have a closed-form
solution and may require sampling. How do we sample from the space of rotations?
One way to sample from a sphere is with latitude and longitude, given by (θi, φi), respectively. The problem with this
approach, however, is that we sample points too densely near the poles. Alternatively, we can generate random latitudes
θi and longitudes φi, where:
• $-\frac{\pi}{2} \leq \theta_i \leq \frac{\pi}{2}\ \forall\, i$
• $-\pi \leq \phi_i \leq \pi\ \forall\, i$
But this approach suffers from the same problem - it samples too strongly from the poles. Can we do better?
Idea: Map all points (both inside the sphere and outside the sphere/inside the cube) onto the sphere by connecting a line
from the origin to the sampled point, and finding the point where this line intersects the sphere.
Figure 38: Sampling from a sphere by sampling from a cube and projecting it back to the sphere.
Problem with this approach: This approach disproportionately samples in the directions of the cube’s edges and
corners. We could use sampling weights to mitigate this effect, but better yet, we can simply discard any samples that fall outside
the sphere. To avoid numerical issues when projecting, it is also best to discard points very close to the center of the sphere.
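A small rejection-sampling sketch of this idea; setting dim = 4 gives (approximately) uniformly distributed unit quaternions, anticipating the 4D generalization below:

import numpy as np

def sample_unit_vectors(n: int, dim: int = 3, eps: float = 1e-6) -> np.ndarray:
    """Uniformly sample n unit vectors (dim=3 for the sphere, dim=4 for quaternions).

    Rejection sampling: draw from the enclosing cube, discard samples outside
    the unit ball (and those too close to the origin), then project to the sphere.
    """
    samples = []
    rng = np.random.default_rng(0)
    while len(samples) < n:
        p = rng.uniform(-1.0, 1.0, size=dim)    # sample from the cube
        r = np.linalg.norm(p)
        if eps < r <= 1.0:                      # reject outside ball / near center
            samples.append(p / r)               # project onto the sphere
    return np.array(samples)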
Generalization to 4D: As we mentioned above, our goal is to generalize this from 3D to 4D. Cubes and spheres simply
become 4-dimensional - enabling us to sample quaternions.
More generally, we can project onto the sphere from any of the regular polyhedra:
• Tetrahedra (4 faces)
• Hexahedra (6 faces)
• Octahedra (8 faces)
• Dodecahedra (12 faces)
• Icosahedra (20 faces)
These polyhedra are also known as the regular solids.
As we did for the cube, we can do the same for polyhedra: to sample from the sphere, we can sample from the polyhedra,
and then project onto the point on the sphere that intersects the line from the origin to the sampled point on the polyhedra.
From this, we get great circles from the edges of these polyhedra on the sphere when we project.
Fun fact: Soccer balls have 32 faces! More related to geometry: the classic soccer ball is one of the semi-regular (Archimedean) solids,
specifically the truncated icosahedron.
A simple set of starting samples is given by the following rotations and their unit quaternions:
1. Identity rotation: $\mathring{q} = (1, \mathbf{0})$
2. $\pi$ about $\hat{x}$: $\mathring{q} = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2}\hat{x}) = (0, \hat{x})$
3. $\pi$ about $\hat{y}$: $\mathring{q} = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2}\hat{y}) = (0, \hat{y})$
4. $\pi$ about $\hat{z}$: $\mathring{q} = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2}\hat{z}) = (0, \hat{z})$
5. $\frac{\pi}{2}$ about $\hat{x}$: $\mathring{q} = (\cos\frac{\pi}{4}, \sin\frac{\pi}{4}\hat{x}) = \frac{1}{\sqrt{2}}(1, \hat{x})$
6. $\frac{\pi}{2}$ about $\hat{y}$: $\mathring{q} = (\cos\frac{\pi}{4}, \sin\frac{\pi}{4}\hat{y}) = \frac{1}{\sqrt{2}}(1, \hat{y})$
7. $\frac{\pi}{2}$ about $\hat{z}$: $\mathring{q} = (\cos\frac{\pi}{4}, \sin\frac{\pi}{4}\hat{z}) = \frac{1}{\sqrt{2}}(1, \hat{z})$
8. $-\frac{\pi}{2}$ about $\hat{x}$: $\mathring{q} = (\cos(-\frac{\pi}{4}), \sin(-\frac{\pi}{4})\hat{x}) = \frac{1}{\sqrt{2}}(1, -\hat{x})$
9. $-\frac{\pi}{2}$ about $\hat{y}$: $\mathring{q} = (\cos(-\frac{\pi}{4}), \sin(-\frac{\pi}{4})\hat{y}) = \frac{1}{\sqrt{2}}(1, -\hat{y})$
10. $-\frac{\pi}{2}$ about $\hat{z}$: $\mathring{q} = (\cos(-\frac{\pi}{4}), \sin(-\frac{\pi}{4})\hat{z}) = \frac{1}{\sqrt{2}}(1, -\hat{z})$
These 10 rotations by themselves give us 10 ways to sample the rotation space. How can we construct more samples? We can
do so by taking quaternion products, specifically, products of these 10 quaternions above. Let us look at just a couple of
these products:
1. $(0, \hat{x})(0, \hat{y}) = (-\hat{x}\cdot\hat{y},\ \hat{x}\times\hat{y}) = (0, \hat{z})$:
We see that this simply produces a rotation about the third axis, as we would expect - one that is already in our set, so it
does not give us a new rotation to sample from. Next, let us look at one that does.
2. $\frac{1}{\sqrt{2}}(1, \hat{x})\,\frac{1}{\sqrt{2}}(1, \hat{y})$:
\[
\frac{1}{\sqrt{2}}(1, \hat{x})\,\frac{1}{\sqrt{2}}(1, \hat{y}) = \frac{1}{2}\left(1 - \hat{x}\cdot\hat{y},\ \hat{y} + \hat{x} + \hat{x}\times\hat{y}\right) = \frac{1}{2}\left(1,\ \hat{x} + \hat{y} + \hat{x}\times\hat{y}\right)
\]
This yields the following axis-angle representation:
• Axis: $\frac{1}{\sqrt{3}}(1\ \ 1\ \ 1)^T$
• Angle: $\cos\frac{\theta}{2} = \frac{1}{2} \implies \frac{\theta}{2} = \frac{\pi}{3} \implies \theta = \frac{2\pi}{3}$
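As a quick numerical check of this product (a sketch assuming a (w, x, y, z) component ordering for quaternions):

import numpy as np

def quat_multiply(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    w = pw * qw - np.dot(pv, qv)
    v = pw * qv + qw * pv + np.cross(pv, qv)
    return np.concatenate(([w], v))

# Product of the two quarter-turn quaternions from the example above:
qx = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)   # pi/2 about x
qy = np.array([1.0, 0.0, 1.0, 0.0]) / np.sqrt(2)   # pi/2 about y
q = quat_multiply(qx, qy)                           # -> 0.5 * (1, 1, 1, 1)
theta = 2 * np.arccos(q[0])                         # -> 2*pi/3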
18.7 References
1. Gauss-Markov Theorem, https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem
What are tessellations? “A filling or tessellations of a flat surface is the covering of a plane using one or more geomet-
ric shapes (polygons).” [1]. Tessellations of the surface of the sphere can be based on platonic solids, with 4, 6, 8, 12, and
20 faces. Each of the tessellations from the platonic solids results in equal area projections on the sphere, but the division is
somewhat coarse.
For greater granularity, we can look at the 13 Archimedean solids. These allow for multiple polygon types in each
polyhedron (e.g. squares and triangles), resulting in unequal areas in the tessellations on the unit sphere.
Related, we are also interested in the rotation groups (recall groups are mathematical sets that obey certain algebras,
e.g. the Special Orthogonal group) of these regular polyhedra:
• 12 elements in rotation group for tetrahedron.
• 24 elements in rotation group for hexahedron.
• 60 elements in rotation group for dodecahedron.
• The octahedron is the dual of the cube and therefore occupies the same rotation group as it.
• The icosahedron is the dual of the dodecahedron and therefore occupies the same rotation group as it.
A few other notes on tessellations:
• One frequently-used method for creating tessellations is to divide each face into many triangular or hexagonal areas.
• Histograms can be created in tessellations in planes by taking square sub-divisions of the region.
• Hexagonal tessellations are a dual of triangular tessellations.
Why are Critical Surfaces important? Critical surfaces can negatively impact the performance of relative orientation
systems, and understanding their geometry can enable us to avoid using strategies that rely on these types of surfaces in order
to find the 2D transformation, for instance, between two cameras.
We will discuss more about these at the end of today’s lecture. For now, let’s introduce quadrics - geometric shapes/sur-
faces defined by second-order equations in a 3D Cartesian coordinate system:
1. Ellipsoid: $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$
Figure 39: 3D geometric depiction of an ellipsoid, one member of the quadric family.
2. Sphere: $x^2 + y^2 + z^2 = R^2$ (the special case of the ellipsoid with $a = b = c = R$)
Figure 40: 3D geometric depiction of a sphere, another member of the quadric family and a special case of the ellipsoid.
3. Hyperboloid of One Sheet: $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$
This quadric surface is ruled: we can embed straight lines in the surface, despite its quadric function structure.
Figure 41: 3D geometric depiction of a hyperboloid of one sheet, another member of the quadric family.
4. Hyperboloid of Two Sheets: $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = -1$
Figure 42: 3D geometric depiction of a hyperboloid of two sheets, another member of the quadric family.
5. Cone: $\frac{z^2}{c^2} = \frac{x^2}{a^2} + \frac{y^2}{b^2}$
Figure 43: 3D geometric depiction of a cone, another member of the quadric family and a special case of the hyperboloid of one
sheet.
6. Elliptic Paraboloid: $\frac{z}{c} = \frac{x^2}{a^2} + \frac{y^2}{b^2}$
Note that this quadric surface has a linear, rather than quadratic dependence, on z.
Figure 44: 3D geometric depiction of an elliptic paraboloid, another member of the quadric family and a special case of the
hyperboloid of one sheet.
7. Hyperbolic Paraboloid: $\frac{z}{c} = \frac{x^2}{a^2} - \frac{y^2}{b^2}$
Note that this quadric surface also has a linear, rather than quadratic dependence, on z.
Figure 45: 3D geometric depiction of a hyperbolic paraboloid, another member of the quadric family and a special case of the
hyperboloid of one sheet.
Another special case is derived through planes. You may be wondering - how can we have a quadratic structure from planes? We
can derive a surface with quadratic terms by considering the intersection of two planes. This intersection of planes is computed
analytically as a product of two linear equations, resulting in a quadratic equation.
Figure 46: Binocular stereo system set up. For this problem, recall that one of our objectives is to measure the translation, or
baseline, between the two cameras.
When we search for correspondences, we need only search along a line, which gives us a measure of binocular disparity
that we can then use to measure distance between the cameras.
The lines defined by these correspondences are called epipolar lines. Planes that pass through epipolar lines are called
epipolar planes, as given below:
Figure 47: Epipolar planes are planes that pass through epipolar lines.
Next, we intersect the image plane with our set of epipolar planes, and look at the resulting intersections
(which are lines). The figure below illustrates this process for the left and right cameras in our binocular stereo
system.
Figure 48: After finding the epipolar planes, we intersect these planes with the image plane, which gives us a set of lines in both
the left and righthand cameras/coordinate systems of our binocular stereo system.
• With 2 correspondences, we still have the ability to rotate one of the cameras without changing the correspondences, which
implies that only 4 out of the 5 needed constraints are provided. This suggests we need more correspondences.
It turns out we need 5 correspondences to solve the binocular stereo/relative orientation problem - each correspondence
gives us 1 constraint. Why only one, when each correspondence has a disparity with two components (one per image
dimension)? One component is absorbed by the unknown depth of that point, leaving a single scalar constraint. Note that
disparity measures the pixel discrepancy between corresponding points in the two images.
The vectors $\mathbf{r}'_{l,i}$ (the left-system measurements after the rotation transformation has been applied), $\mathbf{r}_{r,i}$, and $\mathbf{b}$ are all coplanar
in a perfect system. Therefore, the parallelepiped constructed from the triple product of these 3 vectors should have zero
volume, because these vectors are coplanar (note that this is the ideal, noise-free case). This is known as the coplanarity condition:
\[
V = [\,\mathbf{r}'_{l,i}\ \ \mathbf{r}_{r,i}\ \ \mathbf{b}\,] = 0
\]
A potential solution to find the optimal baseline (and rotation): we can use least squares to minimize the volume of the
parallelepiped corresponding to the triple product of these three (hopefully nearly coplanar) vectors. This is a feasible approach, but
it can also have a high noise gain/variance.
This leads us to an important question: What, specifically, are we trying to minimize? Recall that we are matching
correspondences in the image, not in the 3D world. Therefore, the volume of the parallelepiped is proportional to the
error, but does not match it exactly. When we have measurement noise, the rays from our vectors in the left and righthand
systems no longer intersect, as depicted in the diagram below:
Figure 50: With measurement noise, our rays do not line up exactly. We will use this idea to formulate our optimization problem
to solve for optimal baseline and rotation.
We can write that the error is along the cross product of the rotated left ray and the right ray (i.e. perpendicular to both rays):
\[
\mathbf{r}'_l \times \mathbf{r}_r
\]
Therefore, we can write the equation for the “loop” (going from the rotated left coordinate system to the right coordinate
system) as:
\[
\alpha\,\mathbf{r}'_l + \gamma\,(\mathbf{r}'_l \times \mathbf{r}_r) = \mathbf{b} + \beta\,\mathbf{r}_r
\]
where the error term we seek to minimize, $\mathbf{r}'_l \times \mathbf{r}_r$, is multiplied by the parameter $\gamma$.
To solve for our parameters α, β, and γ, we can transform this vector equation into 3 scalar equations by taking dot
products. We want to take dot products such that many terms drop to zero, i.e. where orthogonality applies. Let’s look at these 3
dot products:
1. $\cdot\,(\mathbf{r}'_l \times \mathbf{r}_r)$: This yields the following:
Lefthand side: $(\alpha\,\mathbf{r}'_l + \gamma(\mathbf{r}'_l \times \mathbf{r}_r)) \cdot (\mathbf{r}'_l \times \mathbf{r}_r) = \gamma\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2^2$
Righthand side: $(\mathbf{b} + \beta\,\mathbf{r}_r) \cdot (\mathbf{r}'_l \times \mathbf{r}_r) = \mathbf{b}\cdot(\mathbf{r}'_l \times \mathbf{r}_r) + 0 = [\,\mathbf{b}\ \mathbf{r}'_l\ \mathbf{r}_r\,]$
Combining: $\gamma\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2^2 = [\,\mathbf{b}\ \mathbf{r}'_l\ \mathbf{r}_r\,]$
Intuitively, this says that the error we see is proportional to the triple product (the volume of the parallelepiped). Taking
this dot product with our equation allows us to isolate γ.
2. $\cdot\,(\mathbf{r}'_l \times (\mathbf{r}'_l \times \mathbf{r}_r))$: This yields the following (here the lefthand side vanishes):
Lefthand side: $(\alpha\,\mathbf{r}'_l + \gamma(\mathbf{r}'_l \times \mathbf{r}_r)) \cdot (\mathbf{r}'_l \times (\mathbf{r}'_l \times \mathbf{r}_r)) = 0$
Righthand side: $(\mathbf{b} + \beta\,\mathbf{r}_r) \cdot (\mathbf{r}'_l \times (\mathbf{r}'_l \times \mathbf{r}_r)) = (\mathbf{b} \times \mathbf{r}'_l)\cdot(\mathbf{r}'_l \times \mathbf{r}_r) - \beta\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2^2$
Combining: $\beta\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2^2 = (\mathbf{b} \times \mathbf{r}'_l)\cdot(\mathbf{r}'_l \times \mathbf{r}_r)$
Taking this dot product with our equation allows us to find β. We can repeat an analogous process to solve for α.
3. $\cdot\,(\mathbf{r}_r \times (\mathbf{r}'_l \times \mathbf{r}_r))$: This yields the following:
Lefthand side: $(\alpha\,\mathbf{r}'_l + \gamma(\mathbf{r}'_l \times \mathbf{r}_r)) \cdot (\mathbf{r}_r \times (\mathbf{r}'_l \times \mathbf{r}_r)) = \alpha\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2^2$
Righthand side: $(\mathbf{b} + \beta\,\mathbf{r}_r) \cdot (\mathbf{r}_r \times (\mathbf{r}'_l \times \mathbf{r}_r)) = (\mathbf{b} \times \mathbf{r}_r)\cdot(\mathbf{r}'_l \times \mathbf{r}_r)$
Combining: $\alpha\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2^2 = (\mathbf{b} \times \mathbf{r}_r)\cdot(\mathbf{r}'_l \times \mathbf{r}_r)$
Taking this dot product with our equation allows us to find α.
With this, we have now isolated all three of our desired parameters. We can then take these 3 equations to solve for our 3
unknowns α, β, and γ. We want |γ| to be as small as possible. We also require α and β to be non-negative, since these indicate
the scalar multiple of the direction along the rays in which we (almost) get an intersection between the left and right coordinate
systems. Typically, a negative α and/or β results in intersections behind the camera, which is often not physically feasible.
It turns out that one of the ways to discard some of the 20 solutions produced by this problem is to throw out solutions that
result in negative α and/or β.
Next, we can consider the distance that this offset vector represents. This distance is given by:
\[
d = \gamma\,\|\mathbf{r}'_l \times \mathbf{r}_r\|_2 = \frac{[\,\mathbf{b}\ \mathbf{r}'_l\ \mathbf{r}_r\,]}{\|\mathbf{r}'_l \times \mathbf{r}_r\|_2}
\]
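A minimal numerical sketch of these three relations and of the offset distance d (variable names are illustrative, and the rays are assumed to be non-parallel):

import numpy as np

def ray_intersection_params(r_l: np.ndarray, r_r: np.ndarray, b: np.ndarray):
    """Solve alpha*r_l + gamma*(r_l x r_r) = b + beta*r_r for (alpha, beta, gamma, d).

    r_l: rotated left ray, r_r: right ray, b: baseline (all 3-vectors).
    d is the closest-approach distance between the two rays.
    """
    c = np.cross(r_l, r_r)
    c2 = float(np.dot(c, c))                      # ||r_l x r_r||^2
    gamma = float(np.dot(b, c)) / c2              # triple product [b r_l r_r] / ||.||^2
    beta = float(np.dot(np.cross(b, r_l), c)) / c2
    alpha = float(np.dot(np.cross(b, r_r), c)) / c2
    d = abs(gamma) * np.sqrt(c2)                  # offset distance between the rays
    return alpha, beta, gamma, d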
Closed-Form Solution: Because this is a system involving 5 second-order equations, and the best we can do is reduce this
to a single 5th-order equation, which we do not (yet) have the closed-form solutions to, we cannot solve for this problem in
closed-form. However, we can still solve for it numerically. We can also look at solving this through a weighted least-squares
approach below.
Therefore, incorporating this weighting factor, our least squares optimization problem becomes:
\[
\min_{\mathbf{b},\, R(\cdot)} \sum_{i=1}^{n} w_i\, [\,\mathbf{b}\ \mathbf{r}'_{l,i}\ \mathbf{r}_{r,i}\,]^2, \qquad \text{subject to } \mathbf{b}\cdot\mathbf{b} = \|\mathbf{b}\|_2^2 = 1
\]
How do we solve this? Because wi will change as our candidate solution changes, we will solve this problem iteratively and in
an alternating fashion - we alternate between updating our conversion weights wi and re-solving the objective given the
most recent update of the weights. Therefore, we can write this optimization problem as:
\[
\mathbf{b}^*, R^* = \arg\min_{\mathbf{b},\,\|\mathbf{b}\|_2^2 = 1,\, R(\cdot)} \sum_{i=1}^{n} w_i\, [\,\mathbf{b}\ \mathbf{r}'_{l,i}\ \mathbf{r}_{r,i}\,]^2
= \arg\min_{\mathbf{b},\,\|\mathbf{b}\|_2^2 = 1,\, R(\cdot)} \sum_{i=1}^{n} w_i \left( (\mathbf{r}_{r,i} \times \mathbf{b})\cdot \mathbf{r}'_{l,i} \right)^2
\]
Intuition for these weights: Loosely, we can think of the weights as being the conversion factor from 3D to 2D error, and
therefore, they can roughly be thought of as a ratio of the focal length f and the 3D depth Z.
Now that we have expressed this problem in least-squares form, we are ready to build off of the last two lectures and
apply our unit quaternions to solve it. Because we express the points from the left coordinate system in a
rotated frame of reference (i.e. with the rotation already applied), we can incorporate the quaternion into this definition
to show how we can solve for our optimal set of parameters given measurements from both systems:
\[
\mathring{r}'_l = \mathring{q}\,\mathring{r}_l\,\mathring{q}^*
\]
Then we can write the triple-product term as t:
\begin{align*}
t &= (\mathbf{r}_r \times \mathbf{b}) \cdot \mathbf{r}'_l \\
&= (\mathring{r}_r\,\mathring{b}) \cdot (\mathring{q}\,\mathring{r}_l\,\mathring{q}^*) \\
&= (\mathring{r}_r\,(\mathring{b}\mathring{q})) \cdot (\mathring{q}\,\mathring{r}_l) \\
&= (\mathring{r}_r\,\mathring{d}) \cdot (\mathring{q}\,\mathring{r}_l), \quad \text{where } \mathring{d} \triangleq \mathring{b}\mathring{q}, \text{ which we can think of as a product of baseline and rotation.}
\end{align*}
Recall that our goal here is to find the baseline $\mathring{b}$ - this can be found by solving for our quantity $\mathring{d}$ and multiplying it on the
righthand side by $\mathring{q}^*$:
\[
\mathring{d}\,\mathring{q}^* = \mathring{b}\,\mathring{q}\,\mathring{q}^* = \mathring{b}\,\mathring{e} = \mathring{b}
\]
(Recall that $\mathring{e} = (1, \mathbf{0}) = \mathring{q}\,\mathring{q}^*$.)
At first glance, it appears we have 8 unknowns, with 5 constraints. But we can add additional constraints to the system to make
the number of constraints equal the number of DOF:
1. Unit quaternion: $\|\mathring{q}\|^2 = \mathring{q}\cdot\mathring{q} = 1$
2. Unit baseline: $\|\mathring{b}\|^2 = \mathring{b}\cdot\mathring{b} = 1$
3. Orthogonality of $\mathring{q}$ and $\mathring{d}$: $\mathring{q}\cdot\mathring{d} = 0$
Therefore, with these constraints, we are able to reach a system with 8 constraints. Note that we have more constraints to
enforce than with the absolute orientation problem, making the relative orientation problem more difficult to solve.
19.3.5 Symmetries of Relative Orientation Approaches
We can interchange the left and right coordinate system rays. We can do this for this problem because we have line intersections
rather than line rays. These symmetries can be useful for our numerical approaches. The equation below further demonstrates
this symmetry by showing we can interchange the order of how $\mathring{d}$ and $\mathring{q}$ interact with our measurements to produce the same
result:
\[
t = (\mathring{r}_r\,\mathring{d}) \cdot (\mathring{q}\,\mathring{r}_l) = (\mathring{r}_r\,\mathring{q}) \cdot (\mathring{d}\,\mathring{r}_l)
\]
While this is one approach, a better approach is to use nonlinear optimization, such as Levenberg-Marquardt (often
called LM in nonlinear optimization packages). An optional brief intro to Levenberg-Marquardt is presented at the end of this
lecture summary.
Levenberg-Marquardt (LM) optimization requires a non-redundant parameterization, which can be achieved with either Gibbs
vectors or quaternions (for the latter, we can circumvent the redundancy of Hamilton’s quaternions by treating the
redundant degree of freedom as an extra constraint). Recall that Gibbs vectors, which are given as $\tan\frac{\theta}{2}\,\hat{\omega}$, have
a singularity at θ = π.
• Gefährliche Flächen, also known as “dangerous or critical surfaces”: There exist surfaces that make the relative
orientation problem difficult to solve due to the additional ambiguity that these surfaces exhibit. One example: a plane
flying over a valley with landmarks:
Figure 51: A plane flying over a valley is an instance of a critical surface. This is because we can only observe angles, and not
points. Therefore, locations of two landmarks are indistinguishable.
To account for this, pilots typically plan flight paths over ridges rather than valleys for this exact reason:
Figure 52: When planes fly over a ridge rather than a valley, the angles between two landmarks do change, allowing for less
ambiguity for solving relative orientation problems.
How do we generalize this case to 3D? We will see that this type of surface in 3D is a hyperboloid of one sheet. We
need to ensure sections of surfaces you are looking at do not closely resemble sections of hyperboloids of one sheet.
There are also other types of critical surfaces that are far more common that we need to be mindful of when con-
sidering relative orientation problems - for instance, the intersection of two planes. In 2D, this intersection of two planes
can be formed by the product of their two equations:
(ax + by + c)(dx + ey + f ) = adx2 + aexy + af x + bdxy + bey 2 + bf y + cdx + cey + cf = 0
We can see that this equation is indeed second order with respect to its spatial coordinates, and therefore belongs to the
family of quadric surfaces and critical surfaces.
Figure 53: The intersection of two planes is another type of critical surface we need to be mindful of. It takes a second-order
form because we multiply the two equations of the planes together to obtain the intersection.
Optional intro to Gauss-Newton (GN) and Levenberg-Marquardt (LM): in LM, we add a λI (regularization)
term to ensure that a solution exists by making the matrix we invert in the normal equations positive definite.
The normal equations, which give the closed-form solutions for GN and LM, are:
1. GN: $(J(\theta)^T J(\theta))\,\theta = J(\theta)^T e(\theta) \implies \theta = (J(\theta)^T J(\theta))^{-1} J(\theta)^T e(\theta)$
2. LM: $(J(\theta)^T J(\theta) + \lambda I)\,\theta = J(\theta)^T e(\theta) \implies \theta = (J(\theta)^T J(\theta) + \lambda I)^{-1} J(\theta)^T e(\theta)$
Where:
• θ is the vector of parameters and our solution point to this nonlinear optimization problem.
• J(θ) is the Jacobian of the nonlinear objective we seek to optimize.
• e(θ) is the residual function of the objective evaluated with the current set of parameters.
Note the λI, or regularization term, in Levenberg-Marquardt. If you’re familiar with ridge regression, LM is effectively ridge
regression/regression with L2 regularization for nonlinear optimization problems. Often, these approaches are applied iteratively,
using damped update steps of the form:
1. GN: θ(t+1) ← θ(t) − α(J(θ(t) )T J(θ(t) ))−1 J(θ(t) )T e(θ(t) )
2. LM: θ(t+1) ← θ(t) − α(J(θ(t) )T J(θ(t) ) + λI)−1 J(θ(t) )T e(θ(t) )
Where α is the step size, which dictates how quickly the estimates of our approaches update.
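A minimal sketch of one such damped update step (the residual and Jacobian functions are placeholders to be supplied by the specific problem):

import numpy as np

def lm_step(theta, residual_fn, jacobian_fn, lam=1e-3, alpha=1.0):
    """One LM-style update: theta <- theta - alpha*(J^T J + lam*I)^-1 J^T e."""
    e = residual_fn(theta)                    # residual vector e(theta)
    J = jacobian_fn(theta)                    # Jacobian of the residuals, shape (m, n)
    H = J.T @ J + lam * np.eye(len(theta))    # damped normal-equations matrix
    step = np.linalg.solve(H, J.T @ e)        # solve rather than explicitly invert
    return theta - alpha * step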
19.4 References
1. Tessellation, https://en.wikipedia.org/wiki/Tessellation
This form is simple: (i) centered at the origin, (ii) axes line up. In general, the form of these quadric surfaces may be more
complicated, with both linear and constant terms. The signs of the quadratic terms determine the shape, as we saw in the
previous lecture:
1. + + + (ellipsoid)
2. + + − (hyperboloid of one sheet) - this is a class of critical surfaces
3. + − − (hyperboloid of two sheets)
4. − − − (imaginary ellipsoid)
For the solutions to the second-order critical surface equation in R (the class of hyperboloids of one sheet), particular
solutions tell us why these critical surfaces create solution ambiguity:
1. R = 0: I.e. the solution is on the surface - the origin of the righthand system is on the surface. Two interpretations from
this; the first is for binocular stereo, and the second is for Structure from Motion (SfM):
(a) Binocular Stereo: The righthand system lies on the surface.
(b) Structure from Motion: In Structure from Motion, we “move onto the surface” in the next time step.
2. R = −b: This solution is the origin of the lefthand system (we move the solution left along the baseline from the righthand
system to the lefthand system). Two interpretations from this; the first is for binocular stereo, and the second is for
Structure from Motion (SfM):
(a) Binocular Stereo: “The surface has to go through both eyes”.
(b) Structure from Motion: We must start and end back on the surface.
3. R = kb: The surface equation is satisfied for any scaling k along the baseline, so in fact the entire baseline lies on
the surface, which in turn suggests:
• This setting/surface configuration is rare.
• This surface is ruled - we can draw lines inscribed in the surface. This suggests that our critical surface is a hyperboloid
of one sheet. It turns out that we have two rulings for the hyperboloid of one sheet (i.e. at every point on this surface,
we can draw two non-parallel lines through the surface that cross at that point).
Hyperboloids of one sheet, as we saw in the previous lecture, are not the only types of critical surfaces. Intersections of planes
are another key type of critical surface, namely because they are much more common in the world than hyperboloids of one
sheet. Analytically, the intersection of planes is given by the product of planes:
(a1 X + b1 Y + c1 Z + d1 )(a2 X + b2 Y + c2 Z + d2 ) = 0
This analytic equation describes the intersection of two planes - gives us a quadric surface.
In order for this intersection of planes to not have any constants, one of the planes must be the epipolar plane, whose
image is a line. The second plane is arbitrary and can take any form.
Practically, we may not run into this problem in this exact analytic form (i.e. a surface that is an exact instance of a hy-
perboloid of one sheet), but even as we venture near this condition, the noise amplification factor increases. As it turns out,
using a higher/wider Field of View (FOV) helps to mitigate this problem, since as FOV increases, it becomes less and less likely
that the surface we are imaging is locally similar to a critical surface.
One way that we can increase the FOV is through the use of spider heads, which consist of 8 cameras tied together into
a rigid structure. Though it requires calibration between the cameras, we can “stitch” together the images from the 8 cameras
into a single “mosaic image”.
Figure 54: A spider heads apparatus, which is one technique for increasing the FOV, thereby reducing the probability of surfaces
resembling critical surfaces.
Next, we will switch our discussion to another of our photogrammetry problems - interior orientation.
Have we already touched on this? Somewhat - with vanishing points. Recall that the goal with vanishing points was to
find (x0 , y0 , f ) using calibration objects. This method:
• Is not very accurate, nor general across different domains/applications
• Does not take into account radial distortion, which we need to take into account for high-quality imaging such as aerial
photography.
What is radial distortion? We touch more on this in the following section.
20.1.1 Radial Distortion
This type of distortion leads to a discrepancy between what should be the projected location of an object in the image, and
what it actually projects to. This distortion is radially-dependent.
This type of distortion becomes more apparent when we image lines/edges. It manifests from having a center of distor-
tion. The image of an object does not appear exactly where it should, and the error/discrepancy depends on the radius of the
point in the image relative to the center of distortion. This radial dependence is typically approximated as a polynomial in the
radius r:
\[
r = \|\mathbf{r}\|_2, \qquad \delta x = x\,(k_1 r^2 + k_2 r^4 + \cdots), \qquad \delta y = y\,(k_1 r^2 + k_2 r^4 + \cdots)
\]
Figure 55: Radial distortion is in the direction of the radius emanating from the center of projection to the point in the image
plane where a point projects to in the absence of radial distortion.
Note that the error vector δr is parallel to the vector r - i.e. the distortion is in the direction of the radial vector.
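A small sketch of applying this radial distortion model; the coefficient values are purely illustrative, and coordinates are assumed to be measured from the center of distortion:

import numpy as np

def apply_radial_distortion(x, y, k1=1e-7, k2=1e-13):
    """Return distorted image coordinates for points (x, y) measured
    from the center of distortion, using the polynomial model above."""
    r2 = x**2 + y**2                       # r^2
    scale = k1 * r2 + k2 * r2**2           # k1*r^2 + k2*r^4 + ...
    return x + x * scale, y + y * scale    # (x + delta_x, y + delta_y)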
Many high-quality lenses have a defined radial distortion function that gives the distortion as a function of r, e.g. the figure
below:
Figure: Example radial distortion function e(r), plotted as a function of radius r.
These functions, as we stated above, are typically approximated as polynomial functions of r. How do we measure radial
distortion? One way to do so is through the use of plumb lines. These lines are suspended from the ceiling via weights, which
allows us to assume that they are straight and all parallel to one another in the 3D world. Then, we take an image of these lines,
with the image centered on the center lines. We can estimate/assess the degree of radial distortion by identifying the curvature,
if any, of the lines as we move further away from the center lines of the image.
Figure 56: Plumb lines are a useful way of estimating radial distortion, because they allow us to estimate the degree of curvature
between lines that should be straight and parallel.
Plumb lines allow us to estimate two types of radial distortion: (i) Barrel Distortion and (ii) Pincushion Distortion,
depicted below. The radius of curvature from these lines allows us to measure k1 (the first polynomial coefficient for r2 ):
Figure 57: Barrel and pincushion distortion, which we can observe using plumb lines.
A more subtle question: is it better to go from (a) undistorted → distorted coordinates, or from (b) distorted → undistorted
coordinates?
It is often desirable to take approach (b) (distorted → undistorted) because we can measure the distorted coordinates.
We can use series inversion to relate the distorted and undistorted coordinates. This affects the final coordinate system that
we do optimization in, i.e. we optimize error in the image plane.
There are several additional distortion effects to consider:
1. Tangential Distortion: This distortion acts perpendicular to the radial direction and is also typically modeled as a
polynomial in r:
\[
\delta x = -y\,(\epsilon_1 r^2 + \epsilon_2 r^4 + \cdots), \qquad \delta y = x\,(\epsilon_1 r^2 + \epsilon_2 r^4 + \cdots)
\]
Figure 58: Tangential distortion acts tangentially relative to the radius - in this case, in a counterclockwise fashion (as we rotate
θ).
2. Decentering: If the center of distortion is not the principal point (center of the image with perspective projection), then
we get an offset that depends on position. This offset is typically small, but we still need to take it into account for
high-quality imaging such as aerial photography.
3. Tilt of Image Plane: If the image plane is tilted (see the figure below), then magnification will not be uniform and the
focus will not be perfect. In practice, to fix this, rather than changing the tilt of the image plane, you can instead insert a
compensating element to offset this tilt.
Figure 59: Tilt of the image plane is another factor that can affect distortion of images, as well as variable magnification/scaling
across the image. Oftentimes, it can be corrected via a compensating element that offsets this tilt, rather than realigning the
image plane.
Taking these effects into account allows for creating more sophisticated distortion models, but sometimes at the expense of
overfitting and loss of generalizability (when we have too many degrees of freedom, and our estimates for them become so
precariously configured that our model is not applicable for a slight change of domain).
These distortion models can also be used within nonlinear optimization protocols such as Levenberg-Marquardt (LM) or
Gauss-Newton (GN) (see lecture 20 for a review of these).
In camera calibration, correspondences are defined between points in the image and points on the calibration ob-
ject.
Here, we encounter the same reason that vanishing point-based methods are difficult - it is hard to directly relate the
camera to calibration objects. When we take exterior orientation into account, we generally obtain much better,
higher-accuracy results. Exterior orientation (2D → 3D) seeks to find the calibration object in space given a camera image.
Combined, therefore our problem now possesses 9 DOF. We can start solving this problem with perspective projection and
interior orientation.
1. (Xc , Yc , Zc )T are the camera coordinates (world coordinate units)
2. (xI , yI ) denote image position (row, column/grey level units)
3. (xO , yO , f ) denote the interior orientation/principal point (column/grey level and pixel units)
Can we modify the measurements so we are able to take radial distortion into account?
We need a good initial guess because the numerical, iterative approaches that are used to solve this problem precisely exhibit
multiple minima, and having a good initial guess/estimate to serve as our “prior” will help us find the correct minimum.
Figure 60: Exterior orientation seeks to find the transformation between a camera and a calibration object.
\[
\begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} = R \begin{pmatrix} X_s \\ Y_s \\ Z_s \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}
\]
where the matrix R and the translation t are our unknowns. From this, we can combine these equations with our equations
for interior orientation to find our solutions.
Written this way, we can map directly from image coordinates to calibration objects.
Let’s start by looking at this problem in polar coordinates. Radial distortion only changes radial lengths (r), and not
angle (θ). Changing f or Zc changes only radius/magnification. Let’s use this fact to “forget” about the angle.
We can also remove additional DOF by dividing our combined equations by one another:
\[
\frac{x_I - x_0}{y_I - y_0} =
\frac{\;\dfrac{r_{11} X_s + r_{12} Y_s + r_{13} Z_s + t_x}{r_{31} X_s + r_{32} Y_s + r_{33} Z_s + t_z}\;}
     {\;\dfrac{r_{21} X_s + r_{22} Y_s + r_{23} Z_s + t_y}{r_{31} X_s + r_{32} Y_s + r_{33} Z_s + t_z}\;}
\]
A good sanity check: these two variables should evaluate to approximately the same value - if they do not, this is often
one indicator of correspondence mismatch.
With this scale factor computed we can find the true values of our solution by multiplying the scaled solutions with this scale
factor:
\[
s\,\left(r'_{11},\ r'_{12},\ r'_{13},\ r'_{21},\ r'_{22},\ r'_{23},\ t'_x,\ t'_y = 1\right) \;\rightarrow\; \left(r_{11},\ r_{12},\ r_{13},\ r_{21},\ r_{22},\ r_{23},\ t_x,\ t_y\right)
\]
Note that we also have not (yet) enforced orthonormality of the first two rows in the rotation matrix, i.e. our goal is to have (or
minimize):
\[
r'_{11} r'_{21} + r'_{12} r'_{22} + r'_{13} r'_{23} = 0
\]
The figure below highlights that given vectors a and b, we can find the nearest set of adjusted vectors a' and b' using the
following, which states that adjustments to each vector in the pair are made in the direction of the other vector:
\[
a' = a + k\,b, \qquad b' = b + k\,a
\]
Since we want this dot product to be zero, we can solve for the scale k by taking the dot product of a' and b' and setting it
equal to zero:
\[
a' \cdot b' = (a + k\,b)\cdot(b + k\,a) = a\cdot b + (a\cdot a + b\cdot b)\,k + k^2\,(a\cdot b) = 0
\]
This is a quadratic equation in k but, unfortunately, as we approach the solution point, the coefficient of $k^2$ and the constant
term (both equal to $a\cdot b$) approach zero, creating numerical instability in the standard quadratic formula and consequently
making this a poor approach for solving for k. A better approach is to use the approximation (valid since a and b are
approximately unit vectors, so $a\cdot a + b\cdot b \approx 2$):
\[
k \approx -\frac{a\cdot b}{2}
\]
It is much easier to solve for this approximation and iterate it a couple of times. Relatedly, when we do need to solve a quadratic
near such a degenerate point, instead of the “standard” formula we can use the alternative root form:
\[
x = \frac{2c}{-b \pm \sqrt{b^2 - 4ac}}
\]
where we note that the “standard” quadratic solution is given by:
\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
It is important to consider both of these forms because of floating-point precision: as we approach the solution point, one of
the two forms will not perform well (due to cancellation), and the other form will perform substantially better.
With this, we are able to iteratively solve for k until our two vectors have a dot product approximately equal to zero. Once
this dot product is approximately zero, we are done! We have enforced orthonormality of the first two rows of the rotation
matrix, which allows us to find an orthonormal third row of the rotation matrix by taking a cross product of the first two rows.
Therefore, we then have an orthonormal rotation matrix.
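A small sketch of this iteration (the per-step renormalization of the rows is an added safeguard, not something spelled out in the notes):

import numpy as np

def orthogonalize_rows(a: np.ndarray, b: np.ndarray, n_iters: int = 3):
    """Iteratively nudge two (approximately unit) row vectors toward orthogonality
    using a' = a + k*b, b' = b + k*a with k ~= -(a . b)/2, then rebuild R."""
    for _ in range(n_iters):
        k = -np.dot(a, b) / 2.0
        a, b = a + k * b, b + k * a
        a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)   # keep rows unit length
    r3 = np.cross(a, b)              # orthonormal third row from the cross product
    return np.vstack([a, b, r3])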
Planar targets are calibration objects that allow for high camera calibration accuracy. They are typically mounted on the
side of wheels so that they can be rotated. They are typically given analytically by the plane $Z_s = 0$:
Figure 61: A geometric view of a planar target, which is a calibration target whose shape is geometrically described by a plane.
As a result of this, we no longer need to determine the rotation matrix coefficients r13, r23, and r33 (i.e. the third column of
the rotation matrix, which determines how Zs affects Xc, Yc, and Zc in the rotated coordinate system). Mathematically:
\[
\begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} =
\begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}
\begin{pmatrix} X_s \\ Y_s \\ 0 \end{pmatrix} +
\begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}
\]
\[
\frac{x_I - x_0}{f} = \frac{X_c}{Z_c} = \frac{r_{11} X_s + r_{12} Y_s + t_x}{r_{31} X_s + r_{32} Y_s + t_z}, \qquad
\frac{y_I - y_0}{f} = \frac{Y_c}{Z_c} = \frac{r_{21} X_s + r_{22} Y_s + t_y}{r_{31} X_s + r_{32} Y_s + t_z}
\]
And, as we saw before, dividing these equations by one another yields a simplified form relative to our last division for the
general case above:
\[
\frac{x_I - x_0}{y_I - y_0} =
\frac{\;\dfrac{r_{11} X_s + r_{12} Y_s + t_x}{r_{31} X_s + r_{32} Y_s + t_z}\;}
     {\;\dfrac{r_{21} X_s + r_{22} Y_s + t_y}{r_{31} X_s + r_{32} Y_s + t_z}\;}
= \frac{r_{11} X_s + r_{12} Y_s + t_x}{r_{21} X_s + r_{22} Y_s + t_y}
\]
As we did before for the general case, cross-multiplying and rearranging gives:
\[
(X_S\, y'_I)\,r_{11} + (Y_S\, y'_I)\,r_{12} + y'_I\, t_x - (X_S\, x'_I)\,r_{21} - (Y_S\, x'_I)\,r_{22} - x'_I\, t_y = 0
\]
Now, rather than having 8 unknowns, we have 6: r11 , r12 , r21 , r22 , tx , ty . Because this is also a homogeneous equation, we again
can account for scale factor ambiguity by setting one of DOF/parameters to a fixed value. This in turn reduces the DOF from
6 to 5, which means we only need 5 correspondences to solve for now (compared to 7 in the general case). As before, if we have
more than 5 correspondences, this is still desirable as it will reduce estimation error and prevent overfitting.
One potential issue: if the parameter we fix, e.g. ty , evaluates to 0 in our solution, this can produce large and unstable value
estimates of other parameters (since we have to scale the parameters according to the fixed value). If the fixed parameter is
close to 0, then we should fix a different parameter.
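An illustrative sketch of solving this homogeneous system for the 6 scaled unknowns via SVD - using the singular vector of the smallest singular value instead of fixing ty, which sidesteps the issue of a fixed parameter that happens to be near zero:

import numpy as np

def planar_exterior_orientation(Xs, Ys, xI, yI):
    """Solve for (r11, r12, r21, r22, tx, ty) up to scale from >= 5 correspondences.

    Xs, Ys: planar target coordinates; xI, yI: image coordinates measured
    relative to the principal point. Each correspondence gives one row of a
    homogeneous linear system; the solution is the right singular vector
    associated with the smallest singular value.
    """
    A = np.column_stack([Xs * yI, Ys * yI, -Xs * xI, -Ys * xI, yI, -xI])
    _, _, Vt = np.linalg.svd(A)
    r11, r12, r21, r22, tx, ty = Vt[-1]        # defined only up to a scale factor
    return r11, r12, r21, r22, tx, ty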
20.2.7 Solving for tz and f
Now that we have solved for our other parameters, we can also solve for tz and f :
• These unknowns do not appear in other equations, but we can use our estimates from our other parameters to solve for tz
and f .
• We can again use the same equations that combine interior and exterior orientation:
Since we only have two unknowns (since we have found estimates for our other unknowns at this point), this now becomes
a much simpler problem to solve. With just 1 correspondence, we get both of the 2 equations above, which is sufficient
to solve for our 2 unknowns that remain. However, as we have seen with other cases, using more correspondences still
remains desirable, and can be solved via least squares to increase robustness and prevent overfitting.
• One problem with this approach: We need to have variation in depth. Recall that perspective projection has multipli-
cation by f and division by Z; therefore, doubling f and Z results in no change, which means we can only determine tz
and f as a ratio tfz , rather than separately. To remove this consequence of scale ambiguity, we need variation in depth,
i.e. the calibration object, such as a plane, cannot be orthogonal to the optical axis.
Figure 62: In order to obtain f and tz separately, and not just up to a ratio, it is crucial that the calibration object does not
lie completely orthogonal to the optical axis - otherwise, these parameters exhibit high noise sensitivity and we cannot determine
them separately.
It turns out this problem comes up in wheel alignment - if machine vision is used for wheel alignment, the calibration
object will be placed at an angle of 45 or 60 degrees relative to the optical axis.
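A sketch of this final two-unknown least-squares step for a planar target (Zs = 0), assuming the other rotation and translation parameters have already been estimated; each correspondence contributes two equations linear in f and tz:

import numpy as np

def solve_f_tz(Xs, Ys, x_p, y_p, R, tx, ty):
    """Least-squares estimate of focal length f and translation t_z.

    x_p, y_p: image coordinates relative to the principal point.
    Each correspondence yields two equations linear in (f, t_z):
        (r11*Xs + r12*Ys + tx)*f - x_p*t_z = x_p*(r31*Xs + r32*Ys)
        (r21*Xs + r22*Ys + ty)*f - y_p*t_z = y_p*(r31*Xs + r32*Ys)
    """
    denom = R[2, 0] * Xs + R[2, 1] * Ys
    A = np.vstack([
        np.column_stack([R[0, 0] * Xs + R[0, 1] * Ys + tx, -x_p]),
        np.column_stack([R[1, 0] * Xs + R[1, 1] * Ys + ty, -y_p]),
    ])
    rhs = np.concatenate([x_p * denom, y_p * denom])
    (f, tz), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return f, tz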
20.2.8 Wrapping it Up: Solving for Principal Point and Radial Distortion
To complete our joint interior/exterior orientation camera calibration problem, we need to solve for principal point and radial
distortion.
As it turns out, there is unfortunately no closed-form solution for finding the principal point. Rather, we minimize image
error using nonlinear optimization (e.g. Levenberg-Marquardt or Gauss-Newton). This error is the 2D error in the (x, y)
image plane that describes the error between predicted and observed points in the image plane:
Figure 63: Image error that we minimize to solve for our principal point and radial distortion. We can formulate the total
error from a set of observations as the squared errors from each correspondence. Note that xI refers to the observed point,
and xP refers to the predicted point after applying our parameter estimate transformations to a point on the calibration object
corresponding to an observed point.
Ideally, this error is zero after we have found an optimal set of parameters, i.e. for each i:
\[
x_{I_i} - x_{P_i} = 0, \qquad y_{I_i} - y_{P_i} = 0
\]
Where do xPi and yPi come from? They come from applying:
• Rotation matrix estimate (R)
One small problem with this approach: LM only works for unconstrained optimization, which is not the case when we restrict
ourselves to orthonormal rotation matrices. To overcome this, Tsai's Calibration Method used Euler Angles; we can instead use
either the Gibbs vector or unit quaternions. Comparing/contrasting each of these 3 rotation representations for solving this problem:
• Euler Angles: This representation is non-redundant, but “blows up” if we rotate through 180 degrees.
• Gibbs Vector: This representation is also non-redundant, but exhibits a singularity at θ = π.
• Unit Quaternions: This representation is redundant but exhibits no singularities, and this redundancy can be mitigated
by adding an additional equation to our system:
\[
\mathring{q}\cdot\mathring{q} - 1 = 0
\]
Finally, although LM optimization finds local, and not necessarily global, extrema, we have already developed an initial estimate of our
solution through the application of the combined interior/exterior orientation equations above, which makes it much more likely that
the extremum LM finds is not just locally optimal, but globally optimal.
21.1 Exterior Orientation: Recovering Position and Orientation
Consider the problem of a drone flying over terrain:
Figure 64: Example problem of recovering position and orientation: a drone flying over terrain that observes points for which it
has correspondences of in the image plane. The ground points are given by p1 , p2 , and p3 , and the point of interest is p0 . This
photogrammetry problem is sometimes referred to as Church’s tripod.
Problem Setup:
• Assume that the ground points p1 , p2 , and p3 are known, and we want to solve for p0 .
• We also want to find the attitude of the drone in the world: therefore, we solve for rotation + translation (6 DOF).
• We have a mix of 2D-3D correspondences: 2D points in the image plane that correspond to points in the image, and 3D
points in the world.
• Assume that the interior orientation of the camera is known (x0 , y0 , f ).
• Connect image to the center of projection.
How many correspondences do we Need? As we have seen with other frameworks, this is a highly pertinent question that
is necessary to consider for photogrammetric systems.
Since we have 6 DOF with solving for translation + rotation, we need to have at least 3 correspondences.
Figure 65: As a next step of solving this exterior orientation problem, we calculate the angles between the ground points relative
to the plane, as well as the lengths from the plane to the ground points. The angles of interest are given by: θ12 , θ13 , and θ2,3 ,
and the lengths of interest are r1 , r2 , and r3 .
• Given the problem setup above, if we have rays from the points on the ground to the points in the plane, we can calculate
angles between the ground points using dot products (cosines), cross products (sines), and arc tangents (to take ratios of
sines and cosines).
• We also need to know the lengths of the tripod - i.e. we need to find r1 , r2 , and r3 .
• From here, we can find the 3D point p0 by finding the intersection of the 3 spheres corresponding to the ground points
(center of spheres) and the lengths from the ground points to the plane (radii):
Figure 66: We can find the location of the plane p0 by finding the intersection of 3 spheres using the point information pi and
the length information ri .
Note that with this intersection of spheres approach, there is ambiguity with just three points/correspondences (this gives 2
solutions, since the intersection of 3 spheres gives 2 solutions). Adding more points/correspondences to the system reduces
this ambiguity and leaves us with 1 solution.
Another solution property that can actually help us reduce ambiguity with these approaches is that mirror-image solutions have a
reversed cyclic ordering of the points; this allows us to find and remove these "problematic solutions".
What if I only care about attitude? That is, can I solve for only rotation parameters? Unfortunately not, since the
variables/parameters we solve for are coupled to one another.
Some laws of trigonometry are also helpful here, namely the law of sines and cosines:
Figure 67: For any triangle, we can use the law of sines and law of cosines to perform useful algebraic and geometric manipulations
on trigonometric expressions.
Law of Sines: $\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}$
Law of Cosines: $c^2 = a^2 + b^2 - 2ab\cos C$
We do not use these approaches here, but it turns out we can solve for the lengths between the different ground points (r12 , r23 ,
and r13 ) using these trigonometric laws.
Note that the vectors a1 , a2 , and a3 are our correspondences in the image plane. This means that we can now relate these two
coordinate systems. We can relate these via a rotation transformation:
We can first represent R as an orthonormal rotation matrix. It turns out we can actually solve this by considering all three
correspondences at once:
We can solve for R using matrix inversion, as below. Question to reflect on: Will the matrix inverse result be orthonormal?
\[
A \triangleq (\hat{a}_1,\ \hat{a}_2,\ \hat{a}_3), \qquad B \triangleq (\hat{b}_1,\ \hat{b}_2,\ \hat{b}_3)
\]
\[
R A = B \implies R = B A^{-1}
\]
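A small numerical sketch of this step; note that with noisy measurements the result is generally not exactly orthonormal, which is the point of the reflection question above:

import numpy as np

def rotation_from_three_rays(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Solve R @ A = B for R, where A and B are 3x3 matrices whose columns are
    the corresponding unit ray directions in the two coordinate systems.

    With noisy measurements, B @ inv(A) is generally NOT orthonormal and would
    then need to be projected back onto the set of rotation matrices.
    """
    return B @ np.linalg.inv(A)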
In practice, as we saw with other photogrammetry frameworks, we would like to have more than 3 correspondences, used in tandem
with a least squares approach. We can use the estimate from the first 3 correspondences as an initial guess, and then iterate to solve
for a refined estimate; this reduces the probability of converging to a local minimum.
Note that errors are in image, not in 3D, and this means we need to optimize in the image plane, and not in 3D.
Goal of BA: Determine the 3D locations of landmark objects and cameras in the scene relative to some global coordinate
system, as well as determine the orientation of cameras in a global frame of reference.
Figure 68: Bundle Adjustment (BA). In the general case, we can have any number of K landmark points (“interesting” points
in the image) and N cameras that observe the landmarks.
This approach is typically solved using Levenberg-Marquardt (LM) nonlinear optimization. Although there are many parameters
to solve for, we can make an initial guess (as we did with our previous camera calibration approaches) to make it much more
likely that we converge to a global, rather than local, minimum.
How do we find “interesting points” in the image? One way to do this is to use descriptors (high-dimensional vectors cor-
responding to image gradients), as is done with SIFT (Scale-Invariant Feature Transforms). Some other, more recent approaches
include:
• SURF (Speeded Up Robust Features)
• ORB (Oriented FAST and rotated BRIEF)
• BRIEF (Binary Robust Independent Elementary Features)
• VLAD (Vector of Locally Aggregated Descriptors)
Let’s consider several representations for effective alignment estimation and recognition:
• Polyhedra: When this is our object representation, we often consider this to be a class of “block world problems”. We can
describe these polyhedra objects semantically with edges, faces, and linked data structures. Because this is not a realistic
representation of the world, for systems in practice, we look for more complicated representations.
• Graphics/curves: These can be done with a mesh. Here, we consider any curved surface. Meshes can be thought of as
polyhedral objects with many facets (polygon faces). This representation is well-suited for outputting pictures.
Figure 69: Example of a mesh. Note that this can be constructed as many facets, where each facet is a polygon.
What kind of tasks are we interested in for meshes, and more generally, the rest of our object representations?
– Find alignment/pose (position/orientation): For alignment, we can accomplish this task by assigning correspon-
dences between vertices, but this is not a very effective approach because there is no meaning behind the vertices (i.e.
they are not necessarily deemed “interesting” points), and the vertex assignments can change each time the shape is
generated. Can do approximate alignment by reducing the distance to the nearest vertex iteratively and solving.
– Object recognition: We cannot simply compare numbers/types of facets to models in a library, because the generated
mesh can change each time the mesh is generated. It turns out that this is a poor representation for recognition.
Since neither of these representations lend themselves well for these two tasks, let us consider alternative representations. First,
let us consider what characteristics we look for in a representation.
Representations to consider given these properties:
• Generalized Cylinders:
– These can be thought of as extruding a circle along a line.
– In the generalized case, we can extrude an arbitrary cross-section along a line, as well as allow the line to be a general
curve and to allow the cross-section (generator) to vary in size.
Figure 71: Representing a sphere as a generalized cylinder - unfortunately, there are an infinite number of axes we can use to
“generate” the cylinder along.
– This same problem shows up elsewhere, especially when we allow for inaccuracies in data.
– This approach has not been overly successful in solving problems such as alignment, in practice.
– This leads us to pursue an alternative representation.
• Polyhedra Representation:
– Let’s briefly revisit this approach to get a good “starting point” for our object representation that we can solve
alignment and recognition problems.
– One way, as we have seen before, was to take edges and faces, and to build a graph showing connectedness of polyhedra.
– Instead, we take the unit normal vector of faces and multiply them by the area of the face/polygon.
Figure 72: Polyhedra approach in which we represent faces by the unit normal vector multiplied by the area of the face of the
polyhedra representation.
– Despite throwing away information about the connectedness of faces, Minkowski’s proof shows that this generates a
unique representation for convex polyhedra.
– Note that this proof is non-constructive, i.e. no algorithm was given with the proof that actually solves the
construction of this representation. There is a construction algorithm that solves this, but it is quite slow and, as it
turns out, not needed for recognition tasks.
– Further, note that the sum of vectors form a closed loop:
N
X
ni = 0
i=1
Therefore, any collection of area-weighted normal vectors that does not satisfy this closure condition cannot come from a closed polyhedron.
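A tiny sketch of this closure property, checking that the area-weighted face normals of an axis-aligned box sum to zero (the box is just an illustrative convex polyhedron):

import numpy as np

def area_weighted_normals_box(a: float, b: float, c: float) -> np.ndarray:
    """Area-weighted outward face normals A_i * n_i of an a x b x c box."""
    return np.array([
        [ b * c, 0, 0], [-b * c, 0, 0],   # +/- x faces
        [0,  a * c, 0], [0, -a * c, 0],   # +/- y faces
        [0, 0,  a * b], [0, 0, -a * b],   # +/- z faces
    ], dtype=float)

assert np.allclose(area_weighted_normals_box(1.0, 2.0, 3.0).sum(axis=0), 0.0)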
Let’s consider what this looks like for the following cylindrical/conic section. Consider the surface normals on the cylindrical
component (Ai n̂i ) and the surface normals on the conic component (Bi n̂i ). Note that we still need to be careful with this
representation, because each mesh generated from this shape can be different.
Figure 73: A shape represented by a mesh that we will convert to an extended Gaussian sphere, as we will see below. Note that
the surface normals Ai n̂i and Bi n̂i each map to different great circles on the sphere, as can be seen in the figure below.
Idea: Combine facets that have similar surface normals and represent them on the same great circle when we convert to a
sphere.
What does this look like when we map these scaled surface normals to a unit sphere?
Idea: Place masses on the great circles corresponding to the planes spanned by the unit normals of the cylindrical and conic
sections above. Note that the mass of the flat back plate of our object occupies a single point on the mapped sphere, since
every surface normal of this planar face is the same.
Figure 74: Extended Gaussian sphere mapping, which maps the surface normals of arbitrary (convex) surfaces to the sphere
such that the surface vector corresponds to a scaled unit normal in both the original surface and the sphere.
Therefore, this sphere becomes our representation! This works much better for alignment and recognition. Note that we
work in a sphere space, and not a high-dimensional planar/Euclidean space. Therefore:
• Comparisons (e.g. for our recognition task) can be made by comparing masses using distance/similarity functions.
• Alignment can be adjusted by rotating these spheres.
How well does this representation adhere to our desired properties above?
• Translation Invariance: Since surface normals and areas do not depend on location, translation invariance is preserved.
• Rotation Equivariance: Since rotating the object corresponds to just rotating the unit sphere normals without changing
the areas of the facets, rotation of the object simply corresponds to an equal rotation of the sphere.
Loosely, this representation corresponds to density. This density is dependent on the degree of curvature of the surface we are
converting to a sphere representation:
• Low density ↔ High curvature
• High density ↔ Low curvature
To better understand this representation in 3D, let us first understand it in 2D.
Figure 75: Representation of our extended Gaussian image in 2D. The mapping between the surface and the sphere is determined
by the points with the same surface normals.
Gauss: This extended Gaussian image representation is a way to map arbitrary convex objects to a sphere by mapping
points with the same surface normals.
We can make this (invertible) mapping in a point-point fashion, and subsequently generalize this to mapping entire convex
shapes to a sphere. There are issues with non-convex objects and invertibility because the forward mapping becomes a surjec-
tive mapping, since multiple ponts can have the same surface normal, and therefore are mapped to the same location on the
circle.
166
Figure 76: When surfaces are non-convex, we run into issues with inverting our mapping, because it may be surjective, i.e. some
points on the surface may map to the same point on the sphere due to having the same surface normal.
Idea: Recall that our idea is to map from some convex 2D shape to a circle, as in the figure below.
Note that in this case, density depends on which region (and consequently the degree of curvature) of the object above. We
are interested in this as a function of angle η. Analytically, our curvature and density quantities are given by:
dη 1
• Curvature: κ = dS = Rcurvature (The “turning rate”)
1 dS
• Density: G = κ = dη (inverse of curvature)
These are the only quantities we need for our representation mapping! We just map the density G = dS dη from the object to the
unit sphere. This is invertible in 2D for convex objects, though this invertibility is not necessarily the case in 3D.
δx = − sin(η)
δy = cos(η)
167
Then we can integrate to find x and y:
Z
x = x0 + − sin(η)dS
Z
dS
x = x0 + − sin(η) dη
dη
Z
x = x0 + − sin(η)G(η)dη
What happens if we integrate over the entire unit circle in 2D? Then we expect these integrals to evaluate to 0:
Z 2π
− sin(η)G(η)dη = 0
0
Z 2π
cos(η)G(η)dη = 0
0
These conditions require that the centroid of density G(η) is at the origin, i.e. that the weighted sum of G(η) = 0.
Note that while this holds for spheres, we can actually generalize it beyond spheres too (making it of more practical value). This
is because we can fit curves of arbitrary shape by fitting a circle locally and finding the radius of the best-fitting circle.
Note that because density is equal to R, this density increases with increasing radius. Because density is the same everywhere,
we cannot recover orientation as there is not uniqueness of each point. Let’s try using an ellipse in 2D.
• “Squashed circle”: x = a cos θ, y = b sin θ (note that θ is the variable we parameterize these parametric equations over.
Because this takes a parametric form, this form is better for generating ellipses than the first implicit form because we can
simply step through our parameter θ.
Next, let’s compute the normal vector to the curve to derive a surface normal. We can start by computing the tangent vector.
First, note that the vector r describing this parametric trajectory is given by:
Then the tangent vector is simply the derivative of this curve with respect to our parameter θ:
dr
t= = (−a sin θ, b cos θ)T
dθ
Finally, in 2D we can compute the normal vector to the surface by switching x and y of the tangent vector and inverting the
sign of y (which is now in x’s place):
168
We want this vector to correspond to our normal in the sphere:
Then we have:
b cos θ = n cos η
a sin θ = n sin η
n2 = b2 cos2 θ + a2 sin2 θ
s = (a cos η, b sin η)T
3
s·s 2
K=
ab
n2 ((a cos η)2 + (b sin η)2 ) = (ab)2
π
These derivations allow us to find the extrema of curvature: ellipses will have 4 of them, spaced 2 apart from one another
(alternating minimums and maximums): η = 0, η = π2 , η = π, η = 3π
2 .
π
Figure 79: Curvature extrema of an ellipse: An ellipse has 4 extrema, spaced 2 apart from one another. Extrema depend on
the major and minor axes of the ellipse.
If a is the major axis and b is the minor axis, then extrema of curvature are given by:
a
• Maximum: κ = b2
b
• Minimum: κ = a2
Without loss of generality, we can flip a and b if the major and minor axes switch.
For alignment, we can rotate these ellipses until their orientations match. If the alignment is not constructed well, then the
object is not actually an ellipse.
Geodetic Identity: Related, this identity relates geocentric angles to angles used for computing latitudes:
b cos θ = n cos η
a sin θ = n sin η
a(tan θ) = b(tan η)
What are some other applications of this in 2D? We can do (i) circular filtering and circular convolution!
169
Although curvature in 3D is more complicated, we can just use our notion of Gaussian curvature, which is a single scalar.
As long as our object is convex, going around the shape in one direction in the object corresponds to going around the sphere
in the same direction (e.g. counterclockwise).
Example of non-convex object: Saddle Point: As we mentioned before, we can run into mapping issues when we have
non-convex objects that we wish to map, such as the figure below:
Figure 80: An example of a non-convex object: a saddle point. Notice that some of the surface normals are the same, and
therefore would be mapped to the same point on the sphere.
Note further that traveling/stepping an object in the reverse order for a non-convex object inverts the curvature κ, and
therefore allows for a negative curvature κ < 0.
Example of 3D Gaussian Curvature: Sphere of Radius R: This results in the following (constant) curvature and density,
as we saw in the 2D case:
1
• Curvature: κ = R2
• Density: G(η) = R2
Extensions: Integral Curvature: Next time, we will discuss how Gaussian curvature allows for us to find the integral of
Gaussian curvature for recognition tasks.
22 Lecture 23: Gaussian Image and Extended Gaussian Image, Solids of Rev-
olution, Direction Histograms, Regular Polyhedra
In this lecture, we cover more on the Gaussian images and extended Gaussian images (EGI), as well as integral Gaussian images,
some properties of EGI, and applications of these frameworks.
Gaussian Image: A correspondence between points on an image and corresponding points on the unit sphere based off of
equality of the surface normals.
Figure 81: Mapping from object space the Gaussian sphere using correspondences between surface normal vectors. Recall that
this method can be used for tasks such as image recognition and image alignment.
Recall that we defined the Gaussian curvature of an (infinitesimal) patch on an object as a scalar that corresponds to the
ratio of the areas of the object and the Gaussian sphere we map the object to, as we take the limit of the patch of area on the
170
object to be zero:
δS
κGaussian = lim
δO
δ→0
= The ratio of areas of the image and the object
1
Rather than use Gaussian curvature, however, we will elect to use the inverse of this quantity (which is given by density) κ,
which roughly corresponds to the amount of surface that contains that surface normal.
We can also consider the integral of Gaussian curvature, and the role this can play for dealing with object discontinuities:
Note: One nice feature of integral curvature: This can be applied when curvature itself cannot be applied, e.g. at disconti-
nuities such as edges and corners.
Figure 82: An application in which integral Gaussian curvature can be used in lieu of the curvature function. The smoothing
properties of integration enable for the mitigation of discontinuity affects.
Next, let’s consider the different forms our mass density distribution G can take.
171
22.1.3 Can We Have Any Distribution of G on the Sphere?
It is important to understand if we need to impose any restrictions on our mass density distribution. Let’s analyze this in both
the discrete and continuous cases. The results from the discrete case extend to the continuous case.
• Discrete Case: Consider an object that we have discretized to a set of polygon facets, and consider that we have a camera
that observes along an optical axis v̂, and a mirror image of this camera that observes along the optical axis −v̂. Note
that any surface normals that are not parallel to these optical axes will have foreshortening effects imposed.
Figure 83: Problem setup in the discrete case. Consider our object parameterized by a discrete number of finite polygon facets,
and a camera and its mirror image observing the different surface normals of these facets from two opposite vantage points.
First, consider only taking the facets with positive dot product (i.e. 90 degrees or less) away from the vector v̂, and sum
the product of these facet area multiplied by normal and the unit vector corresponding to the optical axis v̂:
X
Ai ŝi · v̂
ŝi ·v̂≥0
Now, consider the other facets, that is, those for which we take the facets with positive dot product (i.e. 90 degrees or
less) away from the vector −v̂, which is 180 degrees rotated from v̂, and applying the same operation:
X
Ai ŝi · (−v̂)
ŝi ·(−v̂)≥0
172
Therefore, for both the discrete and continuous cases, we can think of the Extended Gaussian Image (EGI) as a mass distribution
over the sphere that objects are mapped to according to their surface normals.
Next, let’s look at some discrete applications. Continuous EGI can be useful as well, especially if we are using an analyti-
cal model (in which case we can compute the exact EGI).
We saw that because the curvature and density are the same everywhere, that this imposes limitations on the effectiveness of
this surface for recognition and alignment tasks. Let’s next consider something with more complicated structure.
x = a cos θ cos φ
y = b sin θ cos φ
z = c sin φ
We will elect to use the second of these representations because it allows for easier and more efficient generation of points by
parameterizing θ and φ and stepping these variables.
As we did last time for the 2D ellipse, we can express these parametric coordinates as a vector parameterized by θ and φ:
And similarly to what we did before, to compute the surface normal and the curvature, we can first find (in this case, two)
tangent vectors, which we can find by taking the derivative of r with respect to θ and φ:
dr
∆
rθ = = (−a sin θ cos φ, b cos θ cos φ, 0)T
dθ
∆ dr
rφ = = (−a cos θ sin φ, −b sin θ sin φ, c cos φ)T
dφ
Then we can compute the normal vector by taking the cross product of these two tangent vectors of the ellipsoid. We do not
carry out this computation, but this cross product gives:
Where we have dropped the cos φ factor because we will need to normalize this vector anyway. Note that this vector takes a
similar structure to that of our original parametric vector r.
With our surface normals computed, we are now equipped to match surface normals on the object to the surface normals
on the unit sphere. To obtain curvature, our other quantity of interest, we need to differentiate again. We will also change our
173
parameterization from (θ, φ) → (ξ, η), since (θ, φ) parameterize the object in its own space, and we want to parameterize the
object in unit sphere space (ξ, η).
n̂ = (cos ξ cos η, sin ξ cos η, sin η)T
s̄ = (a cos ξ cos η, b sin ξ cos η, sin η)T
2
s·s
κ=
abc
2
1 abc
G= =
K s·s
With the surface normals and curvature now computed, we can use the distribution G on the sphere for our desired tasks of
recognition and alignment!
Related, let us look at the extrema of curvature for our ellipsoid. These are given as points along the axes of the ellipsoids:
2
1. bca
ac 2
2. b
2
ab
3. c
For these extrema, there will always be one maxima, one minima, and one saddle point. The mirror images of these extrema
(which account for the mirrors of each of these three extrema above) exhibit identical behavior and reside, geometrically, on the
other side of the sphere we map points to.
How Well Does this Ellipsoid Satisfy Our Desired Representation Properties?
• Translation invariance: As we saw with EGI last time, this representation is robust to translation invariance.
• Rotation “Equivariance”: This property works as desired as well - rotations of the ellipsoid correspond to equivariant
rotations of the EGI of this ellipse.
Figure 84: EGI representation of a generalized solid of revolution. Note that bands in the object domain correspond to bands
in the sphere domain.
174
As we can see from the figure, the bands “map” into each other! These solids of revolution are symmetric in both the object
and transform space. Let’s look at constructing infinitesimal areas so we can then compute Gaussian curvature κ and density G:
δS 2π cos(η)δη cos(η)δη
κ= = =
δO 2πrδs rδs
1 δO δs
G= = = r sec(η)
κ δS δη
Then in the limit of δ → 0, our curvature and density become:
δS cos η dη dη
κ = lim = (Where is the rate of change of surface normal direction along the arc, i.e. curvature)
δ→OδO r ds ds
δO ds ds
G = lim = r sec η (Where is the rate of change of the arc length w.r.t. angle)
δ→0 δS dη dη
cos η
κ= κ
r
Figure 85: Our infinitesimal trigonometric setup that we can use to calculate Gaussian curvature.
175
Figure 86: Cross-section of a 3D sphere in 2D. We can apply our framework above to derive our same result for Gaussian
curvature.
Mathematically:
S
r = R cos
R
1 S 1 S
rss = R − 2 cos = − cos
R R R R
Alternatively, rather than expressing this in terms of R and S, we can also express as a function of z (with respect to the diagram
above). Let’s look at some of the quantities we need for this approach:
dr
tan η = − = −rz (First derivative)
dz
dη d dz
sec2 = (−rz ) = −rzz (By Chain Rule)
ds ds ds
sec2 η = 1 + tan2 η
dz
cos η = = −zs
dη
Putting all of these equations together:
−rzz
κ=
r(1 + z 2 )2
p
r = R2 − Z 2
Z
rz = − √
R − Z2
2
R2
rzz = − 3
(R2 − Z 2 ) 2
R2
1 + rz2 = 2
R − Z2
−rzz 1
Then: κ = 2 2
= 2
r(1 + z ) R
One particular EGI, and the applications associated with it, is of strong interest - the torus.
176
Figure 87: Geometric cross-section of the torus, along with its associated parameters and their significance that describe the
torus.
What problems might we have with this object? It is non-convex. This Gaussian image may not be invertible when
objects are non-convex. We lose uniqueness of mapping, as well as some other EGI properties, by using this non-convex object.
For the torus, the mapping from the object to the sphere is not invertible - each point in the EGI maps to two points
on the object, which means the surface normal of the object is not unique.
The torus is convex into the board. Convex shapes/objects have non-negative curvature, and above we have a saddle
point with negative curvature.
We will also repeat this calculation for the corresponding surface normal on the other side. We will see that this results in
different sign and magnitude:
r = R − ρ cos η
S
= R − ρ cos
ρ
From here, we can take the second derivative rss :
d2 r
1 s
rss = 2
= cos
ds ρ ρ
Combining/substituting to solve for curvature κ:
−rss 1 cos Sρ
κ= =−
r ρ R + ρ cos S
ρ
Since we have two different Gaussian curvatures, and therefore two separate densities for the same surface normal, what do we
do? Let’s discuss two potential approaches below.
177
1. Approach 1: Adding Densities: For any non-convex object in which we have multiple points with the same surface
normals, we can simply add these points together in density:
1 1
G= + = 2ρ2
κ+ κ−
Even though this approach is able to cancel many terms, it creates a constant density, which means we will not be able to
solve alignment/orientation problems. Let’s try a different approach.
2. Approach 2: Subtracting Densities: In this case, let us try better taking into account local curvature by subtracting
instead of adding densities, which will produce a non-constant combined density:
1 1
G= − = 2Rρ sec(η)
κ+ κ−
This result is non-constant (better for solving alignment/orientation problems), but we should note that because of the
secant term, it has a singularity at the pole. We can conceptualize this singularity intuitively by embedding a sphere within
an infinite cylinder, and noting that as we approach the singularity, the mapped location climbs higher and higher on the
cylinder.
π
Figure 88: Embedding our sphere within a cylinder geometrically illustrates the singularity we observe for sec η as η → 2.
Note: Our alignment and recognition tasks are done in sphere space, and because of this, we do not need to reconstruct the
original objects after applying the EGI mapping.
Figure 89: Computing the EGI of a torus, and demonstrating how we do not have the same “band to band” correspondence/map-
ping that we had in the previous case, suggesting that this mapping may not be invertible.
Note: On a torus, the band is slightly narrower on the inner side. Can balance out this asymmetry by using the band on
the opposite side.
178
We can also consider the area of a torus:
Atorus = (2πρ)(2πR)
= 4π 2 Rρ
= “A circle of circles”
As a result of how this area is structured, two “donuts” (tori) of different shape/structure but with the same area correspond/map
to the same EGI, since EGI captures a ratio of areas. This loss of uniqueness is due to the fact that this torus representation
still remains non-convex - in a sense this is the “price we pay” for using a non-convex representation.
Figure 90: A triangular facet, which we can imagine we use to compute our EGI numerically.
As we have done before, we can find the surface normal and area of this facet to proceed with our EGI computation:
• Surface Normal: This can be computed simply by taking the cross product between any of the two edges of the triangular
facet, e.g.
n = (b − a) × (c − b)
= (a × b) + (b × c) + (c × a)
• Area: This can be computed by recalling the area of a triangular ( 21 base × height):
1
A= (b − a) · (c − a)
2
1
= ((a · a + b · b + c · c) − (a · b + b · c + c · a))
6
We can repeat this area and surface normal computation on all of our facets/elements to compute the mass distribution over the
sphere by adding different facets from all over surface. We add rather than subtract components to ensure we have a non-constant
EGI, and therefore have an EGI representation that is suitable for alignment tasks.
Typically, these direction histograms can be used to ascertain what the most common orientations are amongst a set of
orientations.
179
For histograms in higher dimensions, is subdividing the region into squares/cubes the most effective approach? As it turns
out, no, and this is because the regions are not round. We would prefer to fill in the plane with disks.
To contrast, suppose the tessellation is with triangles - you are now combining things that are pretty far away, compared
to a square. Hexagons alleviate this problem by minimizing distance from the center of each cell to the vertives, while also
preventing overlapping/not filled in regions. Other notes:
• We also need to take “randomness” into account, i.e. when points lie very close to the boundaries between different bins.
We can account for this phenomena by constructing and counting not only the original grids, but additionally, shifted/offset
grids that adjust the intervals, and consequently the counts in each grid cell.
The only issue with this approach is that this “solution” scales poorly with dimensionality: As we increase dimensions, we
need to take more grids (e.g. for 2D, we need our (i) Original plane, (ii) Shifted x, (iii) Shifted y, and (iv) Shifted x and
shifted y). However, in light of this, this is a common solution approach for mitigating “randomness” in 2D binning.
For instance, with a dodecahedron, our orientation/direction histogram is represented by 12 numbers (which cor-
respond to the number of faces on the dodecahedron):
[A1 , A2 , · · · , A12 ]
When we bring this object into alignment during, for instance, a recognition task, we merely only need to permute the
order of these 12 numbers - all information is preserved and there is no loss from situations such as overlapping. This is
an advantage of having alignment of rotation.
[A7 , A4 , · · · , A8 ]
Platonic and Archimedean solids are the representations we will use for these direction histograms.
23 Quiz 1 Review
Here you will find a review of some of the topics covered so far in Machine Vision. These are as follows in the section notes:
1. Mathematics review - Unconstrained optimization, Green’s Theorem, Bezout’s Theorem, Nyquist Sampling Theorem
180
7. Photometric Stereo - least squares, multiple measurement variant, multiple light sources variant
8. Computational molecules - Sobel, Robert, Silver operators, finite differences (forward, backward, and average), Lapla-
cians
9. Lenses - Thin lenses, thick lenses, telecentric lens, focal length, principal planes, pinhole model
10. Patent Review - Edge detector, Object Detection
• Bézout’s Theorem: The maximum number of solutions is the product of the polynomial order of each equation in the
system of equations:
E
Y
number of solutions = oe
e=1
• Nyquist Sampling Theorem: We must sample at twice the frequency of the highest-varying component of our image
to avoid aliasing and consequently reducing spatial artifacts.
• Taylor Series: We can expand any analytical, continuous, infinitely-differentiable function into its Taylor Series form
according to:
∞
(δx)2 00 (δx)3 000 (δx)4 (4) X (δx)i f (i) (x) ∆
f (x + δx) = f (x) + δxf 0 (x) + f (x) + f (x) + f (x) + ... = , where 0! = 1
2! 3! 24 i=0
i!
∂f (x, y) ∂f (x, y)
f (x + δx , y + δy ) = f (x, y) + δx + δy + ···
∂x ∂y
For a multivariable function.
181
• We can derive perspective projection from the pinhole model and similar triangles.
Perspective Projection:
x X y Y
= , = (component form)
f Z f Z
1 1
r= R (vector form)
f R · ẑ
Orthographic Projection:
f f
x= X, y = y
Z0 Z0
dE(x, y, t) dx ∂E dy ∂E ∂E
= + + =0
dt dt ∂x dt ∂y ∂t
Rewriting this in terms of u, v from above:
uEx + vEy + Et = 0
This equation above is known as the Brightness Change Constraint Equation (BCCE). This is also one of the most
important equations in 2D optical flow.
Normalizing the equation on the right by the magnitude of the brightness derivative vectors, we can derive the brightness
gradient:
!
Ex Ey Et
(u, v) · q ,q = (u, v) · Ĝ = − q
2
Ex + Ey2 2 2
Ex + Ey Ex + Ey2
2
!
Ex Ey
Brightness Gradient : Ĝ = q ,q
Ex2 + Ey2 Ex2 + Ey2
• Measures spatial changes in brightness in the image in the image plane x and y directions.
182
We are also interested in contours of constant brightness, or isophotes. These are curves on an illuminated surface that connects
points of equal brightness (source: Wikipedia).
Finally, we are also interested in solving for optimal values of u and v for multiple measurements. In the ideal case with-
out noise:
U 1 Ey2 −Ey1 −Et1
=
V (Ex1 Ey2 − Ey1 Ex2 ) −Ex2 Ex1 −Et2
When there is noise we simply minimize our objective, instead of setting it equal to zero:
Z Z
∆
J(u, v) = (uEx + vEy + Et )2 dxdy
x∈X y∈Y
We solve for our set of optimal parameters by finding the set of parameters that minimizes this objective:
Z Z
u∗ , v ∗ = arg min J(u, v) = arg min (uEx + vEy + Et )2 dxdy
u,v u,v x∈X y∈Y
Since this is an unconstrained optimization problem, we can solve by finding the minimum of the two variables using two
First-Order Conditions (FOCs):
∂J(u,v)
• ∂u =0
∂J(u,v)
• ∂v =0
Vanishing points: These are the points in the image plane (or extended out from the image plane) that parallel lines in the
world converge to. Applications include:
• Multilateration
• Calibration Objects (Sphere, Cube)
• Camera Calibration
We will also look at Time to Contact (TTC):
Z ∆ Z meters
== dZ = meters = seconds
W dt seconds
Let us express the inverse of this Time to Contact (TTC) quantity as C, which can be interpreted roughly as the number of
frames until contact is made:
∆ W 1
C= =
Z TTC
23.4 Photometry
Here, we will mostly focus on some of the definitions we have encountered from lecture:
• Photometry: Photometry is the science of measuring visible radiation, light, in units that are weighted according to the
sensitivity of the human eye. It is a quantitative science based on a statistical model of the human visual perception of
light (eye sensitivity curve) under carefully controlled conditions.
• Radiometry: Radiometry is the science of measuring radiation energy in any portion of the electromagnetic spectrum.
In practice, the term is usually limited to the measurement of ultraviolet (UV), visible (VIS), and infrared (IR) radiation
using optical instruments.
∆
• Irradiance: E = δP 2
δA (W/m ). This corresponds to light falling on a surface. When imaging an object, irradiance is
converted to a grey level.
∆ δP
• Intensity: I = δW (W/ster). This quantity applied to a point source and is often directionally-dependent.
∆ δ2 P
• Radiance: L = δAδΩ (W/m2 × ster). This photometric quantity is a measure of how bright a surface appears in an image.
183
• BRDF (Bidirectional Reflectance Distribution): f (θi , θe , φi , φe ) = δL(θ e ,φe )
δE(θi ,φi ) . This function captures the fact that
oftentimes, we are only interested in light hitting the camera, as opposed to the total amount of light emitted from an
object. Last time, we had the following equation to relate image irradiance with object/surface radiance:
π d 2
E= L cos4 α
4 f
Where the irradiance of the image E is on the lefthand side and the radiance of the object/scene L is on the right. The
BRDF must also satisfy Helmholtz reciprocity, otherwise we would be violating the 2nd Law of Thermodynamics.
Mathematically, recall that Helmholtz reciprocity is given by:
f (θi , θe , φi , φe ) = f (θe , θi , φe , φi ) ∀ θi , θe , φi , φe
∆ ∂z
• q= ∂y
• Lambertian Surfaces:
– Ideal Lambertian surfaces are equally bright from all directions, i.e.
f (θi , θe , φi , φe ) = f (θe , θi , φe , φi ) ∀ θi , θe , φi , φe
AND
f (θi , θe , φi , φe ) = K ∈ R with respect to θe , φe
– “Lambert’s Law”:
Ei ∝ cos θi = n̂ · ŝi
• Hapke Surfaces:
– The BRDF of a Hapke surface is given by:
1
f (θi , φi , θe , φe ) = √
cos θe cos θi
184
For these problems, we considered:
– Characteristic strips (x, y, z, p, q)
– Initial curves/base characteristics
– Normalizing with respect to constant step sizes
– A system of 5 ODEs
– Stationary points and estimates of surfaces around them for initial points
Next, let us review reflectance maps. A reflectance map R(p, q) is a lookup table (or, for simpler cases, a parametric function)
∂z ∂z
that stores the brightness for particular surface orientations p = ∂x , q = ∂y .
The Image Irradiance Equation relates the reflectance map to the brightness function in the image E(x, y) and is the
first step in many Shape from Shading approaches.
E(x, y) = R(p, q)
One way we can solve photometric stereo is by taking multiple brightness measurements from a light source that we move
around. This problem becomes:
−sT1
E1
T
−s2 n = E2
−sT3 E3
Written compactly:
Sn = E −→ n = S−1 E
Note that we we need S to be invertible to compute this, which occurs when the light source vectors are not coplanar.
6. Robert’s Cross: This approximates derivatives in a coordinate system rotated 45 degrees (x0 , y 0 ). The derivatives can
be approximated using the Kx0 and Ky0 kernels:
∂E 0 −1
→ Kx0 =
∂x0 −1 0
∂E 1 0
→ Ky0 =
∂y 0 0 −1
185
7. Sobel Operator: This computational molecule requires more computation and it is not as high-resolution. It is also more
robust to noise than the computational molecules used above:
−1 0 1
∂E
→ Kx = 2 0 2
∂x
−1 0 1
−1 2 −1
∂E
→ Ky = 0 0 0
∂y
1 2 1
8. Silver Operators: This computational molecule is designed for a hexagonal grid. Though these filters have some advan-
tages, unfortunately, they are not compatible with most cameras as very few cameras have a hexagonal pixel structure.
0 1 0
9. “Direct Edge” Laplacian: 12 1 −4 1
0 1 0
1 0 1
10. “Indirect Edge” Laplacian: 212 0 −4 0
1 0 1
11. Rotationally-symmetric Laplacian:
1 0 1 0 1 1 0 1
1
1 4 1
4 2 1 −4 1 + 1 0 −4 0 = 2 4 −20 4
22 6
0 1 0 1 0 1 1 4 1
3. Fourier domain: This type of analysis is helpful for understanding how these “stencils”/molecules affect higher (spatial)
frequency image content.
186
23.9 Lenses
Lenses are also important, because they determine our ability to sense light and perform important machine vision applications.
Some types of lenses:
• Thin lenses are the first type of lens we consider. These are often made from glass spheres, and obey the following three
rules:
– Central rays (rays that pass through the center of the lens) are undeflected - this allows us to preserve perspective
projection as we had for pinhole cameras.
– The ray from the focal center emerges parallel to the optical axis.
– Any parallel rays go through the focal center.
• Thick lenses (cascaded thin lenses)
• Telecentric lenses - These “move” the the Center of Projection to infinity to achieve approximately orthographic pro-
jection.
• Potential distortions caused by lenses:
– Radial distortion: In order to bring the entire angle into an image (e.g. for wide-angle lenses), we have the “squash”
the edges of the solid angle, thus leading to distortion that is radially-dependent. Typically, other lens defects are
mitigated at the cost of increased radial distortion. Some specific kinds of radial distortion [5]:
∗ Barrel distortion
∗ Mustache distortion
∗ Pincushion distortion
– Lens Defects: These occur frequently when manufacturing lenses, and can originate from a multitude of different
issues.
187
24 Quiz 2 Review
24.1 Relevant Mathematics Review
We’ll start with a review of some of the relevant mathematical tools we have relied on in the second part of the course.
The intuitive idea behind them: How do I prevent my parameters from becoming too large) positive or negative) or too small
(zero)? We can accomplish this by dividing our objective by our parameters, in this case our constraint. In this case, with the
Rayleigh Quotient taken into account, our objective becomes:
oT o
oT o q Nq
min q N q −→ min
o o o o oT o
q,q·q=1 q q q
Levenberg-Marquadt (LM) and Gauss-Newton (GN) are two nonlinear optimization procedures used for deriving solutions to
nonlinear least squares problems. These two approaches are largely the same, except that LM uses an additional regularization
term to ensure that a solution exists by making the closed-form matrix to invert in the normal equations positive semidefinite.
The normal equations, which derive the closed-form solutions for GN and LM, are given by:
1. GN: (J(θ)T J(θ))−1 θ = J(θ)T e(θ) =⇒ θ = (J(θ)T J(θ))−1 J(θ)T e(θ)
2. LM: (J(θ)T J(θ) + λI)−1 θ = J(θ)T e(θ) =⇒ θ = (J(θ)T J(θ) + λI)−1 J(θ)T e(θ)
Where:
• θ is the vector of parameters and our solution point to this nonlinear optimization problem.
• J(θ) is the Jacobian of the nonlinear objective we seek to optimize.
• e(θ) is the residual function of the objective evaluated with the current set of parameters.
Note the λI, or regularization term, in Levenberg-Marquadt. If you’re familiar with ridge regression, LM is effectively ridge
regression/regression with L2 regularization for nonlinear optimization problems. Often, these approaches are solved iteratively
using gradient descent:
1. GN: θ(t+1) ← θ(t) − α(J(θ(t) )T J(θ(t) ))−1 J(θ(t) )T e(θ(t) )
2. LM: θ(t+1) ← θ(t) − α(J(θ(t) )T J(θ(t) ) + λI)−1 J(θ(t) )T e(θ(t) )
Where α is the step size, which dictates how quickly the estimates of our approaches update.
188
24.1.4 Bezout’s Theorem
Though you’re probably well-versed with this theorem by now, its importance is paramount for understanding the number of
solutions we are faced with when we solve our systems:
Theorem: The maximum number of solutions is the product of the polynomial order of each equation in the system of equa-
tions:
E
Y
number of solutions = oe
e=1
24.2 Systems
In this section, we’ll review some of the systems we covered in this course through patents, namely PatQuick, PatMAx, and Fast
Convolutions. A block diagram showing how we can cascade the edge detection systems we studied in this class can be found
below:
Figure 92: An overview of how the patents we have looked at for object inspection fit together.
24.2.1 PatQuick
There were three main “objects” in this model:
• Training/template image. This produces a model consisting of probe points.
• Model, containing probe points.
• Probe points, which encode evidence for where to make gradient comparisons, i.e. to determine how good matches
between the template image and the runtime image under the current pose configuration.
Once we have the model from the training step, we can summarize the process for generating matches as:
1. Loop over/sample from configurations of the pose space (which is determined and parameterized by our degrees of freedom),
and modify the runtime image according to the current pose configuration.
2. Using the probe points of the model, compare the gradient direction (or magnitude, depending on the scoring function)
to the gradient direction (magnitude) of the runtime image under the current configuration, and score using one of the
scoring functions below.
3. Running this for all/all sampled pose configurations from the pose space produces a multidimensional scoring surface. We
can find matches by looking for peak values in this surface.
24.2.2 PatMAx
• This framework builds off of the previous PatQuick patent.
• This framework, unlike PatQuick, does not perform quantization of the pose space, which is one key factor in enabling
sub-pixel accuracy.
• PatMAx assumes we already have an approximate initial estimate of the pose.
• PatMAx relies on an iterative process for optimizing energy, and each attraction step improves the fit of the configuration.
• Another motivation for the name of this patent is based off of electrostatic components, namely dipoles, from Maxwell. As
it turns out, however, this analogy works better with mechanical springs than with electrostatic dipoles.
• PatMAx performs an iterative attraction process to obtain an estimate of the pose.
189
• An iterative approach (e.g. gradient descent, Gauss-Newton, Levenberg-Marquadt) is taken because we likely will not
have a closed-form solution in the real world. Rather than solving for a closed-form solution, we will run this iterative
optimization procedure until we reach convergence.
The goal of this system is to efficiently compute filters for multiscale. For this, we assume the form of an Nth -order piece-
wise polynomial, i.e. a Nth -order spline.
Figure 94: Block diagram of this sparse/fast convolution framework for digital filtering. Note that this can be viewed as a
compression problem, in which differencing compresses the signal, and summing decompresses the signal.
190
A few notes on this system:
• Why is it of interest, if we have Nth -order splines as our functions, to take Nth -order differences? The reason for this is
that the differences create sparsity, which is critical for fast and efficient convolution. Sparsity is ensured because:
N
dN +1 X
N +1
f (x) = 0 ∀ x if f (x) = ai xi , ai ∈ R ∀ i ∈ {1, · · · , N }
dx i=0
(I.e, if f (x) is a order-N polynomial, then the order-(N+1) difference will be 0 for all x.
This sparse structure makes convolutions much easier and more efficient to compute by reducing the size/cardinality of
the support.
• Why do we apply an order-(N+1) summing operator? We apply this because we need to “invert” the effects of the
order-(N+1) difference:
First Order : DS = I
Second Order : DDSS = DSDS = (DS)(DS) = II = I
..
.
Order K : (D)K (S)K = (DS)K = I K = I
Idea: The main idea of the Hough Transform is to intelligently map from image/surface space to parameter space for
that surface.
Figure 95: Example of finding parameters in Hough Space via the Hough Transform.
191
24.3 Photogrammetry
Given the length of the mathematical derivations in these sections, we invite you to revisit notes in lectures 17-21 for a more
formal treatment of these topics. In this review, we hope to provide you with strong intuition about different classes of pho-
togrammetric problems and their solutions.
Below we discuss each of these problems at a high level. We will be discussing these problems in greater depth later in this and
following lectures.
Figure 96: General case of absolute orientation: Given the coordinate systems (xl , yl , zl ) ∈ R3×3 and (xr , yr , zr ) ∈ R3×3 , our
goal is to find the transformation, or pose, between them using points measured in each frame of reference pi .
192
Figure 97: Binocular stereo system set up. For this problem, recall that one of our objectives is to measure the translation, or
baseline, between the two cameras.
More generally, with localization, our goal is to find where we are and how we are oriented in space given a 2D image
and a 3D model of the world.
Figure 98: Exterior orientation example: Determining position and orientation from a plane using a camera and landmark
observations on the ground.
193
Figure 99: Bundle Adjustment (BA) is another problem class that relies on exterior orientation: we seek to find the orientation
of cameras using image location of landmarks. In the general case, we can have any number of K landmark points (“interesting”
points in the image) and N cameras that observe the landmarks.
Figure 100: Interior orientation seeks to find the transformation between a camera and a calibration object - a task often known
as camera calibration. This can be used, for instance, with Tsai’s calibration method (note that this method also relies on
exterior orientation).
24.4 Rotation
There are a myriad of representations for rotations - some of these representations include:
1. Axis and angle
2. Euler Angles
3. Orthonormal Matrices
4. Exponential cross product
5. Stereography plus bilinear complex map
6. Pauli Spin Matrices
7. Euler Parameters
8. Unit Quaternions
We would also like our representations to have the following properties:
194
• The ability to rotate vectors - or coordinate systems
24.4.3 Quaternions
In this section, we will discuss another way to represent rotations: quaternions.
195
24.4.5 Properties of 4-Vector Quaternions
These properties will be useful for representing vectors and operators such as rotation later:
oo oo
1. Not commutative: pq 6= q p
oo o o oo
2. Associative: (pq)r = p(q r)
oo o∗ o
3. Conjugate: (p, p)∗ = (p, −p) =⇒ (pq) = q p
4. Dot Product: (p, p) · (q, q) = pq + p + q
o o o
5. Norm: ||q||22 = q · q
o o∗
6. Conjugate Multiplication: q q :
o o∗
q q = (q, q)(q, −q)
= (q 2 + q · q, 0)
o o o
= (q · q)e
o ∆ o∗ o oo o
Where e = (1, 0), i.e. it is a quaternion with no vector component. Conversely, then, we have: q q = (q q)e.
o∗
o −1 q o
7. Multiplicative Inverse: q = o o (Except for q = (0, 0), which is problematic with other representations anyway.)
(q·q
Another note: For representing rotations, we will use unit quaternions. We can represent scalars and vectors with:
• Representing scalars: (s, 0)
• Representing vectors: (0, v)
196
Where the matrix Q̄T Q is given by:
o o
q·q 0 0 0
0 q02 + qx2 − qy2 − qz2 2(qx qy − q0 qz ) 2(qx qz + q0 qy )
Q̄T Q =
0 2(qy qx + q0 qz ) q02 − qx2 + qy2 − qz2 2(qy qz − q0 qx )
0 2(qz qx − q0 qy ) 2(qz qy + q0 qz ) q02 − qx2 − qy2 + qz2
This ability to compose rotations is quite advantageous relative to many of the other representations of rotations we have
seen before (orthonormal rotation matrices can achieve this as well).
24.5 3D Recognition
24.5.1 Extended Gaussian Image
The idea of the extended Gaussian Image: what do points on an object and points on a sphere have in common? They have the
same surface normals.
Figure 101: Mapping from object space the Gaussian sphere using correspondences between surface normal vectors. Recall that
this method can be used for tasks such as image recognition and image alignment.
δS δS dS
• Curvature: κ = δO = limδ→0 δO = dO
δO δO dO
• Density: G(η) = δS = limδ→0 δS = dS
197
24.5.2 EGI with Solids of Revolution
Are there geometric shapes that lend themselves well for an “intermediate representation” with EGI (not too simple, nor too
complex)? It turns out there are, and these are the solids of revolution. These include:
• Cylinders
• Spheres
• Cones
• Hyperboloids of one and two sheets
How do we compute the EGI of solids of revolution? We can use generators that produce these objects to help.
Figure 102: EGI representation of a generalized solid of revolution. Note that bands in the object domain correspond to bands
in the sphere domain.
As we can see from the figure, the bands “map” into each other! These solids of revolution are symmetric in both the object
and transform space. Let’s look at constructing infinitesimal areas so we can then compute Gaussian curvature κ and density G:
• Area of object band: δO = 2πrδs
• Area of sphere band: δS = 2π cos(η)δn
Then we can compute the curvature as:
δS 2π cos(η)δη cos(η)δη
κ= = =
δO 2πrδs rδs
1 δO δs
G= = = r sec(η)
κ δS δη
Then in the limit of δ → 0, our curvature and density become:
δS cos η dη dη
κ = lim = (Where is the rate of change of surface normal direction along the arc, i.e. curvature)
δ→OδO r ds ds
δO ds ds
G = lim = r sec η (Where is the rate of change of the arc length w.r.t. angle)
δ→0 δS dη dη
cos η
κ= κ
r
Recall that we covered this for the following
198
• Octahedra (8 faces)
As we did for the cube, we can do the same for polyhedra: to sample from the sphere, we can sample from the polyhedra,
and then project onto the point on the sphere that intersects the line from the origin to the sampled point on the polyhedra.
From this, we get great circles from the edges of these polyhedra on the sphere when we project.
Fun fact: Soccer balls have 32 faces! More related to geometry: soccer balls are part of a group of semi-regular solids,
specifically an icosadodecahedron.
24.6 References
1. Groups, https://en.wikipedia.org/wiki/Group (mathematics)
199
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms