
Non rigid structure from motion slides in cv

The lecture discusses Non-Rigid Structure from Motion (NRSfM), focusing on recovering 3D shapes and camera motion from monocular videos of non-rigid objects. It highlights the challenges of non-rigid motion compared to rigid motion, including the use of various priors and optimization techniques to solve the problem. Applications span multiple industries, such as film, sports, and robotics, emphasizing the importance of accurately modeling non-rigid shapes.

Uploaded by

mirensamaniego.ikas


Lecture:

Non-Rigid Structure from Motion

-------------------------------------------------------------------------------------------------------------------------------

3D Vision
Universitat Pompeu Fabra
Discussion

Non-Rigid Shapes?

§ Can we obtain non-rigid 3D information from images?

Structure from Motion
(Rigid) Structure from Motion

Given: a monocular video (or a collection of pictures)

We want: to simultaneously recover the 3D shape and the camera motion
Epipolar geometry can be used

The assumption of rigidity is enough to make the problem well-posed

What about non-rigid motion?
Our world is non-rigid!

No external markers!
One or Many
Why is this important?

The world is non-rigid! There are many everyday applications in many different domains:

§ Movie industry, augmented reality
§ Experimental industry
§ Sport industry: sailing
§ Endoscopy

The movie industry
Even more details
[Video example]
Animal Reconstruction
[Video example]
…produces better robots?
Epipolar geometry can be used

The assumption of rigidity is enough to make the problem well-posed

Can epipolar geometry be used?

Considering only one image, we obtain the same 3D constraint as in the rigid case

After acquiring a new image, we obtain a similar constraint, but now triangulation is not available since the shape is non-rigid
Non-Rigid
Structure from Motion
Non-Rigid Structure from Motion
Given: a monocular video (or a collection of pictures)
We want: to simultaneously recover the 3D shape of a time-varying object (4D estimation) and the camera motion
Some Results
Solving the problem

The problem can be solved by:

• Factorization: a closed-form solution can be achieved by using SVD factorization and enforcing a specific rank (which depends on the camera model and on the type of scene). However, it is hard to enforce additional constraints accurately
• Non-linear optimization: the solution is achieved iteratively; the computational cost can be higher, but additional priors can be enforced accurately

In terms of processing, the problem can be solved:

• Offline: all the frames are processed at once, after video capture
• Online: the frames are processed as the data arrive, frame by frame. Closer to real applications, but it can become less accurate
Non-Rigid
Structure from Motion

Problem Statement
A Reminder of Camera Models

§ Perspective camera: All rays converge to the optical center

§ Orthographic camera: All rays are parallel. Z-coordinate is
irrelevant in the projection

[Figure: perspective camera (left) and orthographic camera (right)]

3D-to-2D: Perspective Model

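A p-th 3D point $\mathbf{X}_p=[X_p, Y_p, Z_p]^\top$, written in homogeneous coordinates as $\tilde{\mathbf{X}}_p=[X_p, Y_p, Z_p, 1]^\top$, can be related with its 2D projection $\mathbf{x}_p=[x_p, y_p]^\top$ by means of a matrix $\mathbf{P}_i$ for the i-th image, such as:

$$\lambda_{ip}\begin{bmatrix}\mathbf{x}_p\\ 1\end{bmatrix} = \mathbf{P}_i\,\tilde{\mathbf{X}}_p,$$

where $\lambda_{ip}$ is the projective depth of the point and $\mathbf{P}_i$ is a 3x4 matrix as:

$$\mathbf{P}_i = [\,\mathbf{A}_i \mid \mathbf{b}_i\,],$$

with a 3x3 block $\mathbf{A}_i$ (intrinsic parameters times camera rotation) and a 3x1 block $\mathbf{b}_i$ (intrinsic parameters times camera translation)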
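A minimal NumPy sketch of the perspective model, assuming the camera matrix is built as $\mathbf{P}_i = \mathbf{C}[\mathbf{R}_i \mid \mathbf{t}_i]$ with a hypothetical intrinsic matrix C (named C here to avoid a clash with K, the number of basis shapes used later). It returns both the image coordinates and the projective depths, since the depth-scaled homogeneous points are what fill the 3I-row measurement matrix of the perspective case.

```python
import numpy as np

def perspective_project(X, C, R, t):
    """Project 3xP points with P_i = C [R | t] (C: hypothetical 3x3 intrinsics).

    Returns the 2xP image coordinates and the P projective depths.
    """
    P_i = C @ np.hstack([R, t.reshape(3, 1)])        # 3x4 camera matrix
    X_h = np.vstack([X, np.ones((1, X.shape[1]))])   # 4xP homogeneous points
    x_h = P_i @ X_h                                  # 3xP, scaled by depth
    depth = x_h[2]
    return x_h[:2] / depth, depth

C = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
X = np.array([[1.0], [2.0], [4.0]])   # one 3D point, 4 units in front of the camera
x, depth = perspective_project(X, C, np.eye(3), np.zeros(3))
print(x.ravel(), depth)               # [75. 90.] [4.]
```

Dividing by the third homogeneous coordinate is what makes the perspective model non-linear in the 3D points; the orthographic model avoids this division.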

3D-to-2D: Orthographic Model

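A p-th 3D point $\mathbf{X}_p=[X_p, Y_p, Z_p]^\top$ can be related with its 2D projection $\mathbf{x}_p=[x_p, y_p]^\top$ by means of a matrix $\mathbf{R}_i$ for the i-th image, such as:

$$\mathbf{x}_p = \mathbf{R}_i\,\mathbf{X}_p + \mathbf{t}_i,$$

where $\mathbf{R}_i$ is a 2x3 matrix (the first two rows of a 3D rotation) and $\mathbf{t}_i$ is a 2x1 translation vector.

In practice, we remove the translations by centering the observations: if the 3D points are centered at the origin, $\mathbf{t}_i$ equals the mean of the 2D points $\mathbf{x}_p$ in the i-th image. For later computations, we write $\mathbf{x}_p$ for the centered coordinates $\mathbf{x}_p-\mathbf{t}_i$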
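A small NumPy sketch of the orthographic model and of the centering step, on synthetic data. It assumes the 3D points are centered at the origin and that every point is visible; under those assumptions the mean of the 2D observations recovers the translation exactly. The helper `random_rotation` is a hypothetical utility written for this example.

```python
import numpy as np

def random_rotation(rng):
    """Hypothetical helper: random 3x3 rotation from a QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))   # make the decomposition unique
    if np.linalg.det(q) < 0:               # enforce det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
X -= X.mean(axis=1, keepdims=True)         # 3D points centered at the origin
R_i = random_rotation(rng)[:2]             # 2x3 orthographic camera: first two rows
t_i = np.array([5.0, -2.0])
x = R_i @ X + t_i[:, None]                 # x_p = R_i X_p + t_i for all p

t_est = x.mean(axis=1)                     # mean of the 2D observations
x_centered = x - t_est[:, None]            # centered observations used from now on
print(np.allclose(t_est, t_i), np.allclose(x_centered, R_i @ X))  # True True
```

With missing points the mean is taken over a different subset in each image, so centering is no longer exact; this is one reason the translation is sometimes kept as an unknown.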
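Problem Statement
Orthographic camera

For the i-th image, the P observed points satisfy:

$$\underbrace{\mathbf{W}_i}_{2\times P} = \underbrace{\mathbf{R}_i}_{2\times 3}\,\underbrace{\mathbf{X}_i}_{3\times P}$$

In the rigid case, it is the same 3D shape in every image ($\mathbf{X}_i=\mathbf{X}$). In the non-rigid case, there is a different 3D configuration per image.

Full Linear Relation

$$\underbrace{\begin{bmatrix}\mathbf{W}_1\\ \vdots\\ \mathbf{W}_I\end{bmatrix}}_{2I\times P} = \underbrace{\begin{bmatrix}\mathbf{R}_1 & & \\ & \ddots & \\ & & \mathbf{R}_I\end{bmatrix}}_{2I\times 3I}\,\underbrace{\begin{bmatrix}\mathbf{X}_1\\ \vdots\\ \mathbf{X}_I\end{bmatrix}}_{3I\times P}$$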
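Measurement Matrix

Considering P non-rigid 3D points observed in I RGB images, we can collect all observations to obtain a linear system such as:

Perspective camera: $\underbrace{\mathbf{W}}_{3I\times P} = \underbrace{\mathbf{P}}_{3I\times 4I}\,\underbrace{\mathbf{X}}_{4I\times P}$

Orthographic camera: $\underbrace{\mathbf{W}}_{2I\times P} = \underbrace{\mathbf{R}}_{2I\times 3I}\,\underbrace{\mathbf{X}}_{3I\times P}$

where W is a 3IxP matrix (homogeneous 2D points scaled by their projective depths), P is a 3Ix4I block-diagonal matrix, and X is 4IxP for the perspective case; and W is a 2IxP matrix, R is 2Ix3I, and X is 3IxP for the orthographic case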
What about the rank of W?

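Perspective camera: $\operatorname{rank}(\mathbf{W}) \le \min(3I, P)$
Orthographic camera: $\operatorname{rank}(\mathbf{W}) \le \min(2I, P)$

Since every image has its own 3D shape, W is in general full rank and offers no low-rank structure to exploit: a severely ill-posed problem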
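Orthographic camera

$$\underbrace{\mathbf{W}}_{2I\times P} = \underbrace{\mathbf{R}}_{2I\times 3I}\,\underbrace{\mathbf{X}}_{3I\times P}$$

This is an explosion of variables: the 2IP measured entries are fewer than the 6I + 3IP unknowns (6 entries for each 2x3 matrix $\mathbf{R}_i$, plus 3 coordinates per point and image)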
A Toy Comparison
Let us assume a 1 minute video with just 100 tracked points, and considering only the estimation of the 3D shape

Rigid case:
§ Input data: 100 points x 60 sec x 30 Hz x 2 = 360,000 measurements
§ Unknowns: 100 points x 3 = 300 unknowns
§ Well-posed problem

Non-rigid case:
§ Input data: 100 points x 60 sec x 30 Hz x 2 = 360,000 measurements
§ Unknowns: 100 points x 60 sec x 30 Hz x 3 = 540,000 unknowns
§ Ill-posed problem
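The arithmetic of this toy comparison can be reproduced with a few lines of Python; like the slide, this sketch counts only 3D shape unknowns and ignores camera motion. The function name and its defaults are illustrative.

```python
def toy_counts(points=100, seconds=60, fps=30):
    """Shape-only counts for the toy comparison (camera motion ignored)."""
    frames = seconds * fps
    measurements = points * frames * 2       # (x, y) per point and frame
    rigid_unknowns = points * 3              # a single 3D shape for the whole video
    nonrigid_unknowns = points * frames * 3  # a different 3D shape per frame
    return measurements, rigid_unknowns, nonrigid_unknowns

m, r, n = toy_counts()
print(f"{m:,} measurements | rigid: {r:,} unknowns | non-rigid: {n:,} unknowns")
# 360,000 measurements | rigid: 300 unknowns | non-rigid: 540,000 unknowns
```

Changing the number of points or the frame rate does not close the gap: without priors, the non-rigid unknowns always exceed the measurements by a factor of 3/2.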

How can I solve the problem?

The art of priors

Including deformation priors is substantially more difficult than using simple rigidity
Many possibilities have been proposed

A wide variety of priors in literature:

§ Physical priors. Particle dynamics, elasticity, finite elements,
and many others
§ Probabilistic priors. Low-rank models on shape, trajectory,
shape-trajectory or force domains. Union of subspaces,
Gaussian priors
§ Geometric priors: isometric, as rigid as possible, bone lengths,
quadratic models
§ Temporal priors: temporal-coherent deformations
§ Piecewise priors
§ Many others
Shape Linear Subspace
(a probabilistic prior)
A Low-Rank Shape Model

Basically, the non-rigid 3D shape can be obtained as a linear combination of fixed shape vectors. For every combination of weight coefficients, a different shape is obtained:

[Figure: rotation x linear combination of some shapes + translation = your estimation]
Including the low-rank shape model

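We approximate the 3D shape by a linear combination of K shape vectors $\mathbf{B}_k$, each of size 3xP (normally, K is much smaller than both P and I). For every k-th component, a weight coefficient $l_{ik}$ is needed. As the shape is non-rigid, by modifying the coefficients for every i-th image, we change the 3D shape as:

$$\underbrace{\begin{bmatrix}\mathbf{X}_1\\ \vdots\\ \mathbf{X}_I\end{bmatrix}}_{3I\times P} = \underbrace{\begin{bmatrix} l_{11}\mathbf{I}_3 & \cdots & l_{1K}\mathbf{I}_3\\ \vdots & \ddots & \vdots\\ l_{I1}\mathbf{I}_3 & \cdots & l_{IK}\mathbf{I}_3\end{bmatrix}}_{3I\times 3K}\,\underbrace{\begin{bmatrix}\mathbf{B}_1\\ \vdots\\ \mathbf{B}_K\end{bmatrix}}_{3K\times P}$$

Another type of expression for the i-th image:

$$\mathbf{X}_i = \sum_{k=1}^{K} l_{ik}\,\mathbf{B}_k$$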
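The low-rank shape model can be written compactly with a Kronecker product: the 3Ix3K coefficient matrix is $\mathbf{L}\otimes\mathbf{I}_3$, where row i of the IxK matrix L holds the coefficients of image i. The NumPy sketch below, on random illustrative data, builds all shapes at once, checks one image against the per-image sum, and shows that the stacked shapes have rank 3K.

```python
import numpy as np

def low_rank_shapes(L, B):
    """Stack of shapes X = (L kron I_3) B.

    L: I x K weight coefficients l_ik (row i holds the coefficients of image i).
    B: 3K x P shape basis [B_1; ...; B_K], each B_k being 3 x P.
    """
    return np.kron(L, np.eye(3)) @ B   # 3I x P

rng = np.random.default_rng(1)
I, K, P = 20, 2, 15
L = rng.standard_normal((I, K))
B = rng.standard_normal((3 * K, P))
X = low_rank_shapes(L, B)

i = 5
X_i = sum(L[i, k] * B[3 * k:3 * k + 3] for k in range(K))  # X_i = sum_k l_ik B_k
print(X.shape, np.allclose(X[3 * i:3 * i + 3], X_i))        # (60, 15) True
print(np.linalg.matrix_rank(X))                             # 6, i.e. 3K
```

The whole sequence is now described by 3KP basis values and IK coefficients instead of 3IP coordinates.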

Shape Basis Estimation
In non-rigid structure from motion, we have some alternatives to
estimate the shape basis:
§ The most natural is to learn it on the fly, using only the input data
§ The input data can also be used to estimate a shape basis from
a shape at rest (like a mean shape) by applying:
- Modal analysis based on physical models
- Spectral analysis based on a distance matrix
§ If training data are available, we learn the basis by means of a learning approach (PCA, deep learning, etc.). This approach is supervised
Non-Rigid
Structure from motion
by factorization
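Including the low-rank shape model

Thanks to the relation between the 3D shape and the shape basis (orthographic camera):

$$\underbrace{\mathbf{X}}_{3I\times P} = \underbrace{\mathbf{L}}_{3I\times 3K}\,\underbrace{\mathbf{B}}_{3K\times P},$$

where $\mathbf{L}$ collects the weight coefficients in blocks $l_{ik}\mathbf{I}_3$, we obtain the projection equation by using the low-rank shape model as:

$$\underbrace{\mathbf{W}}_{2I\times P} = \underbrace{\mathbf{R}\,\mathbf{L}}_{2I\times 3K}\,\underbrace{\mathbf{B}}_{3K\times P},$$

whose i-th block row is $\mathbf{W}_i = [\,l_{i1}\mathbf{R}_i \;\cdots\; l_{iK}\mathbf{R}_i\,]\,\mathbf{B}$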


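What about the perspective case?

A similar analysis can be followed, but now considering homogeneous coordinates. Splitting each camera matrix as $\mathbf{P}_i=[\mathbf{A}_i \mid \mathbf{b}_i]$, with a 3x3 block $\mathbf{A}_i$ and a 3x1 block $\mathbf{b}_i$, we can obtain:

$$\underbrace{\mathbf{W}}_{3I\times P} = \underbrace{\mathbf{M}}_{3I\times(3K+1)}\,\underbrace{\begin{bmatrix}\mathbf{B}\\ \mathbf{1}^\top\end{bmatrix}}_{(3K+1)\times P}, \qquad \mathbf{M}_i = [\,l_{i1}\mathbf{A}_i \;\cdots\; l_{iK}\mathbf{A}_i \;\; \mathbf{b}_i\,]$$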
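Factorization

In both cases, the goal is to infer the motion factor (P or R) and the 3D coordinates X of the observed non-rigid object from 2D point tracks in a monocular video W:

Orthographic camera: $\underbrace{\mathbf{W}}_{2I\times P} = \underbrace{\mathbf{M}}_{2I\times 3K}\,\underbrace{\mathbf{B}}_{3K\times P}$

Perspective camera: $\underbrace{\mathbf{W}}_{3I\times P} = \underbrace{\mathbf{M}}_{3I\times(3K+1)}\,\underbrace{\mathbf{B}}_{(3K+1)\times P}$ (the basis with an extra row of ones)

The full linear system

W=MB
Two factors: the motion factor M (camera rotations and weight coefficients) and the shape factor B (the shape basis); the 3D shape in each image is recovered by combining B with the coefficients of that image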
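To make the structure of the orthographic motion factor concrete, the following NumPy sketch builds M, whose i-th row block is $[\,l_{i1}\mathbf{R}_i \cdots l_{iK}\mathbf{R}_i\,]$, from random illustrative cameras and coefficients, and checks that W = MB reproduces the direct projection of every per-image shape. The cameras are the first two rows of random orthonormal matrices.

```python
import numpy as np

def motion_matrix(rotations, L):
    """M (2I x 3K): row block i is [l_i1 R_i, ..., l_iK R_i]."""
    K = L.shape[1]
    return np.vstack([np.hstack([L[i, k] * R_i for k in range(K)])
                      for i, R_i in enumerate(rotations)])

rng = np.random.default_rng(2)
I, K, P = 10, 2, 15
# First two rows of random orthonormal matrices act as 2x3 orthographic cameras.
rotations = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(I)]
L = rng.standard_normal((I, K))
B = rng.standard_normal((3 * K, P))

M = motion_matrix(rotations, L)
W = M @ B                                                   # 2I x P
# Direct projection of every per-image shape X_i = sum_k l_ik B_k
X = [sum(L[i, k] * B[3 * k:3 * k + 3] for k in range(K)) for i in range(I)]
W_direct = np.vstack([R_i @ X_i for R_i, X_i in zip(rotations, X)])
print(M.shape, np.allclose(W, W_direct), np.linalg.matrix_rank(W))  # (20, 6) True 6
```

The rank of W equals 3K, which is the property the factorization relies on.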
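More on factorization (orthographic camera)

Because M is a 2Ix3K matrix and B is a 3KxP matrix, the rank of W is at most 3K. If we apply SVD to W, we will have at most 3K non-zero singular values

However, measurements are normally noisy, and in practice the rank will not be 3K. We have to impose it (we need to tune the rank K a priori)

Applying SVD factorization and keeping only the 3K largest singular values, we have:

$$\mathbf{W} \approx \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top = [\mathbf{U}\boldsymbol{\Sigma}^{1/2}][\boldsymbol{\Sigma}^{1/2}\mathbf{V}^\top] = [\mathbf{U}\boldsymbol{\Sigma}^{1/2}\mathbf{Q}][\mathbf{Q}^{-1}\boldsymbol{\Sigma}^{1/2}\mathbf{V}^\top]$$

i.e., $\mathbf{M}=\mathbf{U}\boldsymbol{\Sigma}^{1/2}\mathbf{Q}$ and $\mathbf{B}=\mathbf{Q}^{-1}\boldsymbol{\Sigma}^{1/2}\mathbf{V}^\top$ (the two factors we look for)

Many solutions can be achieved by modifying Q: any invertible 3Kx3K matrix Q yields a valid factorization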
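A minimal NumPy sketch of the rank-3K factorization, splitting the square root of the singular values evenly between both factors. The synthetic W is noise-free, so the truncated product reproduces it; with noisy data the same truncation gives the best rank-3K approximation in the least-squares (Frobenius) sense, by the Eckart-Young theorem. The last lines illustrate the ambiguity: inserting any invertible Q leaves the product unchanged.

```python
import numpy as np

def rank_factorization(W, rank):
    """Truncated SVD factorization W ~ M_hat @ B_hat with
    M_hat = U sqrt(S) and B_hat = sqrt(S) V^T, keeping `rank` singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    M_hat = U[:, :rank] * sqrt_s          # scales column j of U by sqrt(s_j)
    B_hat = sqrt_s[:, None] * Vt[:rank]   # scales row j of V^T by sqrt(s_j)
    return M_hat, B_hat

rng = np.random.default_rng(3)
I, K, P = 10, 2, 15
W = rng.standard_normal((2 * I, 3 * K)) @ rng.standard_normal((3 * K, P))  # noise-free, rank 3K

M_hat, B_hat = rank_factorization(W, 3 * K)
Q = rng.standard_normal((3 * K, 3 * K))   # any invertible 3K x 3K matrix
M_alt, B_alt = M_hat @ Q, np.linalg.inv(Q) @ B_hat
print(np.allclose(M_hat @ B_hat, W), np.allclose(M_alt @ B_alt, W))  # True True
```

Both pairs explain the data equally well, which is why an extra step (the metric upgrade) is needed to select Q.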
Metric Upgrade

How is Q computed?

By enforcing orthogonality constraints on the camera rotations. A rotation matrix always has some properties (it is not a random matrix), since it lies in the SO(3) manifold

Be careful. Now, matrix M also includes the weight coefficients in addition to the camera rotations!
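One way to write these constraints for the orthographic case is sketched below; the notation $\hat{\mathbf{M}}_i$, $\mathbf{Q}_k$ and $\mathbf{G}_k$ is introduced here for illustration. Let $\hat{\mathbf{M}}_i$ be the $2\times 3K$ block of $\mathbf{U}\boldsymbol{\Sigma}^{1/2}$ for the i-th image, with rows $\hat{\mathbf{m}}_{i,1}^\top$ and $\hat{\mathbf{m}}_{i,2}^\top$, and let $\mathbf{Q}_k$ be the k-th $3K\times 3$ column block of Q. Because the corrected motion block satisfies $\hat{\mathbf{M}}_i\mathbf{Q}_k = l_{ik}\mathbf{R}_i$ and $\mathbf{R}_i\mathbf{R}_i^\top=\mathbf{I}_2$:

$$\hat{\mathbf{M}}_i\,\mathbf{G}_k\,\hat{\mathbf{M}}_i^\top = l_{ik}^2\,\mathbf{I}_2, \qquad \mathbf{G}_k = \mathbf{Q}_k\mathbf{Q}_k^\top,$$

which gives two linear equations on $\mathbf{G}_k$ per image, independent of the unknown $l_{ik}$:

$$\hat{\mathbf{m}}_{i,1}^\top\mathbf{G}_k\,\hat{\mathbf{m}}_{i,1} - \hat{\mathbf{m}}_{i,2}^\top\mathbf{G}_k\,\hat{\mathbf{m}}_{i,2} = 0, \qquad \hat{\mathbf{m}}_{i,1}^\top\mathbf{G}_k\,\hat{\mathbf{m}}_{i,2} = 0.$$

Recovering $\mathbf{Q}_k$ from $\mathbf{G}_k$, and resolving the remaining ambiguities, requires further steps that this sketch does not cover.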
But in many cases, we cannot observe all the points in all the images
==
Missing tracks
A toy example with missing tracks
Orthographic camera

$$\underbrace{\mathbf{W}}_{8\times 4} = \underbrace{\mathbf{R}}_{8\times 12}\,\underbrace{\mathbf{X}}_{12\times 4} \qquad (I = 4 \text{ images},\; P = 4 \text{ points})$$

Input data is not full: some entries $w_{ip}$ of W are missing
Handling missing tracks
Two alternatives are possible:
§ Applying a matrix completion algorithm to infer the missing
entries, and then run factorization over the full measurement
matrix
§ Exclude the missing entries from the formulation by applying non-linear optimization. Once the 3D model and camera poses are computed, the missing 2D tracks can be inferred too
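The first alternative can be sketched with a simple iterative low-rank imputation, a "hard-impute" style scheme chosen here for illustration rather than taken from the lecture: missing entries are repeatedly replaced by their values in the best rank-3K approximation, while observed entries stay fixed. It assumes the rank 3K is known and that enough entries are observed; the data below are synthetic.

```python
import numpy as np

def low_rank_complete(W, mask, rank, n_iters=300):
    """Fill the missing entries of W (mask == False) by alternating a truncated
    SVD projection onto rank `rank` with re-imposing the observed entries."""
    observed = np.where(mask, W, np.nan)
    row_means = np.nan_to_num(np.nanmean(observed, axis=1))  # initial guess per row
    filled = np.where(mask, W, row_means[:, None])
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(mask, W, low_rank)   # observed data stay fixed
    return filled

rng = np.random.default_rng(4)
I, K, P = 30, 2, 40
W = rng.standard_normal((2 * I, 3 * K)) @ rng.standard_normal((3 * K, P))  # rank 3K
mask = rng.random(W.shape) < 0.9                                           # ~10% missing
W_hat = low_rank_complete(W, mask, 3 * K)
err = np.linalg.norm((W_hat - W)[~mask]) / np.linalg.norm(W[~mask])
print(f"relative error on missing entries: {err:.1e}")
```

In real NRSfM data, tracks are usually missing in long contiguous blocks (occlusions) rather than at random, which makes completion harder than in this example.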
Non-Rigid
Structure from Motion
by Non-Linear Optimization
Problem Statement
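For an orthographic camera, we have:

$$\mathbf{x}_{ip} = \mathbf{R}_i\sum_{k=1}^{K} l_{ik}\,\mathbf{b}_{kp} + \mathbf{t}_i,$$

where $\mathbf{b}_{kp}$ is the p-th column of $\mathbf{B}_k$. The problem (compacting over the points) can be formulated as:

$$\mathbf{W}_i = \mathbf{R}_i\sum_{k=1}^{K} l_{ik}\,\mathbf{B}_k + \mathbf{t}_i\mathbf{1}^\top,$$

and we perform non-linear optimization by minimizing a geometric error cost function:

$$\min_{\{\mathbf{R}_i,\,l_{ik},\,\mathbf{t}_i,\,\mathbf{B}_k\}} \;\sum_{i=1}^{I}\Big\|\mathbf{W}_i - \mathbf{R}_i\sum_{k=1}^{K} l_{ik}\,\mathbf{B}_k - \mathbf{t}_i\mathbf{1}^\top\Big\|_F^2 \quad \text{s.t.}\;\; \mathbf{R}_i\mathbf{R}_i^\top = \mathbf{I}_2$$

Translation $\mathbf{t}_i$ is optional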
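The geometric error can be evaluated with the NumPy sketch below, which also accepts an optional visibility mask so that missing tracks simply do not contribute to the cost. The function name and argument layout are illustrative assumptions; the data are random.

```python
import numpy as np

def reprojection_error(W, rotations, L, B, t=None, mask=None):
    """Sum of squared orthographic reprojection residuals.

    W: 2I x P observations; rotations: I matrices R_i (2 x 3); L: I x K weights;
    B: 3K x P basis [B_1; ...; B_K]; t: optional I x 2 translations;
    mask: optional boolean 2I x P array (True = observed entry).
    """
    I, K = L.shape
    total = 0.0
    for i in range(I):
        X_i = sum(L[i, k] * B[3 * k:3 * k + 3] for k in range(K))  # 3 x P shape
        pred = rotations[i] @ X_i
        if t is not None:
            pred = pred + t[i][:, None]                             # optional translation
        res = W[2 * i:2 * i + 2] - pred
        if mask is not None:
            res = np.where(mask[2 * i:2 * i + 2], res, 0.0)         # skip missing tracks
        total += float(np.sum(res ** 2))
    return total

rng = np.random.default_rng(5)
I, K, P = 4, 2, 6
rotations = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(I)]
L = rng.standard_normal((I, K))
B = rng.standard_normal((3 * K, P))
W = np.vstack([R_i @ sum(L[i, k] * B[3 * k:3 * k + 3] for k in range(K))
               for i, R_i in enumerate(rotations)])
print(reprojection_error(W, rotations, L, B))             # zero (up to rounding) for the true parameters
print(reprojection_error(W, rotations, L + 0.1, B) > 0)   # True after perturbing the weights
```

An optimizer such as Levenberg-Marquardt works on the individual residuals rather than on this scalar sum, but the sum is what it drives down.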
Bundle Adjustment
Normally, the Levenberg-Marquardt method is used to solve the problem. We need a Jacobian matrix J, the derivative of the residuals with respect to the unknowns (R, B and the set of weights $l_k$)

Again, there are many variants to reduce the computational complexity of the problem:
§ Alternate minimization of motion and shape parameters
§ Sparse methods. Computing J is complex, but its structure can be captured by a binary (sparsity) pattern

Initialization: The optimization can be initialized assuming a rigid shape, i.e., using rigid factorization or non-linear optimization for a rigid shape
Bundle Adjustment
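The bundle adjustment method:
§ Minimize the cost function with Levenberg-Marquardt
§ Exploit the sparsity of the Jacobian matrix to decrease computation and memory requirements

The Levenberg-Marquardt algorithm:

§ Is a mixture of Gauss-Newton and gradient descent
§ Behaves like Gauss-Newton when close to the minimum (quadratic region)
§ Behaves like gradient descent when the prediction is poor
§ Depends on a damping parameter θ that controls the mixture of Gauss-Newton and gradient descent as:

$$(\mathbf{J}^\top\mathbf{J} + \theta\mathbf{I})\,\delta\mathbf{p} = -\mathbf{g}, \qquad \mathbf{g} = \mathbf{J}^\top\mathbf{r}$$

where $\delta\mathbf{p}$ is the update of the parameters we want to estimate and $\mathbf{r}$ is the residual vector. A small θ gives a Gauss-Newton step; a large θ gives a short gradient-descent step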
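A compact, dense Levenberg-Marquardt loop following the update $(\mathbf{J}^\top\mathbf{J} + \theta\mathbf{I})\,\delta\mathbf{p} = -\mathbf{J}^\top\mathbf{r}$. It is a generic sketch rather than an NRSfM solver: the Jacobian is approximated by forward finite differences, θ is divided by 10 after a successful step and multiplied by 10 after a failed one (a common heuristic, assumed here), and it is exercised on the Rosenbrock residuals, whose minimum is at (1, 1). A real bundle adjustment would use analytic, sparse Jacobians.

```python
import numpy as np

def numerical_jacobian(f, p, eps=1e-6):
    """Forward-difference approximation of the Jacobian of residuals f at p."""
    r0 = f(p)
    J = np.zeros((r0.size, p.size))
    for j in range(p.size):
        step = np.zeros_like(p)
        step[j] = eps
        J[:, j] = (f(p + step) - r0) / eps
    return J

def levenberg_marquardt(f, p0, theta=1e-3, n_iters=200, tol=1e-12):
    """Minimize 0.5 * ||f(p)||^2 with a dense Levenberg-Marquardt loop."""
    p = np.asarray(p0, dtype=float)
    cost = 0.5 * np.sum(f(p) ** 2)
    for _ in range(n_iters):
        r = f(p)
        J = numerical_jacobian(f, p)
        g = J.T @ r                                       # gradient of the cost
        delta = np.linalg.solve(J.T @ J + theta * np.eye(p.size), -g)
        new_cost = 0.5 * np.sum(f(p + delta) ** 2)
        if new_cost < cost:      # accept: trust the quadratic model more
            p, cost = p + delta, new_cost
            theta /= 10.0        # closer to Gauss-Newton
        else:                    # reject: fall back toward gradient descent
            theta *= 10.0
        if np.linalg.norm(delta) < tol:
            break
    return p

def rosenbrock_residuals(p):
    return np.array([10.0 * (p[1] - p[0] ** 2), 1.0 - p[0]])

p_opt = levenberg_marquardt(rosenbrock_residuals, [-1.2, 1.0])
print(np.round(p_opt, 6))   # [1. 1.]
```

Replacing `theta * np.eye(...)` with `theta * np.diag(np.diag(J.T @ J))` gives the scale-aware variant that many implementations prefer.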
Exercise
Let us assume a monocular video of 3 images, where 6 points are
observed. Considering the map is non-rigid and the visibility is full,
define the corresponding Jacobian matrix. A low-rank shape model
of rank 2 can be considered

Number of unknowns:
Number of equations:

J = sparse pattern
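One possible way to build the binary sparsity pattern of J for this exercise is sketched below. It assumes a hypothetical parameterization with 3 rotation parameters per image (for example, an axis-angle vector), translations removed by centering, and the unknowns ordered as rotations, then weight coefficients, then basis entries grouped by point; other choices change the counts. Each residual (x or y of point p in image i) depends only on the rotation and coefficients of image i and on the K basis vectors of point p.

```python
import numpy as np

def jacobian_sparsity(n_images, n_points, rank_k, rot_params=3):
    """Binary sparsity pattern of J for orthographic low-rank NRSfM.

    Hypothetical unknown ordering: [rotations (rot_params per image) |
    weight coefficients (rank_k per image) | basis (3 * rank_k per point)].
    Residuals are ordered by image, then point, then x/y coordinate.
    """
    n_rot = rot_params * n_images
    n_coef = rank_k * n_images
    n_basis = 3 * rank_k * n_points
    J = np.zeros((2 * n_images * n_points, n_rot + n_coef + n_basis), dtype=bool)
    row = 0
    for i in range(n_images):
        for p in range(n_points):
            for _ in range(2):  # x and y residuals of point p in image i
                J[row, rot_params * i:rot_params * (i + 1)] = True   # R_i
                c0 = n_rot + rank_k * i
                J[row, c0:c0 + rank_k] = True                        # l_i1 ... l_iK
                b0 = n_rot + n_coef + 3 * rank_k * p
                J[row, b0:b0 + 3 * rank_k] = True                    # b_1p ... b_Kp
                row += 1
    return J

J = jacobian_sparsity(n_images=3, n_points=6, rank_k=2)
print(J.shape, int(J.sum()))   # (36, 51) 396
```

Under these assumptions there are 36 equations and 51 unknowns, and only 11 entries of each 51-entry row are non-zero.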
Including priors
As in the rigid case, we can apply temporal smoothness priors, but now on both camera motion and shape deformation (be careful when the input data are a collection of pictures). To this end, we may consider the expression:

where $\mathbf{L}_i$ includes all K weight coefficients in the i-th image
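As an illustration only (an assumed form, not necessarily the expression intended in the lecture), a common first-order smoothness term penalizes differences between consecutive frames, $\lambda_R\sum_i\|\mathbf{R}_i-\mathbf{R}_{i-1}\|_F^2 + \lambda_L\sum_i\|\mathbf{L}_i-\mathbf{L}_{i-1}\|_2^2$, with user-chosen weights $\lambda_R$ and $\lambda_L$ (`w_rot` and `w_coef` below). The NumPy sketch evaluates that assumed form.

```python
import numpy as np

def temporal_smoothness(rotations, L, w_rot=1.0, w_coef=1.0):
    """Assumed first-order smoothness penalty between consecutive frames:
    w_rot * sum_i ||R_i - R_{i-1}||_F^2 + w_coef * sum_i ||L_i - L_{i-1}||^2,
    where row i of L holds the K weight coefficients L_i of image i."""
    dR = np.diff(np.stack(rotations), axis=0)   # (I-1) x 2 x 3 rotation differences
    dL = np.diff(L, axis=0)                     # (I-1) x K coefficient differences
    return w_rot * float(np.sum(dR ** 2)) + w_coef * float(np.sum(dL ** 2))

R_fixed = np.eye(3)[:2]                         # same camera in all frames
L = np.array([[0.0], [1.0], [3.0]])             # K = 1 coefficient over 3 frames
print(temporal_smoothness([R_fixed] * 3, L))    # 5.0 = (1 - 0)^2 + (3 - 1)^2
```

Such a term is added to the geometric error; it only makes sense when the frames are ordered in time, which is why unordered collections of pictures need care.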

How can we obtain a sequential solution?

We solve the optimization sequentially, using the information as the data arrive; future frames are not available. Two options:
§ Pure sequential (frame by frame)
§ Sliding window (from 3 to 5 consecutive frames)

Initialization is performed by rigid estimation (using just the initial frames). The problem remains challenging
An Extension
Semantic 3D Reconstruction
3D Reconstruction of Categories

Unsupervised 3D Reconstruction and Grouping of Rigid and Non-Rigid Categories. Antonio Agudo. IEEE Transactions on
Pattern Analysis and Machine Intelligence (TPAMI), 44(1): 519-532, 2022.
Input Data as Training Data
Shape Basis as an MLP

Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints. Vikramjit Sidhu, Edgar Tretschk, Vladislav
Golyanik, Antonio Agudo, and Christian Theobalt. European Conference on Computer Vision, 2020.
Shape Basis as an MLP
Priors and models can be included as loss terms during training. For example, the following energy includes both a data term and priors:

Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints. Vikramjit Sidhu, Edgar Tretschk, Vladislav
Golyanik, Antonio Agudo, and Christian Theobalt. European Conference on Computer Vision, 2020.
Neural Radiance Fields in
the non-rigid context
Dynamic Neural Radiance Fields

4DPV: 4D Pet from Videos by Coarse-to-fine Non-Rigid Radiance Fields. Sergio M. de Paco and Antonio Agudo. Asian
Conference on Computer Vision, 2024.
Coarse-to-fine Shapes from Videos
Demo

4DPV: 4D Pet from Videos by Coarse-to-fine Non-Rigid Radiance Fields. Sergio M. de Paco and Antonio Agudo. Asian
Conference on Computer Vision, 2024.
Things to remember

3D and 4D information can be obtained from a sequence of images

For rigid objects, the problem is well-posed. For non-rigid ones, it is inherently ill-posed (additional priors are necessary)

Model-based approaches can handle a wide variety of deformations. They are normally universal and generic. No supervision is needed

Data-based approaches require a lot of data to constrain the solution space. Obtaining *good* data can become hard. They only work for a particular object or deformation (depending on the training data)

The future must be unsupervised (or self-supervised), probably combining both model- and data-based approaches, and able to perform the estimation in multiple scenarios with a hand-held camera
Acknowledgments

Thanks to Kris Kitani, Yaser Sheikh, Alessio del Bue, Lourdes Agapito, Sergio M. de Paco
