Non-Rigid Structure from Motion (computer vision slides)
3D Vision
Universitat Pompeu Fabra
Discussion
Non-Rigid Shapes?
No external markers!
One or Many
Why is this important?
Problem Statement
A Reminder of Camera Models
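The p-th 3D point $\mathbf{X}_p = [X_p, Y_p, Z_p]^\top$ can be related to its 2D projection $\mathbf{x}_p = [x_p, y_p]^\top$ in the i-th image by means of a matrix $\mathbf{R}_i$.
[Equation: per-image relation with dimension labels 2xP, 2x3, 3xP and 2xP]
In the rigid case, the 3D shape is the same for every image.
In the non-rigid case, there is a different 3D configuration per image: the stacked shape is 3IxP and the stacked motion matrix is 2Ix3I.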
Full Linear Relation
Orthographic camera
Measurement matrix: $\mathbf{W} = \mathbf{R}\,\mathbf{X}$, where $\mathbf{W}$ is 2IxP, $\mathbf{R}$ is 2Ix3I and $\mathbf{X}$ is 3IxP.
This is an explosion of variables:
2IP entries << 6I variables + 3IP variables
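The stacked relation above (a 2IxP measurement matrix written as a 2Ix3I motion matrix times a 3IxP shape matrix) can be sketched in a few lines. This is only an illustration under assumptions: the helper names `orthographic_camera` and `stack_measurements` are hypothetical, the cameras are the first two rows of random orthogonal matrices, the shapes are random, and translations are ignored.

```python
import numpy as np

def orthographic_camera(rng):
    """Return a 2x3 matrix: the first two rows of a random orthogonal matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return q[:2, :]

def stack_measurements(cameras, shapes):
    """Build W (2I x P) = R (2I x 3I, block diagonal) @ X (3I x P)."""
    n_images = len(cameras)
    n_points = shapes[0].shape[1]
    R = np.zeros((2 * n_images, 3 * n_images))
    X = np.zeros((3 * n_images, n_points))
    for i, (R_i, X_i) in enumerate(zip(cameras, shapes)):
        R[2 * i:2 * i + 2, 3 * i:3 * i + 3] = R_i  # i-th diagonal block
        X[3 * i:3 * i + 3, :] = X_i                # i-th 3D configuration
    return R @ X, R, X

rng = np.random.default_rng(0)
n_images, n_points = 5, 8  # small, arbitrary sizes
cameras = [orthographic_camera(rng) for _ in range(n_images)]
shapes = [rng.standard_normal((3, n_points)) for _ in range(n_images)]
W, R, X = stack_measurements(cameras, shapes)

n_entries = W.size                                   # 2IP measurements
n_variables = 6 * n_images + 3 * n_images * n_points  # 6I + 3IP unknowns
print(W.shape, n_entries, n_variables)
```

Whatever the values of I and P, the number of unknowns exceeds the number of measured entries, which is why additional structure (such as a low-rank shape model) is needed.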
A Toy Comparison
Let us assume a 1-minute video with just 100 tracked points, and consider only
the estimation of the 3D shape.
Thanks to the relation between the 3D shape and the shape basis:
Orthographic camera
3x3 3x1
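To make the toy comparison concrete, the sketch below counts the shape unknowns with and without a shape basis. The frame rate (30 fps) and the rank K = 10 are assumptions chosen only for illustration, and the low-rank count assumes that each per-image shape is a weighted sum of K basis shapes.

```python
def shape_unknowns_full(n_images, n_points):
    """One free 3D point per track and per image: 3 * I * P unknowns."""
    return 3 * n_images * n_points

def shape_unknowns_low_rank(n_images, n_points, rank):
    """K basis shapes (3 * K * P) plus K weight coefficients per image (K * I)."""
    return 3 * rank * n_points + rank * n_images

fps = 30              # assumed frame rate
n_images = 60 * fps   # 1-minute video
n_points = 100
rank = 10             # assumed number of basis shapes K

full = shape_unknowns_full(n_images, n_points)
low_rank = shape_unknowns_low_rank(n_images, n_points, rank)
print(full)      # 540000
print(low_rank)  # 21000
```

Under these assumptions the low-rank model reduces the number of shape unknowns by more than an order of magnitude.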
Factorization
In both cases, the goal is to infer the motion factor (P or R) and the
3D coordinates X of the observed non-rigid object from 2D point
tracks in a monocular video W:
[Equations for the orthographic camera and for the perspective camera]
W=MB
Two factors: the motion factor M (camera rotations and weight
coefficients) and the shape factor, obtained as a product of B and the coefficients
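More on factorization
Orthographic camera
We need to tune the rank K a priori.
$\mathbf{W} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^\top = [\mathbf{U}\boldsymbol{\Lambda}^{1/2}][\boldsymbol{\Lambda}^{1/2}\mathbf{V}^\top] = [\mathbf{U}\boldsymbol{\Lambda}^{1/2}\mathbf{Q}][\mathbf{Q}^{-1}\boldsymbol{\Lambda}^{1/2}\mathbf{V}^\top]$
i.e., $\mathbf{M} = \mathbf{U}\boldsymbol{\Lambda}^{1/2}\mathbf{Q}$ and $\mathbf{B} = \mathbf{Q}^{-1}\boldsymbol{\Lambda}^{1/2}\mathbf{V}^\top$ (the two factors we look for)
Many solutions can be obtained by modifying Q: any invertible 3Kx3K matrix Q
gives a valid factorization.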
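The factorization and its ambiguity can be checked numerically. The following is a minimal sketch, assuming a noise-free synthetic measurement matrix of rank 3K; the function name `factorize` and all sizes are hypothetical. It splits the truncated SVD symmetrically between the two factors and verifies that inserting an invertible Q and its inverse leaves the product unchanged.

```python
import numpy as np

def factorize(W, K):
    """Rank-3K factorization W ~ M @ B from a truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = 3 * K
    sqrt_s = np.sqrt(s[:r])
    M = U[:, :r] * sqrt_s         # U Lambda^{1/2}, size 2I x 3K
    B = sqrt_s[:, None] * Vt[:r]  # Lambda^{1/2} V^T, size 3K x P
    return M, B

rng = np.random.default_rng(0)
n_images, n_points, K = 20, 30, 2
# Synthetic measurement matrix of exact rank 3K (assumption: no noise).
W = rng.standard_normal((2 * n_images, 3 * K)) @ rng.standard_normal((3 * K, n_points))

M, B = factorize(W, K)

# Any invertible 3K x 3K matrix Q gives another valid pair of factors.
Q = np.eye(3 * K) + 0.1 * rng.standard_normal((3 * K, 3 * K))
M_q = M @ Q
B_q = np.linalg.solve(Q, B)  # Q^{-1} B without forming the inverse

print(np.allclose(M @ B, W), np.allclose(M_q @ B_q, W))
```

Both products reproduce W, so the data alone cannot select Q; this is the role of the metric upgrade.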
Metric Upgrade
How is Q computed?
…
Input data is not full!
[Example with dimension labels: 8x4 = (8x12)(12x4)]
Handling missing tracks
Two alternatives are possible:
• Apply a matrix completion algorithm to infer the missing
entries, and then run factorization over the full measurement
matrix
• Do not consider the missing entries in the formulation, by applying
non-linear optimization. Once the 3D model and camera pose are
computed, the missing 2D tracks can be inferred too
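The first alternative can be illustrated with a very simple completion scheme. This is a sketch under assumptions, not the specific algorithm used in the course: it alternates a truncated SVD with re-imposing the observed entries, the function name `complete_low_rank` is hypothetical, the mask is drawn entry by entry, and the `rank` argument is the rank of the measurement matrix (3K for a shape basis of K elements).

```python
import numpy as np

def complete_low_rank(W, mask, rank, n_iters=200):
    """Fill the entries of W where mask is False with a low-rank estimate."""
    # Initial guess: column means of the observed entries.
    col_mean = np.where(mask, W, 0.0).sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    filled = np.where(mask, W, col_mean)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation
        filled = np.where(mask, W, low_rank)             # keep observed entries fixed
    return filled

rng = np.random.default_rng(0)
W_true = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))  # rank 3
mask = rng.random(W_true.shape) > 0.1        # about 10% of the entries are missing
W_obs = np.where(mask, W_true, np.nan)       # missing tracks stored as NaN

W_filled = complete_low_rank(W_obs, mask, rank=3)
error = np.abs(W_filled - W_true)[~mask].mean()
print(error)
```

The completed matrix can then be passed to the factorization step as if all tracks had been observed.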
Non-Rigid Structure from Motion by Non-Linear Optimization
Problem Statement
For an orthographic camera, we have:
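$(\mathbf{J}\mathbf{J}^\top + \theta\mathbf{I})\,\delta\mathbf{p} = -\mathbf{g}$, where $\delta\mathbf{p}$ is the update of the parameters we want to estimate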
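One damped update of this form can be sketched as follows. The conventions are assumptions made so that the product J Jᵀ is square in the number of parameters: J is stored with one row per parameter and one column per residual, and the gradient is taken as g = J r for a cost of half the squared residual norm. The function name `damped_step` and the linear toy problem are hypothetical.

```python
import numpy as np

def damped_step(J, r, theta):
    """Solve (J J^T + theta I) dp = -g, with g = J r.

    J: (n_params, n_residuals), column j is the gradient of residual j.
    r: (n_residuals,) current residual vector.
    """
    g = J @ r
    H = J @ J.T + theta * np.eye(J.shape[0])
    return np.linalg.solve(H, -g)

# Toy problem (assumption): linear residuals r(p) = A p - b.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)

def residual(p):
    return A @ p - b

p = np.zeros(3)
cost_before = 0.5 * np.sum(residual(p) ** 2)
dp = damped_step(A.T, residual(p), theta=1e-3)  # Jacobian of r is A, so J = A^T
cost_after = 0.5 * np.sum(residual(p + dp) ** 2)
print(cost_before, cost_after)
```

A larger θ shortens the step towards a gradient-descent direction, and a smaller θ approaches the Gauss-Newton step.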
Exercise
Let us assume a monocular video of 3 images in which 6 points are
observed. Considering that the map is non-rigid and the visibility is full,
define the corresponding Jacobian matrix. A low-rank shape model
of rank 2 can be considered.
Number of unknowns
Number of equations
J = [sparse pattern!]
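The sparsity of J can be visualised with a small script. This is one possible set-up for the exercise, not necessarily the intended answer: it assumes 3 rotation parameters and K weight coefficients per image, 3K basis coordinates per point, translations already removed, and rows indexed by residuals (the transpose of this pattern may match the convention used on the slides). The function name `jacobian_pattern` and the parameter ordering are hypothetical.

```python
import numpy as np

def jacobian_pattern(n_images, n_points, rank):
    """Boolean pattern: rows are residuals, columns are parameters."""
    n_cam, n_coef, n_basis = 3, rank, 3 * rank
    per_image = n_cam + n_coef
    n_rows = 2 * n_images * n_points                       # two equations per observation
    n_cols = n_images * per_image + n_points * n_basis
    basis_offset = n_images * per_image
    pattern = np.zeros((n_rows, n_cols), dtype=bool)
    for i in range(n_images):
        for p in range(n_points):
            row = 2 * (i * n_points + p)
            # Observation (i, p) depends on the camera and weights of image i...
            pattern[row:row + 2, i * per_image:(i + 1) * per_image] = True
            # ...and on the basis coordinates of point p only.
            start = basis_offset + p * n_basis
            pattern[row:row + 2, start:start + n_basis] = True
    return pattern

pattern = jacobian_pattern(n_images=3, n_points=6, rank=2)
print(pattern.shape)                 # (36, 51)
print(int(pattern.sum(axis=1)[0]))   # 11 non-zero entries per row
```

Under these assumptions there are 36 equations and 51 unknowns, and each row touches only 11 of the 51 columns.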
Including priors
As in the rigid case, we can apply temporal smoothness priors, but
now, in both camera motion and shape deformation (be careful
when input data are a collection of pictures). To this end, we may
consider the expression:
Unsupervised 3D Reconstruction and Grouping of Rigid and Non-Rigid Categories. Antonio Agudo. IEEE Transactions on
Pattern Analysis and Machine Intelligence (TPAMI), 44(1): 519-532, 2022.
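A temporal smoothness prior on both camera motion and shape deformation can be sketched as a penalty on first differences between consecutive images. This is a generic form given only as an illustration, not necessarily the expression used on the slide; the function name `temporal_smoothness` and the weights `w_rot` and `w_shape` are hypothetical.

```python
import numpy as np

def temporal_smoothness(rotations, shapes, w_rot=1.0, w_shape=1.0):
    """Sum of squared first differences along the image axis.

    rotations: (I, 2, 3) orthographic camera matrices.
    shapes:    (I, 3, P) per-image 3D configurations.
    """
    d_rot = np.diff(rotations, axis=0)    # R_i - R_{i-1}
    d_shape = np.diff(shapes, axis=0)     # X_i - X_{i-1}
    return w_rot * np.sum(d_rot ** 2) + w_shape * np.sum(d_shape ** 2)

rng = np.random.default_rng(0)
n_images, n_points = 4, 6
static_rot = np.tile(np.eye(3)[:2], (n_images, 1, 1))
static_shape = np.tile(rng.standard_normal((3, n_points)), (n_images, 1, 1))
moving_shape = rng.standard_normal((n_images, 3, n_points))

print(temporal_smoothness(static_rot, static_shape))  # 0.0
print(temporal_smoothness(static_rot, moving_shape))
```

Such a term only makes sense when the images are ordered in time, which is why an unordered collection of pictures needs care.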
Input Data as Training Data
Shape Basis as an MLP
Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints. Vikramjit Sidhu, Edgar Tretschk, Vladislav
Golyanik, Antonio Agudo, and Christian Theobalt. European Conference on Computer Vision, 2020.
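The idea of replacing a linear shape basis with an MLP can be sketched as a decoder that maps a per-image latent code to a 3D shape, scored by a reprojection data term. This is a toy illustration under assumptions and not the architecture of the cited paper: the layer sizes, the ReLU activation and the helper names `init_mlp`, `decode_shape` and `data_term` are hypothetical, and no training loop is included (in practice an automatic differentiation framework would optimise the weights and latent codes).

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def decode_shape(params, z, n_points):
    """Map a latent code z to a 3 x P shape."""
    h = z
    for W_l, b_l in params[:-1]:
        h = np.maximum(h @ W_l + b_l, 0.0)  # hidden layers with ReLU
    W_l, b_l = params[-1]
    return (h @ W_l + b_l).reshape(3, n_points)

def data_term(W_i, R_i, X_i):
    """Squared reprojection error for one image (orthographic camera)."""
    return np.sum((W_i - R_i @ X_i) ** 2)

rng = np.random.default_rng(0)
n_points, latent_dim = 10, 4
params = init_mlp(rng, [latent_dim, 16, 3 * n_points])
z_i = rng.standard_normal(latent_dim)   # latent code of image i
X_i = decode_shape(params, z_i, n_points)

R_i = np.eye(3)[:2]                     # assumed camera for the example
W_i = R_i @ X_i                         # synthetic, noise-free tracks
print(X_i.shape, data_term(W_i, R_i, X_i))
```

Priors can then be added to the data term as extra loss terms on the latent codes or on the decoded shapes.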
Shape Basis as an MLP
Priors and models can be considered as a loss function in training. For
example, the next energy includes both data term and priors as:
Some Results
Neural Radiance Fields in the non-rigid context
Dynamic Neural Radiance Fields
4DPV: 4D Pet from Videos by Coarse-to-fine Non-Rigid Radiance Fields. Sergio M. de Paco and Antonio Agudo. Asian
Conference on Computer Vision, 2024.
Coarse-to-fine Shapes from Videos
Demo
Things to remember