CS 294-167 Spring 2022

Geometry and Learning for 3D Vision

Yi Ma

UC Berkeley

MASKS © 2004 Invitation to 3D vision


Course Information

• Course piazza:
https://piazza.com/berkeley/spring2022/cs294167/
(information, homework, lecture notes, and resources…)

• Office hours:
Monday, Tuesday 2-3pm (together with EE106B)

• Grading policy:
10% participation; 20% homework; 70% final project

• Prerequisite:
EECS280 or equivalent in computer vision or image processing
Undergraduate linear algebra, some familiarity with ML tools.



Main Textbook (on piazza)



Supplementary Textbook

https://szeliski.org/Book/



Lecture 1
Overview and Introduction



Reconstruction from images – The Fundamental Problem

Input: Corresponding “features” in multiple perspective images.


Output: Camera poses, calibration, scene structure representations.
(3D point clouds, meshes, voxels, implicit surfaces, radiance fields…)
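
The input/output relationship above can be illustrated with a minimal sketch: one 3D point is projected into two calibrated views, and the point is then recovered from the two corresponding image features by midpoint triangulation. The camera poses, the identity calibration, and the helper names (`project`, `back_project`, `triangulate_midpoint`) are assumptions made for illustration; midpoint triangulation is one simple choice and not necessarily the method used in the course.

```python
import math

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(R, t, X):
    """Pinhole projection with identity calibration: camera coords are R X + t."""
    Xc = [r + ti for r, ti in zip(mat_vec(R, X), t)]
    return (Xc[0] / Xc[2], Xc[1] / Xc[2])

def back_project(R, t, x):
    """Return the camera center and viewing-ray direction in world coordinates."""
    Rt = transpose(R)
    center = [-c for c in mat_vec(Rt, t)]
    direction = mat_vec(Rt, [x[0], x[1], 1.0])
    return center, direction

def triangulate_midpoint(cam1, cam2, x1, x2):
    """Midpoint of the shortest segment joining the two viewing rays."""
    c1, d1 = back_project(cam1[0], cam1[1], x1)
    c2, d2 = back_project(cam2[0], cam2[1], x2)
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    den = a * c - b * b  # zero only when the two rays are parallel
    s = (b * e - c * d) / den
    u = (a * e - b * d) / den
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + u * di for ci, di in zip(c2, d2)]
    return [(p + q) / 2.0 for p, q in zip(p1, p2)]

# Hypothetical two-view setup: view 1 at the origin, view 2 rotated about the y-axis.
angle = math.radians(10.0)
R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R2 = [[math.cos(angle), 0.0, math.sin(angle)],
      [0.0, 1.0, 0.0],
      [-math.sin(angle), 0.0, math.cos(angle)]]
cam1 = (R1, [0.0, 0.0, 0.0])
cam2 = (R2, [-1.0, 0.0, 0.2])

X_true = [0.5, 0.2, 4.0]
x1 = project(cam1[0], cam1[1], X_true)   # feature in image 1
x2 = project(cam2[0], cam2[1], X_true)   # corresponding feature in image 2
X_est = triangulate_midpoint(cam1, cam2, x1, x2)
```

With noise-free correspondences and known poses the two rays intersect, so `X_est` matches `X_true` up to floating-point error. In the full problem the poses themselves are unknown and must be estimated from the correspondences.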

Point Clouds, Meshes, Voxels, Implicit Surfaces, CAD-like Models



Reconstruction from images – The Fundamental Problem

Geometric relationships among multiple views of points, lines, and planes.

. . .

Geometric and algorithmic foundation for multiple-view geometry.
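
One such relationship for points in two calibrated views is the epipolar constraint: if the second camera frame is related to the first by X2 = R X1 + T, then corresponding image points satisfy x2ᵀ E x1 = 0 with E = [T]× R. The sketch below checks this numerically. The pose values and the function names (`hat`, `essential_matrix`, `epipolar_residual`) are assumptions chosen for illustration.

```python
import math

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def hat(T):
    """Skew-symmetric matrix such that hat(T) v equals the cross product T x v."""
    return [[0.0, -T[2], T[1]],
            [T[2], 0.0, -T[0]],
            [-T[1], T[0], 0.0]]

def essential_matrix(R, T):
    return mat_mul(hat(T), R)

def epipolar_residual(E, x1, x2):
    """Evaluate x2^T E x1 for image points given as (x, y) pairs."""
    h1 = [x1[0], x1[1], 1.0]
    h2 = [x2[0], x2[1], 1.0]
    return sum(a * b for a, b in zip(h2, mat_vec(E, h1)))

# Hypothetical relative pose between the two views.
angle = math.radians(10.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
T = [-1.0, 0.0, 0.2]
E = essential_matrix(R, T)

X1 = [0.5, 0.2, 4.0]                                  # point in the first camera frame
X2 = [r + t for r, t in zip(mat_vec(R, X1), T)]       # same point in the second frame
x1 = (X1[0] / X1[2], X1[1] / X1[2])
x2 = (X2[0] / X2[2], X2[1] / X2[2])

residual_true = epipolar_residual(E, x1, x2)               # close to zero
residual_wrong = epipolar_residual(E, x1, (x2[0], x2[1] + 0.1))  # mismatched feature
```

A true correspondence gives a residual at the level of floating-point error, while the deliberately shifted point does not, which is why the constraint is useful both for estimating motion and for rejecting bad matches.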



Reconstruction from images – The Fundamental Problem

“Rome wasn’t built in a day.”



APPLICATIONS – Autonomous Highway Vehicles (1990-)

Image courtesy of California PATH


APPLICATIONS – Autonomous Vehicles Today



APPLICATIONS – Unmanned Aerial Vehicles (UAVs, 1998)

Rate: 10 Hz; Accuracy: 5 cm, 4°

Courtesy of Berkeley Robotics Lab


APPLICATIONS – Unmanned Aerial Vehicles (UAVs) Today



APPLICATIONS – Real-Time Virtual Object Insertion

UCLA Vision Lab


APPLICATIONS – Real-Time Sports Coverage

First-down line and virtual advertising

Princeton Video Image, Inc.


Virtual Museum on Your Phone

Multi-camera light stage; on iPhone VR kit; Shanghai Museum items


APPLICATIONS – Image Based Modeling and Rendering

Image courtesy of Paul Debevec, 1996


APPLICATIONS – Image Alignment, Mosaicing, and Morphing



GENERAL STEPS – Feature Selection and Correspondence

1. Small baselines versus large baselines


2. Point features versus line features



GENERAL STEPS – Structure and Motion Recovery

1. Two views versus multiple views


2. Discrete versus continuous motion
3. General versus planar scene
4. Calibrated versus uncalibrated camera
5. One motion versus multiple motions
GENERAL STEPS – Image Stratification and Dense Matching

Left and right images
GENERAL STEPS – 3-D Surface Model and Rendering

1. Point clouds versus surfaces (level sets)


2. Random shapes versus regular structures
GENERAL STEPS – Image-Based 3D Modeling

Building Rome in One Day

The Colosseum, 2,106 images

Steve Seitz, University of Washington; Richard Szeliski, Microsoft Research


Traditional 3D Reconstruction Pipeline

Feature Extraction & Matching → Multiview Geometry → Point Clouds

Image Source: Internet
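
The matching stage of this pipeline can be sketched as nearest-neighbor search over feature descriptors with a ratio test, a heuristic commonly paired with SIFT-style descriptors. The toy 2D descriptors, the function name `match_ratio_test`, and the threshold 0.8 are assumptions for illustration; real descriptors are high-dimensional and real systems use faster search structures.

```python
import math

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match is kept only if the nearest distance is clearly smaller than the
    second-nearest distance, which discards ambiguous candidates.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy descriptors (hypothetical values). The third descriptor in image A is
# equally close to two descriptors in image B, so it is rejected as ambiguous.
desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [10.0, 10.2], [4.0, 5.0], [6.0, 5.0]]
matches = match_ratio_test(desc_a, desc_b)
print(matches)  # [(0, 0), (1, 1)]
```

The surviving index pairs are the correspondences passed on to the multiview geometry stage.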


Limitations of Traditional 3D Reconstruction

• Textureless objects
• Reflection/transparency
• Repetitive patterns
• Medium/large baseline (SIFT failure)
• Moving objects

Image source: Internet


Deep Learning (Data-Driven) Approaches

• Pose Estimation: Kehl, W., et al. (2017)
• Voxels: Song, S., et al. (2017)
• Point Clouds: Charles Q., et al. (2017)
• 3D Bounding Cube: Mousavian, A., et al. (2019)
• Depth Map Regression: Li, Z., & Snavely, N. (2018)
• Meshes: Groueix, T., et al. (2018)
• Implicit Surfaces: Weiyue, W., et al. (2019)

Challenges for Data-driven Approaches

• Recent research [1] suggests that encoder-decoder networks perform classification rather than reconstruction
• A CNN is no better than clever nearest-neighbor baselines
• They cannot utilize geometric structures

Figure columns: Ground Truth, AtlasNet, OGN, Matryoshka, Clustering, Retrieval, Oracle NN

[1] Maxim Tatarchenko, Stephan R. Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, Thomas Brox. “What Do Single-view 3D Reconstruction Networks Learn?” arXiv preprint arXiv:1905.03678 (2019).
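
To make the nearest-neighbor comparison concrete, here is a toy retrieval-style baseline: instead of reconstructing a shape, it returns the stored shape whose image feature is closest to the query. This is only an illustration of the idea, not the procedure from [1]; the feature vectors, shape identifiers, and the function name `retrieve_nearest` are hypothetical.

```python
import math

def retrieve_nearest(query_feature, database):
    """Return the shape id of the database entry closest to the query feature.

    database is a list of (feature_vector, shape_id) pairs.
    """
    _, shape_id = min(database, key=lambda item: math.dist(query_feature, item[0]))
    return shape_id

# Hypothetical database of image features paired with stored 3D shapes.
database = [
    ([0.0, 1.0], "table_01"),
    ([1.0, 0.0], "chair_02"),
    ([1.0, 1.0], "sofa_03"),
]
predicted = retrieve_nearest([0.9, 0.1], database)
print(predicted)  # chair_02
```

A method of this kind can only ever output shapes it has already seen, which is the sense in which it recognizes rather than reconstructs.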
We Live in a Highly Structured World

• Man-made environments are rich in structural regularities:

  o Straight lines
  o Smooth curves
  o Parallelism
  o Orthogonality
  o Symmetry

• How to detect and utilize them?

Image source: Internet
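
As one example of using such a regularity, the images of parallel 3D lines meet at a vanishing point, which can be computed with homogeneous coordinates: the line through two image points is their cross product, and the intersection of two lines is again a cross product. The sketch below is a minimal illustration; the segment endpoints and the function names (`line_through`, `vanishing_point`) are assumptions, and practical detectors must handle noise and many line segments.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y) pairs."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(seg_a, seg_b):
    """Intersection of the lines supporting two image segments.

    Returns None when the lines are parallel in the image, in which case the
    vanishing point lies at infinity.
    """
    v = cross(line_through(seg_a[0], seg_a[1]), line_through(seg_b[0], seg_b[1]))
    if abs(v[2]) < 1e-12:
        return None
    return (v[0] / v[2], v[1] / v[2])

# Images of two 3D lines with direction (1, 0, 1), projected as (X/Z, Y/Z).
seg_a = ((-0.5, -0.5), (0.25, -0.25))
seg_b = ((-0.5, 0.5), (0.25, 0.25))
vp = vanishing_point(seg_a, seg_b)
```

For this setup `vp` comes out at approximately (1, 0), which is the projection of the common 3D direction (1, 0, 1).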


Symmetry based Modeling & Reconstruction





Regular Structure Based Modeling & Reconstruction

360° panorama

TILT: Transform-Invariant Low-rank Textures, Z. Zhang, Y. Ma, et al., IJCV 2012


How to incorporate geometric knowledge into data-driven learning approaches?

Multiple-View Reconstruction Recognition


Geometry:
o Points/junctions
o Lines
o Planes
o Incidence relations
o Symmetry
• Translation
• Reflection
• Rotation

[Ma, Soatto, Kosecka, Sastry, 2004]
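
Of the symmetry types listed, reflection is the simplest to write down: a point X is mapped across the plane n·Y = d (with unit normal n) to X − 2(n·X − d)n. The sketch below is a minimal illustration of this map; the plane parameters, the test point, and the function name `reflect` are assumptions.

```python
def reflect(X, n, d):
    """Reflect point X across the plane {Y : n . Y = d}; n must be a unit normal."""
    signed_dist = sum(ni * xi for ni, xi in zip(n, X)) - d
    return [xi - 2.0 * signed_dist * ni for xi, ni in zip(X, n)]

# Hypothetical symmetry plane with unit normal (0.6, 0, 0.8) at offset d = 2.
n = [0.6, 0.0, 0.8]
d = 2.0
X = [1.0, 2.0, 5.0]

X_mirror = reflect(X, n, d)        # symmetric counterpart of X
X_back = reflect(X_mirror, n, d)   # reflecting twice returns the original point
midpoint = [(a + b) / 2.0 for a, b in zip(X, X_mirror)]
midpoint_offset = sum(ni * mi for ni, mi in zip(n, midpoint)) - d  # ~0: on the plane
```

Reflecting twice gives back the original point, and the midpoint of a symmetric pair lies on the symmetry plane; both properties can serve as checks when a symmetry plane is estimated from data.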
Combine Geometry and Learning (for Structures)

From Images to CAD Model

Multi-view Correspondence End-to-end Learning

Geometric Structure Data Representation

Learning with Structures, and for Structures, Yichao Zhou, UC Berkeley


Combine Geometry and Learning (for Structures)

Wireframes (junctions, lines, planes)
• Learning to Reconstruct 3D Wireframes from Single Images (ICCV 2019)
• L-CNN: End-to-end Wireframe Parsing (ICCV 2019)

Vanishing points (parallelism, orthogonality)
• NeurVPS: Neural Vanishing Point Scanner via Conic Convolution (NeurIPS 2019)

Symmetry (reflection, rotation, translation)
• NeRD: Neural 3D Reflection Symmetry Detector (CVPR 2021)


Holistic Scene Structures for 3D Vision

https://holistic-3d.github.io/iccv19/
From Images to 3D CAD Models

HoliCity: 20 km² of downtown London

Yichao Zhou, Yi Ma, et al., UC Berkeley, https://holicity.io


Evolution of Interface and Media

From 1D to 3D, and from physical to virtual (meta?)…

1D media 2D media 3D media

Quipu, Inca people


3rd millennium BCE
More Applications – 3D Object Digitization

With 3D vision, learning, and light field technology at its core, one can develop live virtual 3D digital technologies.

• Digital Human Reconstruction

• Live Holography

• 3D Reconstruction

• Interactive Videos

https://www.us1.dgene.com
More Applications – Digital Arts

On iPhone VR kit
Shanghai Museum Items

https://www.us1.dgene.com
More Applications – Virtual Shopping

https://www.us1.dgene.com
More Applications – Virtual Performance & Entertainment

https://www.us1.dgene.com
Reconstruction from images – The Fundamental Problem

“Rome wasn’t built in a day.”

But a digital Rome may be built in a day!

Let us start from the foundation...
