High-resolution 3D Reconstruction
on a Mobile Processor
Michael Mangan
Senior Product Manager
Qualcomm Technologies, Inc.
May 3, 2016
30 years of driving the evolution of wireless
#1 in 3G/4G LTE modems
#1 in RF
Source: Qualcomm Incorporated data. Currently, Qualcomm semiconductors are products of Qualcomm Technologies, Inc. or its subsidiaries.
IHS, Jan. ’16 (RF); Strategy Analytics, Dec. ’15 (modem, AP)
Qualcomm® Snapdragon™ Chipsets drive new experiences
• Context-aware computing
• Machine learning
• Computing performance
• VR / AR beyond the small screen
• 360-degree camera
• 3D and low-light photography
• Gaming
• Security: biometric sensors, virtual SIM / multiple devices
• Ultra HD VoLTE / audio quality
• Superior converged connectivity: 4G+ and Wi-Fi
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
What is Active Depth Capture?
Depth provides the z-dimension of a scene; a photograph provides only x-y information.
There are two ways to capture depth information from a scene or object:
Passive Depth Capture (no IR transmitter):
• Stereo RGB cameras can passively generate a depth map of a scene.
• The baseline separation between the cameras causes parallax between the two received images.
• Parallax can be used to infer a disparity estimate, which in turn is used to generate a depth map.
Active Depth Capture (IR transmitter):
• An IR laser transmits, and various techniques are used to infer depth from the reflected laser:
» Time of Flight
» Active Stereo
» Structured Light
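The parallax-to-disparity step described above can be sketched with brute-force 1-D block matching on synthetic image rows. This is a minimal illustration, not the deck's actual stereo pipeline; patch size, search range, and the test data are all illustrative:

```python
import numpy as np

def disparity_1d(left_row, right_row, patch=3, max_disp=8):
    """Brute-force 1-D block matching: for each pixel in the left row,
    find the horizontal shift into the right row with the lowest
    sum-of-absolute-differences (SAD) cost."""
    n = len(left_row)
    disp = np.zeros(n, dtype=int)
    for x in range(patch, n - patch):
        ref = left_row[x - patch:x + patch + 1]
        best, best_cost = 0, np.inf
        for d in range(0, min(max_disp, x - patch) + 1):
            cand = right_row[x - d - patch:x - d + patch + 1]
            cost = np.abs(ref - cand).sum()
            if cost < best_cost:
                best, best_cost = d, cost
        disp[x] = best
    return disp

# Synthetic scene: the right image is the left image shifted by 4 pixels,
# as if the camera moved along the baseline.
left = np.zeros(32); left[12:16] = 1.0
right = np.zeros(32); right[8:12] = 1.0   # same feature, shifted by 4
d = disparity_1d(left, right)
```

Pixels on the feature recover a disparity of 4; nearer objects would shift more, farther objects less, which is what the depth map encodes.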
Depth from Structured Light—Technology Overview
Depth information is generated using a structured light sensor:
• A coded pattern is projected onto the scene using near-infrared (NIR) light.
• The NIR camera receives the reflected, distorted pattern.
• Codes in the received image are matched against known codes in the transmitted pattern.
• Depth at each code location is estimated from the disparity between the original and received code positions, yielding a dense depth map.
(Figure: the transmitter projects the coded pattern; the receiver captures the NIR image, from which the depth map is computed.)
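The disparity-to-depth conversion in the last bullet is plain triangulation: depth falls off inversely with the code's displacement. A one-line sketch, with an illustrative focal length and transmitter-receiver baseline (not the actual sensor's specifications):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulation: depth is inversely proportional to disparity.
    disparity_px: shift (pixels) between transmitted and received code position."""
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 600 px focal length, 8 cm baseline.
z_near = depth_from_disparity(96.0, 600.0, 0.08)  # large disparity -> close
z_far  = depth_from_disparity(24.0, 600.0, 0.08)  # small disparity -> far
```

With these numbers, a 96-pixel disparity triangulates to 0.5 m and a 24-pixel disparity to 2.0 m, so depth resolution is finest close to the sensor.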
Scanner Flow in Action
Scanner Block Diagram
Scan starts: color + depth capture (structured-light-based depth generation) feeds a live 3D renderer/viewer while the user moves the device around the subject.
Tracking/alignment pipeline: computer-vision-based initial pose estimation → inertial motion sensor fusion → bundle adjustment.
When the user stops, the scan finishes: 3D mesh generation, HD texture generation, and color correction produce the final model.
Use cases: 3D printing, social networking, gaming avatars, etc.
Scanner System Architecture
Apps (Java): 3D Scanner Application; RGBD Image Grabber (RGB grabber and NIR grabber via the Camera2 API); Depth JNI; 3D Scanner JNI.
Middleware (C++): Depth Engine (DSP/HVX); 3D Scanner Engine (CPU/GPU).
Drivers (C): SysFS laser driver; Camera HALs delivering raw RGB and raw NIR data.
Hardware: Active Sensing Module—laser, NIR camera, and RGB camera.
Note: Arrows in the original diagram indicate dependency, not dataflow.
3DR Workload Summary—Running on Snapdragon 820
3D reconstruction requires running several computationally demanding processes simultaneously:
1. Camera Pose Tracking
2. Sensor Fusion
3. Bundle Adjustment
4. Rendering
5. Mesh Generation
6. Texture Mapping
7. Structured Light Sensor Decoding
Thanks to the heterogeneous computational framework of the Snapdragon 820, we are able to do all of this at 15 FPS:
Kryo—CPU/Neon:
• Pose Tracking
• Bundle Adjustment
• Sensor Fusion
• Mesh Generation
Adreno—GPU:
• Rendering
• Texture Mapping
Hexagon—DSP/HVX:
• Depth from Structured Light
Spectra ISP:
• RGB sensor processing
• Depth sensor interface
Lessons Learned
The highest-quality 3DR requires great hardware and software: efficient CV algorithms, operating with accurate depth sensors and power-efficient processors, bring commercial-grade 3DR to mobile platforms.
Running 3DR on mobile requires tuning algorithms for power as well as performance. Power-efficient heterogeneous processors are mandatory for 3DR to run within mobile power and thermal envelopes.
The heterogeneous processing cores on Snapdragon 820 enable a high-quality 3DR experience on mobile platforms.
3DR Algorithmic Details
Computer Vision Based Pose Estimation (6-DOF)
Based on the Iterative Closest Point (ICP) concept: minimize the sum of squared pixel-intensity differences and the sum of squared depth errors to align images:
cost = Σ (pixel intensity error)² + λ · Σ (pixel depth error)²
• F. Steinbruecker et al., “Real-Time Visual Odometry from Dense RGB-D Images”, ICCV 2011
• C. Kerl et al., “Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras”, ICCV 2015
Computer Vision Based Pose Estimation (6-DOF)—Flow
Warp the current image toward the reference image using the current pose estimate, subtract to obtain an error image, and repeat, updating the pose to minimize the error.
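The warp–subtract–iterate loop can be sketched in one dimension: search over candidate shifts for the one minimizing the combined intensity-plus-depth cost. The synthetic frames, weight λ, and exhaustive integer search below are illustrative; the real tracker optimizes a continuous 6-DOF pose with gradient-based methods:

```python
import numpy as np

LAM = 0.5  # weight on the depth term: cost = sum(I_err^2) + LAM * sum(D_err^2)

def combined_cost(ref_img, ref_depth, cur_img, cur_depth, shift):
    """Warp the current frame by an integer shift, then sum squared
    intensity and depth residuals against the reference frame."""
    e_i = np.roll(cur_img, shift) - ref_img
    e_d = np.roll(cur_depth, shift) - ref_depth
    return (e_i ** 2).sum() + LAM * (e_d ** 2).sum()

# Synthetic frames: the current frame is the reference shifted by -3 pixels,
# as if the camera translated between frames.
x = np.arange(64)
ref_img = np.exp(-0.5 * ((x - 30) / 4.0) ** 2)   # one image feature
ref_dep = 1.0 + 0.01 * x                         # depth ramp
cur_img = np.roll(ref_img, -3)
cur_dep = np.roll(ref_dep, -3)

# "Repeat to minimize error": try each candidate pose, keep the best.
best = min(range(-8, 9),
           key=lambda s: combined_cost(ref_img, ref_dep, cur_img, cur_dep, s))
```

The minimizing shift of +3 exactly undoes the camera motion, driving the error image to zero.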
Computer Vision Based Pose Estimation (6-DOF)—Example
Motion Sensor Fusion
The vision pose will likely contain some errors; one cause is a lack of geometric and textural structure in the scene. This can be overcome by fusing the vision pose with the tablet's Inertial Measurement Unit (IMU).
Using the Extended Kalman Filter (EKF), poses are predicted from the IMU (gyro and accelerometer) in the predict step; the vision-based pose estimate is then fused in the update step to obtain the fused pose estimate.
• M. Li et al., “3-D Motion Estimation and Online Temporal Calibration for Camera-IMU Systems”, ICRA 2013
• S. Weiss et al., “Real-Time Metric State Estimation for Modular Vision-Inertial Systems”, ICRA 2011
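The predict/update structure can be illustrated with a linear Kalman filter on a toy 1-D position/velocity state. This is a sketch of the concept only, not the deck's actual 6-DOF EKF; the noise parameters and the noiseless "vision" measurement are assumptions for the demo:

```python
import numpy as np

def kf_predict(x, P, accel, dt, q=1e-3):
    """Predict step: propagate position/velocity with IMU acceleration."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    u = np.array([0.5 * dt * dt * accel, dt * accel])
    x = F @ x + u
    P = F @ P @ F.T + q * np.eye(2)
    return x, P

def kf_update(x, P, pos_meas, r=1e-2):
    """Update step: fuse the vision-based position estimate."""
    H = np.array([[1.0, 0.0]])
    y = pos_meas - (H @ x)[0]                 # innovation
    S = (H @ P @ H.T)[0, 0] + r               # innovation covariance
    K = (P @ H.T)[:, 0] / S                   # Kalman gain
    x = x + K * y
    P = (np.eye(2) - np.outer(K, H[0])) @ P
    return x, P

# Constant acceleration of 0.2 m/s^2; vision reports the true position each frame.
x, P = np.zeros(2), np.eye(2)
dt, a = 0.1, 0.2
true_pos, true_vel = 0.0, 0.0
for _ in range(50):
    true_pos += true_vel * dt + 0.5 * a * dt * dt
    true_vel += a * dt
    x, P = kf_predict(x, P, a, dt)            # IMU drives the prediction
    x, P = kf_update(x, P, true_pos)          # vision pose corrects it
```

After 50 frames the fused state tracks the true trajectory; in the real system the same loop runs on full 6-DOF poses, with the gyro handling rotation.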
Bundle Adjustment
Fused poses need to be refined in order to reduce visual errors, because the poses are computed locally, between consecutive frames.
We use bundle adjustment to find optimal global or semi-global poses:
• Construct links (red lines in the figure) between captured frames (blue nodes). A link is established if the reprojection overlap between two captured images is above a certain threshold.
• Jointly optimize the connected nodes.
• V. Indelman et al., “Incremental Light Bundle Adjustment for Robotics Navigation”, IROS 2013
• R. Newcombe et al., “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, IEEE ISMAR 2011
• K. Konolige et al., “FrameSLAM: From Bundle Adjustment to Realtime Visual Mapping”, IEEE Transactions on Robotics 2008
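"Jointly optimize the connected nodes" can be shown at toy scale with scalar poses: consecutive-frame odometry constraints plus one loop-closure link, solved as a linear least-squares problem. The measurements below are synthetic, and real bundle adjustment optimizes 6-DOF poses (and often 3D points) nonlinearly:

```python
import numpy as np

# Toy pose graph: 4 scalar poses. Odometry gives drifting relative constraints
# between consecutive frames; one link (loop closure) connects frames 0 and 3.
# Each constraint reads x_j - x_i = z_ij.
constraints = [          # (i, j, measured x_j - x_i)
    (0, 1, 1.1),
    (1, 2, 1.1),
    (2, 3, 1.1),         # chaining these alone puts x_3 at 3.3 (drift)
    (0, 3, 3.0),         # loop-closure link redistributes the drift
]

n = 4
A = np.zeros((len(constraints) + 1, n))
b = np.zeros(len(constraints) + 1)
for row, (i, j, z) in enumerate(constraints):
    A[row, i], A[row, j], b[row] = -1.0, 1.0, z
A[-1, 0], b[-1] = 1.0, 0.0    # gauge constraint: anchor the first pose at 0

poses, *_ = np.linalg.lstsq(A, b, rcond=None)
```

The joint solve spreads the residual across all links: the final pose lands at 3.075 instead of the chain's drifted 3.3, pulled toward the loop closure's 3.0.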
Surface Reconstruction / Mesh Generation
Having computed the 3D points, we need to generate the 3D surface mesh that best describes the scene while reducing noise.
Many surface-reconstruction methods are available in the literature: Moving Least Squares (MLS), TSDF, and Poisson. Any can be used in theory; TSDF is the least computationally demanding, while MLS and Poisson are more demanding.
Surface reconstruction is then followed by marching cubes to extract the mesh.
• S. Fleishman et al., “Robust Moving Least-Squares Fitting with Sharp Features”, ACM SIGGRAPH 2005
• M. Kazhdan et al., “Poisson Surface Reconstruction”, Symposium on Geometry Processing 2006
• R. Newcombe et al., “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, IEEE ISMAR 2011
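TSDF fusion can be sketched along a single camera ray: each depth measurement contributes a truncated signed distance, measurements are averaged with running weights, and the surface is recovered at the zero crossing (which marching cubes extracts on a 3-D grid). Voxel size and truncation distance are illustrative:

```python
import numpy as np

TRUNC = 0.3                              # truncation distance (m), illustrative
STEP = 0.05                              # voxel size (m), illustrative
voxels = np.arange(0.0, 2.0, STEP)       # voxel centres along one camera ray
tsdf = np.zeros_like(voxels)
weight = np.zeros_like(voxels)

def integrate(tsdf, weight, depth_meas):
    """Fuse one depth measurement: truncated signed distance, running average."""
    sdf = np.clip(depth_meas - voxels, -TRUNC, TRUNC)
    tsdf = (tsdf * weight + sdf) / (weight + 1.0)
    return tsdf, weight + 1.0

# Two noisy observations of a surface at 1.0 m along this ray.
tsdf, weight = integrate(tsdf, weight, 0.98)
tsdf, weight = integrate(tsdf, weight, 1.02)

# The surface sits where the fused TSDF changes sign; interpolate the crossing.
i = int(np.argmax(tsdf < 0))             # first voxel behind the surface
z = voxels[i - 1] + tsdf[i - 1] * STEP / (tsdf[i - 1] - tsdf[i])
```

Averaging the two noisy measurements places the recovered surface back at 1.0 m, which is why TSDF fusion also acts as the noise reduction mentioned above.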
Color Correction
Captured color images can suffer from color casts for many reasons, such as differing light sources. We need to correct this so that the overall color of the 3D model is in harmony.
Solution: estimate color casts and remove them.
• Gray points provide the best estimate of the color cast.
• Estimate gray pixels and shift the appropriate channel gains to bring them to neutral gray.
• Repeat until convergence.
• J. Huo et al., “Robust Automatic White Balance Algorithm Using Gray Color Points in Images”, IEEE Trans. Consumer Electronics, 2006
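The gray-point iteration can be sketched as below. The 30% grayness threshold and convergence tolerance are illustrative choices of ours, not the cited algorithm's exact parameters:

```python
import numpy as np

def gray_point_wb(img, iters=5, tol=0.02):
    """Iteratively find near-gray pixels and scale channel gains so those
    pixels come out neutral gray; repeat until the gains converge."""
    img = img.astype(float).copy()
    for _ in range(iters):
        lum = img.mean(axis=-1) + 1e-9
        # candidate gray pixels: every channel within 30% of the pixel's mean
        grayish = (np.abs(img - lum[..., None]).max(axis=-1) / lum) < 0.3
        if not grayish.any():
            break
        gains = lum[grayish].mean() / img[grayish].mean(axis=0)
        img *= gains                     # shift channel gains toward neutral
        if np.abs(gains - 1.0).max() < tol:
            break
    return img

# Synthetic gray card photographed under a warm (reddish) color cast.
cast = np.full((4, 4, 3), 0.5) * np.array([1.3, 1.0, 0.8])
out = gray_point_wb(cast)
```

After correction the R, G, and B channels of the gray card agree, i.e., the cast has been neutralized.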
Texture Mapping
The captured images need to be joined into one or more images called texture maps. Texture mapping can be thought of as “3D stitching of the images onto the 3D model”.
Obtaining the texture map generally consists of two steps:
• Determine where the pixels go on the 3D model (texture coordinates).
• Determine each pixel's color given the sequence of input images.
• P. Debevec et al., “Efficient View-Dependent Image-Based Rendering with Projective Texture-Mapping”, Eurographics Rendering Workshop 1998
• M. Waechter et al., “Let There Be Color! Large-Scale Texturing of 3D Reconstructions”, ECCV 2015
(Figure: input camera images → output texture map → colored 3D model using the texture map.)
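The first step, assigning texture coordinates, amounts to projecting each mesh vertex into an input camera image. A pinhole-projection sketch with illustrative intrinsics (not actual device values):

```python
def project_to_texture(vertex_cam, f, cx, cy, width, height):
    """Pinhole projection of a 3D vertex (camera coordinates, z forward)
    into an input image, normalized to [0, 1] texture coordinates."""
    x, y, z = vertex_cam
    u = (f * x / z + cx) / width
    v = (f * y / z + cy) / height
    return u, v

# Illustrative intrinsics: 500 px focal length, 640x480 image, centred principal point.
uv = project_to_texture((0.0, 0.0, 1.0), 500.0, 320.0, 240.0, 640.0, 480.0)
```

A vertex on the optical axis maps to the image centre, (0.5, 0.5); the second step then blends or selects the colors each contributing image sees at those coordinates.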
Some 3DR Examples
Some Results: Using our system, we can scan a small toy, a human face or body, or another object. All of this runs easily on the Snapdragon 820, thanks to its powerful heterogeneous computational framework.
Thank you
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2016 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Why Wait is a trademark of Qualcomm
Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within
the Qualcomm corporate structure, as applicable.
Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent
portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s
engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.