PointNet
Implementation Initial ‘deep learning’ idea
.XYZ point cloud better than the
reconstructed .obj file for automatic
segmentation due to higher resolution
Input Point Cloud
3D CAD MODEL
No need to have
planar surfaces
Sampled too densely
www.outsource3dcadmodeling.com
2D CAD MODEL
Straightforward from 3D to 2D
cadcrowd.com
RECONSTRUCT 3D
“Deep Learning”
3D Semantic Segmentation
from point cloud / reconstructed mesh
youtube.com/watch?v=cGuoyNY54kU
arxiv.org/1608.04236
Primitive-based deep learning segmentation
The order between semantic segmentation and reconstruction could be swapped
NIPS 2016: 3D Workshop
very early still for point cloud pipelines compared to “ordered images”
Deep learning is proven to be a powerful tool to build
models for language (one-dimensional) and image
(two-dimensional) understanding. Tremendous efforts
have been devoted to these areas, however, it is still
at the early stage to apply deep learning to 3D data,
despite their great research values and broad
real-world applications. In particular, existing methods
poorly serve the three-dimensional data that drives
a broad range of critical applications such as
augmented reality, autonomous driving, graphics,
robotics, medical imaging, neuroscience, and
scientific simulations. These problems have drawn
the attention of researchers in different fields such as
neuroscience, computer vision, and graphics.
The goal of this workshop is to foster interdisciplinary
communication of researchers working on 3D data
(Computer Vision and Computer Graphics) so that
more attention of broader community can be drawn
to 3D deep learning problems. Through those
studies, new ideas and discoveries are expected to
emerge, which can inspire advances in related fields.
This workshop is composed of invited talks, oral
presentations of outstanding submissions and a
poster session to showcase the state-of-the-art
results on the topic. In particular, a panel discussion
among leading researchers in the field is planned, so
as to provide a common playground for inspiring
discussions and stimulating debates.
The workshop will be held on Dec 9 at NIPS 2016 in
Barcelona, Spain. http://3ddl.cs.princeton.edu/2016/
ORGANIZERS
● Fisher Yu - Princeton University
● Joseph Lim - Stanford University
● Matthew Fisher - Stanford University
● Qixing Huang - University of Texas at Austin
● Jianxiong Xiao - AutoX Inc.
http://cvpr2017.thecvf.com/ In Honolulu, Hawaii
“I am co-organizing the 2nd Workshop on Visual Understanding for Interaction in conjunction with CVPR 2017. Stay tuned for the details!”
“Our workshop on Large-Scale Scene Understanding Challenge is accepted by CVPR 2017.”
http://3ddl.cs.princeton.edu/2016/slides/su.pdf
PointNet Deep learning for point cloud classification and segmentation
https://github.com/charlesq34/pointnet
https://arxiv.org/abs/1612.00593
Applications of PointNet. We propose a novel deep net
architecture that consumes raw unordered point cloud (set of
points) without voxelization or rendering.
It is a unified architecture that learns both global and local
point features, providing a simple, efficient and effective
approach for a number of 3D recognition tasks.
PointNet Architecture
Our network has three key modules:
1) the max pooling layer as a symmetric function to aggregate information from all the points,
2) a local and global information combination structure,
3) and two joint alignment networks that align both input points and point features.
PointNet symmetry function #1: Multi-layer Perceptron
http://iamaaditya.github.io/2016/03/one-by-one-convolution/
https://github.com/charlesq34/pointnet/blob/master/models/pointnet_cls_basic.py
MLP implemented as 1x1 2D convolution
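Why a 1x1 convolution can stand in for an MLP: the kernel never looks at neighbouring points, so the same small dense layer is applied to every point on its own. A minimal sketch in plain Python follows; the weights, sizes, and function names are made up for illustration and are not taken from pointnet_cls_basic.py.

```python
def dense(point, weights, bias):
    """One fully connected layer with ReLU, applied to a single point."""
    out = []
    for w_row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(w_row, point)) + b
        out.append(max(0.0, z))  # ReLU
    return out


def shared_mlp(points, weights, bias):
    """Apply the same layer to every point independently.

    This is what a 1x1 convolution over the point axis computes: only
    the channels (here x, y, z) of one point are mixed at a time.
    """
    return [dense(p, weights, bias) for p in points]


# Hypothetical toy weights: 3 input channels (x, y, z) -> 2 output channels.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.0, -1.0]

cloud = [[1.0, 2.0, 0.0], [0.0, 0.0, 3.0], [2.0, 2.0, 2.0]]
features = shared_mlp(cloud, W, b)
print(features)
```

Because each point is processed separately, reordering the input points only reorders the output rows; the order dependence is removed later by the pooling step.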
PointNet symmetry function #2: Max Pooling
https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features
Jean Da Rolt, PhD, Computer Engineer, Professor: “After some thought, I do not believe that pooling
operations are responsible for the translation invariant property in CNNs. I believe that invariance (at least to
translation) is due to the convolution filters (not specifically the pooling) and due to the fully-connected layer. In
conclusion, what makes a CNN invariant to object translation is the architecture of the neural network: the
convolution filters and the fully-connected layer.”
Artem Rozantsev, PhD Computer Vision & Machine Learning: “In addition to the previous answers,
standard ConvNets are invariant only to transformations that are present in the training data. However, there are
works, which made a step towards training networks that are inherently invariant to transformations such as
rotation and translation, for example:”
https://arxiv.org/abs/1703.00356,
https://arxiv.org/abs/1612.04642
https://arxiv.org/abs/1512.07108
University College London
École Polytechnique Fédérale de Lausanne (EPFL),
Lausanne, Switzerland
Key to our approach is the use of a single
symmetric function, max pooling.
Effectively the network learns a set of
optimization functions/criteria that select
interesting or informative points of the point
cloud and encode the reason for their selection.
The final fully connected layers of the network
aggregate these learnt optimal values into the
global descriptor for the entire shape as
mentioned above (shape classification) or are
used to predict per point labels (shape
segmentation).
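The permutation invariance of max pooling is easy to check directly. The sketch below uses plain Python and made-up per-point features; it pools column-wise over the points, confirms that shuffling the points leaves the result unchanged, and records which point supplied each maximum (the "informative points" in the text above).

```python
import random


def global_max_pool(per_point_features):
    """Column-wise maximum over all points: one value per feature channel."""
    return [max(channel) for channel in zip(*per_point_features)]


# Hypothetical per-point features (4 points, 3 channels each).
features = [
    [0.1, 2.0, 0.3],
    [0.9, 0.5, 0.2],
    [0.4, 1.5, 0.8],
    [0.0, 0.7, 0.6],
]

descriptor = global_max_pool(features)

# Shuffling the points must not change the global descriptor.
shuffled = features[:]
random.Random(0).shuffle(shuffled)
assert global_max_pool(shuffled) == descriptor

# Index of the point that "won" each channel.
critical = [max(range(len(features)), key=lambda i: features[i][c])
            for c in range(len(descriptor))]
print(descriptor, critical)
```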
PointNet Combination Structure
(pg. 3)
"Therefore, the model needs to be able to capture local structures from nearby points,
and the combinatorial interactions among local structures"
(pg. 4)
"After computing the global point cloud feature vector, we feed it back to per point
features by concatenating the global feature with each of the point features. Then we
extract new per point features based on the combined point features - this time the per
point feature is aware of both the local and global information"
(pg. 8)
"As discussed in Sec 4.2 (pg. 4), our network computes K (we take K = 1024 in this
experiment) dimension point features for each point and aggregates all the *per-point
local features* via a max pooling layer into a single K-dim vector, which forms the global
shape descriptor."
(pg. 13)
"Normal Estimation In segmentation version of PointNet, local point features and global
feature are concatenated in order to provide context to local points. However, it’s unclear
whether the context is learnt through this concatenation. In this experiment, we
validate our design by showing that our segmentation network can be trained to predict
point normals, a local geometric property that is determined by a point’s neighborhood"
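The concatenation step quoted from pg. 4 can be sketched as follows. This is an illustration in plain Python with toy sizes and invented values, not the authors' code: the global vector is obtained by max pooling and then appended to every per-point vector, so each point carries both local and global information.

```python
def global_max_pool(per_point_features):
    """Column-wise maximum over all points."""
    return [max(channel) for channel in zip(*per_point_features)]


def concat_global(local_features, global_feature):
    """Append the same global vector to every per-point feature vector."""
    return [local + global_feature for local in local_features]


# Hypothetical toy sizes: 3 points, 2 local channels, 2 deep channels.
local_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
deep_features = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]

global_feature = global_max_pool(deep_features)
combined = concat_global(local_features, global_feature)

for row in combined:
    print(row)
```

With 64 local channels and the K = 1024 global descriptor mentioned above, the same operation would give 1088 channels per point.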
PointNet Alignment Network
PointNet: (pg. 1)
"Thus we can add a data-dependent
spatial transformer network that
attempts to canonicalize the data before
the PointNet processes them, so as to
further improve the results."
PointNet: (pg. 4)
However, the transformation matrix in the
feature space has much higher dimension
than the spatial transform matrix (e.g.
from 3 × 3 to 64 × 64), which greatly
increases the difficulty of optimization. We
therefore add a regularization term to
our softmax training loss. We constrain
the feature transformation matrix to be
close to an orthogonal matrix.
We find that by adding the regularization
term, the optimization becomes more
stable and our model achieves better
performance.
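The regularization term described above is, in the paper, L_reg = ||I − A Aᵀ||²_F, where A is the predicted feature transformation matrix. A minimal sketch in plain Python is given below; the 2 × 2 test matrices are invented for illustration, and the weight with which the term is added to the softmax loss is a hyperparameter not stated on this slide.

```python
import math


def matmul_transpose(a):
    """Return A @ A^T for a square matrix given as a list of rows."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def orthogonality_penalty(a):
    """Squared Frobenius norm of (I - A A^T); zero iff A is orthogonal."""
    aat = matmul_transpose(a)
    n = len(a)
    return sum(((1.0 if i == j else 0.0) - aat[i][j]) ** 2
               for i in range(n) for j in range(n))


theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]   # orthogonal
scaled = [[2.0, 0.0], [0.0, 1.0]]                 # not orthogonal

print(orthogonality_penalty(rotation))  # approximately 0
print(orthogonality_penalty(scaled))    # 9.0
```

An orthogonal transform preserves lengths and angles in feature space, which is one way to read why the constraint stabilizes optimization of the 64 × 64 matrix.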
In Fig 15 we see that performance grows as we increase the
number of points; however, it saturates at around 1K points.
The max layer size plays an important role: increasing the layer
size from 64 to 1024 results in a 2−4% performance gain. It
indicates that we need enough point feature functions to cover
the 3D space in order to discriminate different shapes.
PointNet Modifications input data, increase dimensionality?
PointNet: (pg. 1)
"In the basic setting each point is represented by
just its three coordinates (x, y, z). Additional
dimensions may be added by computing normals
and other local or global features."
Data columns: x, y, z, red, green, blue, no normals
Point clouds can be huge
https://www.we-get-around.com/wegetaround-atlanta-our-blog/2015/10/cubicasa-creates-2d-and-3d-floor-plans-for-matterport-photographers-from-3d-showcase-tours
6-dimensional input data
Along with the x, y, z coordinates one
also obtains R, G, B values (or CIE LAB
color space) that are very useful in
segmenting objects.
9-dimensional input data
Normals (three more components per point) could be obtained too if the
camera position were known
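A sketch of how the 6-dimensional input could be assembled from the data columns listed above (x, y, z, red, green, blue). It assumes whitespace-separated text rows with 0-255 colour values, which is a guess at the .XYZ export layout, and it scales colours to 0-1 so they sit on a range comparable to normalized coordinates; the function name is hypothetical.

```python
def parse_xyzrgb(lines):
    """Parse 'x y z r g b' text rows into 6-dimensional point features.

    Assumption of this sketch: colours are integers in 0-255 and are
    rescaled to the range 0-1.
    """
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        x, y, z = (float(v) for v in fields[:3])
        r, g, b = (int(v) / 255.0 for v in fields[3:])
        points.append([x, y, z, r, g, b])
    return points


sample = [
    "0.0 1.5 2.0 255 0 0",
    "1.0 1.0 1.0 0 255 51",
    "",
]
cloud = parse_xyzrgb(sample)
print(cloud)
```

The only change needed on the network side would be the number of input channels of the first shared layer (6 instead of 3).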
Eurographics Symposium on Geometry Processing 2016, Volume 35
(2016), Number 5 http://dx.doi.org/10.1111/cgf.12983
PointNet: (pg. 13)
PointNet Modifications Architecture #1: Uncertainty estimation?
https://arxiv.org/pdf/1703.04977.pdf
http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html
[in classification
pipeline only] not in
segmentation part
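One way the uncertainty question could be approached, following the dropout-based idea in the linked blog post, is Monte Carlo dropout: keep dropout active at test time, run several stochastic forward passes, and read the spread of the outputs as an uncertainty estimate. The sketch below is a toy illustration under that assumption; the linear "classifier", its weights, and the dropout rate are invented and do not come from PointNet.

```python
import random
import statistics


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def stochastic_score(features, weights, rate, rng):
    """One forward pass of a toy linear classifier with dropout left on."""
    dropped = dropout(features, rate, rng)
    return sum(w * f for w, f in zip(weights, dropped))


def mc_dropout(features, weights, rate=0.3, passes=200, seed=0):
    """Mean and standard deviation of the score over repeated passes."""
    rng = random.Random(seed)
    scores = [stochastic_score(features, weights, rate, rng)
              for _ in range(passes)]
    return statistics.mean(scores), statistics.stdev(scores)


global_feature = [0.9, 2.0, 0.8]   # hypothetical pooled descriptor
weights = [0.5, -0.2, 1.0]         # hypothetical classifier weights
mean, spread = mc_dropout(global_feature, weights)
print(mean, spread)
```

With the dropout rate set to zero every pass gives the same score and the spread collapses to zero, which is a quick sanity check for the estimator.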
PointNet Modifications Architecture #2: component variations?
Nonlinearity / Pooling Layer / Normalization
In order to make a model invariant to input
permutation, the authors use max pooling
as the simple symmetric function to
aggregate the information from each point.
[in classification] All layers, except the last
one, include ReLU and batch normalization.
http://arxiv.org/abs/1604.04112
“One possible future line of work is to embed the network in its
entirety in the frequency domain. In models that employ Fourier
transforms to compute convolutions, at every convolutional layer
the input is FFT-ed and the elementwise multiplication output is
then inverse FFT-ed. These back-and-forth transformations are very
computationally intensive, and as such it would be desirable to
strictly remain in the frequency domain. However, the reason for
these repeated transformations is the application of nonlinearities
in the forward domain: if one were to propose a sensible
nonlinearity in the frequency domain, this would spare us from
the incessant domain switching.”
Our reparameterization is inspired by batch
normalization but does not introduce any
dependencies between the examples in a
minibatch. This means that our method can also
be applied successfully to recurrent models such
as LSTMs and to noise-sensitive applications
such as deep reinforcement learning or
generative models, for which batch
normalization is less well suited.
https://arxiv.org/abs/1602.07868
https://arxiv.org/abs/1605.09332
http://arxiv.org/abs/1512.07108
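The reparameterization quoted above (weight normalization, arXiv 1602.07868) writes each weight vector as w = g · v / ||v||, so its length g and its direction v / ||v|| are learned separately and no minibatch statistics are involved. A minimal sketch with invented values:

```python
import math


def weight_norm(v, g):
    """Reparameterise a weight vector as w = g * v / ||v||."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


v = [3.0, 4.0]   # direction parameters (hypothetical values)
g = 2.0          # learned length of the weight vector

w = weight_norm(v, g)
length = math.sqrt(sum(x * x for x in w))
print(w, length)
```

Rescaling v leaves w unchanged, which is the sense in which the direction and the length are decoupled; swapping this in for batch normalization in PointNet would be an experiment, not something the slides report.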
PointNet Modifications Architecture #3: Unsupervised/Semi-supervised extensions?
