Lecture 2.2: Unimodal Representations, Part 1
Dimensions of heterogeneity
Image representations
Image gradients, edges, kernels
Convolutional neural network (CNN)
Convolution and pooling layers
Visualizing CNNs
Region-based CNNs
Sequence modeling with convolutional networks
Dimensions of Heterogeneity
Heterogeneous Modalities
Homogeneous modalities (with similar qualities) vs. heterogeneous modalities (with diverse qualities)
[Figure: Modality A and Modality B]
Dimensions of Heterogeneity (Modality A vs. Modality B)
1) Element representations: discrete, continuous, granularity
2) Element distributions: density, frequency
3) Structure: temporal, spatial, latent, explicit
4) Information: abstraction, entropy 𝐻(·)
5) Noise: uncertainty, noise, missing data
6) Relevance: task, context dependence
Modality Profile (Modality A vs. Modality B)
Each modality is profiled along the same six dimensions: element representations, element distributions, structure, information, noise, and relevance.
Modality Profile: Visual Image Modality
1) Element representations (discrete, continuous, granularity): ?
2) Element distributions (density, frequency): ?
3) Structure (temporal, spatial, latent, explicit): ?
4) Information (abstraction, entropy): ?
5) Noise (uncertainty, noise, missing data): ?
6) Relevance (task, context dependence): ?
Image Representations
How Would You Describe This Image?
…
Object-Based Visual Representation
Label: “person”
Feature vector (appearance descriptor):
❑ Age
❑ Expression
❑ Clothes
❑ …
Object Descriptors
Many approaches over the years…
Horizontal and vertical gradients
Oriented gradients
Templates tested on the image (i.e., convolution kernels)
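To make the idea of gradient templates concrete, here is a minimal sketch in plain Python. It assumes Sobel kernels as the horizontal and vertical gradient templates (an illustrative choice; the slides do not name a specific kernel), and the helper `correlate2d` and the toy image are hypothetical.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Sobel templates (assumed here for illustration, not taken from the slides).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left-to-right intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to top-to-bottom intensity changes

# Toy 4x4 image with a vertical edge: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

gx = correlate2d(image, SOBEL_X)  # strong response: intensity changes along x
gy = correlate2d(image, SOBEL_Y)  # zero response: intensity is constant along y
print(gx, gy)
```

On this toy image only the horizontal-gradient template fires, which is the sense in which each template "tests" the image for one kind of structure.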
Convolution Kernels
Image ∗ convolution kernels = response maps
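As a rough sketch of how a bank of kernels yields one response map per kernel, the example below applies two hand-written kernels to a small image. The function `convolve2d`, the kernel names, and the image values are assumptions made for illustration. The function flips the kernel (true convolution); CNN libraries typically skip the flip and compute cross-correlation, which makes no difference for learned or symmetric kernels.

```python
def convolve2d(image, kernel):
    """2D convolution with a flipped kernel, keeping only fully overlapping positions."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # flip rows and columns
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(flipped[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A hypothetical bank of two kernels.
kernels = {
    "identity": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # copies the centre pixel
    "box_sum": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # sums each 3x3 neighbourhood
}

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# One response map per kernel.
response_maps = {name: convolve2d(image, k) for name, k in kernels.items()}
for name, rmap in response_maps.items():
    print(name, rmap)
```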
One representation, lots of tasks
https://github.com/facebookresearch/detectron2
Facial expression analysis
Articulated Body Tracking: OpenPose
https://github.com/CMU-Perceptual-Computing-Lab/openpose
Convolutional Neural Networks
Why Use Convolutional Neural Networks?
Goal: building more abstract, hierarchical visual representations
Key advantages:
1) Inspired by the visual cortex
2) Encourages visual abstraction
3) Exploits translation invariance
4) Kernels/templates are learned
5) Fewer parameters than an MLP
[Figure: hierarchy from input pixels to edges/blobs to parts to objects]
Translation Invariance
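A small 1D sketch can illustrate the property. It assumes a hand-picked peak-detector kernel and two toy signals that differ only by a shift (all hypothetical). Sliding the same kernel everywhere makes the response shift along with the input, and taking the maximum over positions then gives the same value for both.

```python
def correlate1d(x, w):
    """Slide kernel `w` over sequence `x` (valid positions only)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

w = [-1, 2, -1]                         # hypothetical peak-detector kernel
x = [0, 0, 1, 0, 0, 0, 0, 0]            # peak at index 2
x_shifted = [0, 0, 0, 0, 0, 1, 0, 0]    # same peak, moved 3 steps to the right

r = correlate1d(x, w)
r_shifted = correlate1d(x_shifted, w)

# The response pattern is identical, only its position moves with the input.
shift = r_shifted.index(max(r_shifted)) - r.index(max(r))
# A global max over positions is unaffected by where the pattern occurs.
same_peak = max(r) == max(r_shifted)
print(r, r_shifted, shift, same_peak)
```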
Predefined vs Learned Kernels
Predefined kernels: Gabor filters
Learned kernels: Convolutional Neural Network (CNN)
With CNNs, the kernel values are learned as model parameters
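For contrast with learned kernels, the sketch below builds one predefined Gabor kernel from its standard closed form, a Gaussian envelope multiplied by a cosine carrier. The function name `gabor_kernel` and all parameter values are assumptions chosen for illustration; nothing here is fitted to data.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Real-valued Gabor kernel on a size x size grid (size assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Every value is fixed by a formula; a CNN would instead treat the
# 25 entries of a 5x5 kernel as trainable parameters.
k = gabor_kernel(size=5, sigma=2.0, theta=0.0, wavelength=4.0)
for row in k:
    print(" ".join(f"{v:+.2f}" for v in row))
```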
Learned Filters (aka Convolution Kernels)
https://distill.pub/2017/feature-visualization/
Convolution in 2D: Example
Image ∗ convolution kernel = response map
Convolution as a Fully-Connected Network
Input: all pixels (image)
Output: kernel responses (response map)
Not efficient! A 200 × 200 image requires 40,000 × 𝑛 parameters (where 𝑛 is the size of the kernel)
And it may learn different kernels for different pixel positions
Not translation invariant
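The arithmetic behind the 40,000 × 𝑛 figure can be checked with a few lines. The sketch assumes a 3 × 3 kernel (so 𝑛 = 9), one output per pixel position, and no bias terms; these values are illustrative and not taken from the slide.

```python
height, width = 200, 200
n = 3 * 3                      # kernel size n, assumed 3x3 for illustration

positions = height * width     # one output per pixel position: 40,000

# Without weight sharing, every position learns its own kernel.
unshared_params = positions * n
# A convolutional layer shares a single kernel across all positions.
shared_params = n

print(positions, unshared_params, shared_params)
```

Under these assumptions, weight sharing reduces 360,000 parameters to 9 while applying the same kernel at every position.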
Convolutional Neural Layer
Input: image
Output: kernel responses (response map)
Weighted sum with the convolution kernel: 𝑦 = 𝑊𝑥
Example with 1D kernel (𝑤₁, 𝑤₂, 𝑤₃): each row of 𝑾 holds the kernel weights, shifted by one position from the row above
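One way to see 𝑦 = 𝑊𝑥 for a 1D kernel is to build 𝑊 explicitly and multiply. The sketch below assumes stride 1 and no padding; the helper names `conv_matrix` and `matvec`, the kernel values, and the input are hypothetical.

```python
def conv_matrix(kernel, input_len):
    """Banded matrix W whose rows hold the kernel, shifted one step per row."""
    k = len(kernel)
    rows = input_len - k + 1
    return [
        [kernel[c - r] if 0 <= c - r < k else 0 for c in range(input_len)]
        for r in range(rows)
    ]

def matvec(W, x):
    """Plain matrix-vector product y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

w = [1, 0, -1]          # (w1, w2, w3), illustrative values
x = [3, 5, 2, 8, 1]     # toy 1D input

W = conv_matrix(w, len(x))
y = matvec(W, x)

for row in W:
    print(row)
print(y)
```

Only three distinct weights appear in 𝑊, however long the input is; a fully connected layer of the same shape would have a free parameter in every entry.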
Pooling Layer
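A pooling layer summarizes each local window of a response map by a single value. The sketch below shows max pooling with an assumed 2 × 2 window and stride 2 (a common setting, though the slide does not specify one); the function name and the feature map are hypothetical.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over size x size windows, moving `stride` cells at a time."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]

pooled = max_pool2d(fmap)  # 4x4 response map reduced to 2x2
print(pooled)
```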
Common architectures
Example: VGGNet model
Residual Networks (ResNet)
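The core idea of a residual block is that the layers learn a correction F(x) that is added back to the input through a skip connection. The sketch below is a simplification: it uses small dense layers where ResNet uses convolutions with normalization, and every name and weight in it is hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, b, x):
    """Dense layer: W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def residual_block(x, W1, b1, W2, b2):
    """y = relu(x + F(x)), with F a two-layer transformation."""
    h = relu(linear(W1, b1, x))
    f = linear(W2, b2, h)
    return relu([xi + fi for xi, fi in zip(x, f)])  # skip connection adds x back

x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]

# With all-zero weights F(x) = 0, so the block passes x through unchanged.
y_identity = residual_block(x, zeros, no_bias, zeros, no_bias)
# With identity weights F(x) = x, so the block outputs x + x.
y_doubled = residual_block(x, eye, no_bias, eye, no_bias)
print(y_identity, y_doubled)
```

The zero-weight case shows why such blocks are easy to stack: a block that has learned nothing still behaves as the identity for non-negative inputs.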
Visualizing CNNs
Visualizing the Last CNN Layer: t-SNE
AlexNet
Deconvolution
Visualizing & Understanding Conv. Nets
Grad-CAM [ICCV 2017]
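Grad-CAM weights each feature map of a convolutional layer by the spatially averaged gradient of the class score with respect to that map, sums the weighted maps, and applies a ReLU. The sketch below covers only that final combination step; it assumes the activations and gradients have already been obtained from a forward and backward pass, and the toy values are made up.

```python
def grad_cam(activations, gradients):
    """Combine per-channel activations and gradients into a class activation map.

    activations, gradients: lists of channels, each an H x W list of lists.
    """
    n_ch = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradient over spatial positions.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels, then ReLU keeps only positive evidence.
    return [
        [
            max(0.0, sum(alphas[k] * activations[k][r][c] for k in range(n_ch)))
            for c in range(w)
        ]
        for r in range(h)
    ]

# Two 2x2 feature maps with made-up values.
activations = [
    [[1.0, 0.0], [0.0, 0.0]],   # channel 0 fires top-left
    [[0.0, 0.0], [0.0, 2.0]],   # channel 1 fires bottom-right
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],       # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],   # channel 1 counts against it
]

cam = grad_cam(activations, gradients)
print(cam)
```

In this toy case the map highlights only the top-left location, because the channel active at the bottom-right has a negative weight and is removed by the ReLU.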
Region-based CNNs
Object recognition
Object Detection (and Segmentation)
Selective Search [Uijlings et al., IJCV 2013]
1) Image segmentation (using superpixels)
2) And then merge similar regions
3) Create box region proposals
R-CNN [Girshick et al., CVPR 2014]
Fast R-CNN: Applies CNN only once, and then extracts regions
Faster R-CNN: Region selection on the Conv5 response map
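The "apply the CNN once, then extract regions" step of Fast R-CNN can be sketched as RoI max pooling: each region proposal is cropped from the shared feature map and pooled into a fixed-size grid. The bin-boundary rule below is one simple choice, not necessarily the one used in any particular implementation, and the feature map and boxes are hypothetical.

```python
import math

def roi_max_pool(fmap, box, out_size=2):
    """Pool the region `box` = (r0, c0, r1, c1), end-exclusive, to out_size x out_size."""
    r0, c0, r1, c1 = box
    h, w = r1 - r0, c1 - c0
    out = []
    for i in range(out_size):
        rs = r0 + (i * h) // out_size
        re = r0 + math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            cs = c0 + (j * w) // out_size
            ce = c0 + math.ceil((j + 1) * w / out_size)
            row.append(max(fmap[r][c] for r in range(rs, re) for c in range(cs, ce)))
        out.append(row)
    return out

# Feature map computed once for the whole image (toy values).
fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Two proposals of different sizes yield outputs of the same shape.
full = roi_max_pool(fmap, (0, 0, 4, 4))
corner = roi_max_pool(fmap, (1, 1, 4, 4))
print(full, corner)
```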
Mask R-CNN: Detection and Segmentation
(He et al., 2018)
Sequential Modeling with Convolutional Networks
Modeling Temporal and Sequential Data
3D CNN
Input as a 3D tensor (stacking video images)
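As a minimal sketch of the 3D-tensor view, the example below stacks three tiny grayscale frames along a time axis and slides a 3D kernel over time, height, and width. The kernel here is a hand-written temporal difference of shape 2 × 1 × 1, chosen only for illustration; in a 3D CNN the kernel values would be learned.

```python
def correlate3d(volume, kernel):
    """Slide a 3D kernel over a [time][height][width] volume (valid positions only)."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                sum(kernel[a][i][j] * volume[t + a][r + i][c + j]
                    for a in range(kt) for i in range(kh) for j in range(kw))
                for c in range(W - kw + 1)
            ]
            for r in range(H - kh + 1)
        ]
        for t in range(T - kt + 1)
    ]

# Three 2x2 frames with increasing brightness.
frame0 = [[0, 0], [0, 0]]
frame1 = [[1, 1], [1, 1]]
frame2 = [[3, 3], [3, 3]]
video = [frame0, frame1, frame2]   # stacked frames: shape (T=3, H=2, W=2)

# Temporal-difference kernel: next frame minus current frame.
kernel = [[[-1]], [[1]]]

response = correlate3d(video, kernel)
print(response)
```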
Time-Delay Neural Network
1D Convolution
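A time-delay layer can be read as a 1D convolution over time: the same weights are applied to every window of consecutive feature vectors. The sketch below assumes a context of two time steps and two features per step; the function name, weights, and sequence are hypothetical.

```python
def time_delay_layer(sequence, weights, bias=0.0):
    """1D convolution over time.

    sequence: list of T feature vectors.
    weights:  list of `context` weight vectors, one per delay.
    Returns one output per window of `context` consecutive time steps.
    """
    context = len(weights)
    n_feat = len(weights[0])
    return [
        bias + sum(weights[d][f] * sequence[t + d][f]
                   for d in range(context) for f in range(n_feat))
        for t in range(len(sequence) - context + 1)
    ]

# Four time steps, two features per step.
sequence = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
# Context of 2: feature 0 at time t plus feature 1 at time t+1.
weights = [[1.0, 0.0], [0.0, 1.0]]

outputs = time_delay_layer(sequence, weights)
print(outputs)
```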
Temporal Convolutional Network (TCN) [Lea et al., CVPR 2017]
Encoder and decoder
Appendix: Tools for Automatic Visual Behavior Analysis
Automatic analysis of visual behavior
Face detection
Face tracking
Facial landmark detection
Head pose
Eye gaze tracking
Facial expression analysis
Body pose tracking
Face Detection – Multi-Task CNN [SPL 2016]
Existing software (face detection)
Facial Landmarks: Constrained Local Neural Field
Existing software (facial landmarks)
Facial expression analysis
[OpenFace: an open source facial behavior analysis toolkit, T. Baltrušaitis et al., 2016]
Existing Software (expression analysis)
Gaze Estimation – Eye, Head and Body
Image from Hachisu et al. (2018). FaceLooks: A Smart Headband for Signaling Face-to-Face Behavior. Sensors.
Existing Software (head gaze)
OpenFace
https://github.com/TadasBaltrusaitis/OpenFace
Chehra face tracking
https://sites.google.com/site/chehrahome/
Watson: head pose estimation
http://sourceforge.net/projects/watson/
Random forests (requires a Kinect)
http://www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html
IntraFace
http://www.humansensing.cs.cmu.edu/intraface/
Existing Software (eye gaze)
Articulated Body Tracking: OpenPose
Existing Software (body tracking)
OpenPose
https://github.com/CMU-Perceptual-Computing-Lab/openpose
Microsoft Kinect
http://www.microsoft.com/en-us/kinectforwindows/
OpenNI
http://openni.org/
Convolutional Pose Machines
https://github.com/shihenw/convolutional-pose-machines-release
Visual Descriptors
SIFT descriptors
Optical Flow
Gabor Jets
Existing Software (visual descriptors)