Lecture 2.2: Unimodal Representations, Part 1

The document discusses unimodal representations in multimodal machine learning, focusing on image representations, convolutional neural networks (CNNs), and their applications in object detection and segmentation. It covers various aspects of CNNs, including their structure, advantages, and visualization techniques. Additionally, it highlights tools for automatic visual behavior analysis and existing software for tasks like face detection and expression analysis.


Multimodal Machine Learning

Lecture 2.2: Unimodal Representations


Đàm Quang Tuấn
Lecture Objectives

Dimensions of heterogeneity
Image representations
Image gradients, edges, kernels
Convolutional neural networks (CNNs)
Convolution and pooling layers
Visualizing CNNs
Region-based CNNs
Sequence modeling with convolutional networks

Team matching event

Dimensions of Heterogeneity

Heterogeneous Modalities

Information present in different modalities will often show diverse qualities, structures, and representations.

Homogeneous modalities (with similar qualities) vs. heterogeneous modalities (with diverse qualities), comparing a modality A with a modality B.

Examples along this spectrum: images from 2 cameras, text from 2 different languages, language and vision.
Dimensions of Heterogeneity (Modality A vs. Modality B)

1. Element representations: discrete, continuous, granularity
2. Element distributions: density, frequency
3. Structure: temporal, spatial, latent, explicit
4. Information: abstraction, entropy H(A) vs. H(B)
5. Noise: uncertainty, noise, missing data
6. Relevance: task, context dependence
Modality Profile (Modality A vs. Modality B)

1. Element representations: discrete, continuous, granularity
2. Element distributions: density, frequency
3. Structure: temporal, spatial, latent, explicit
4. Information: abstraction, entropy H(A) vs. H(B)
5. Noise: uncertainty, noise, missing data
6. Relevance: task, context dependence
Modality Profile: Visual Image Modality

1. Element representations: discrete, continuous, granularity?
2. Element distributions: density, frequency?
3. Structure: temporal, spatial, latent, explicit?
4. Information: abstraction, entropy?
5. Noise: uncertainty, noise, missing data?
6. Relevance: task, context dependence?
Image
Representations
How Would You Describe This Image?


Object-Based Visual Representation

For each detected object: a label (e.g., "person") and a feature vector (appearance descriptor) covering attributes such as:
❑ Age
❑ Expression
❑ Clothes …
Object Descriptors

How to represent and detect an object? Many approaches over the years: image gradients, edge detection, histograms of oriented gradients (HOG), optical flow.
Object Descriptors

How to represent and detect an object? Many approaches over the years, using templates tested on the image (i.e., convolution kernels): horizontal and vertical gradients, oriented gradients, Gabor filters (inspired by the visual cortex).
Convolution Kernels

Input image ∗ convolution kernels = response maps
Object Descriptors

How to represent and detect an object? Convolutional Neural Networks (CNNs). More details about CNNs are coming... and we will also talk about vision Transformers in the coming weeks.

And images are more than a list of objects!
One representation, lots of tasks

https://github.com/facebookresearch/detectron2
Facial expression analysis

[OpenFace: an open source facial behavior analysis toolkit, T. Baltrušaitis et al., 2016]

Articulated Body Tracking: OpenPose

https://github.com/CMU-Perceptual-Computing-Lab/openpose

See the appendix for a list of available tools for automatic visual behavior analysis.
Convolutional Neural Networks

Why use Convolutional Neural Networks?

Goal: build more abstract, hierarchical visual representations
(input pixels → edges/blobs → parts → objects).

Key advantages:
1) Inspired by the visual cortex
2) Encourages visual abstraction
3) Exploits translation invariance
4) Kernels/templates are learned
5) Fewer parameters than an MLP
Translation Invariance

2 data points: which one is up?

An MLP can easily learn this task (possibly with only 1 neuron!).

What happens if the face is slightly translated?
➢ The model should still be able to classify it.

Conventional MLP models are not translation invariant!
➢ But CNNs are kernel-based, which helps with translation invariance and reduces the number of parameters.
Predefined vs. Learned Kernels

Predefined kernels: e.g., Gabor filters.
Learned kernels: Convolutional Neural Networks (CNNs). With CNNs, the kernel values are learned as model parameters.
Learned Filters (aka Convolution Kernels)

https://distill.pub/2017/feature-visualization/
Convolution in 2D: Example

Input image ∗ convolution kernel = response map
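To make the operation concrete, here is a minimal sketch of 2D convolution (in the deep-learning sense, i.e., cross-correlation) in NumPy; the random image and the Sobel-like kernel are illustrative assumptions, not the lecture's actual example.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image and return the response map (valid mode)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    response = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the image patch under the kernel
            response[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return response

# Example: a vertical-edge (Sobel-like) kernel on a random image
image = np.random.rand(8, 8)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
print(conv2d(image, sobel_x).shape)  # (6, 6) response map
```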
Convolution as a Fully-Connected Network

Input: all pixels (image). Output: kernel responses (response map).

Not efficient! A 200 × 200 image requires 40,000 × n parameters (where n is the size of the kernel).

And it may learn different kernels for different pixel positions: not translation invariant.
Convolutional Neural Layer

Input: all pixels (image). Output: kernel responses (response map).

Each output is a weighted sum of the input: y = Wx, where W holds the convolution kernel.
Example with a 1D kernel: [w1, w2, w3].
Convolutional Neural Layer

Modification 1: Sliding window. Only apply the kernel to a small region of the input at a time.

Input: all pixels (image). Output: kernel responses (response map). Weighted sum y = Wx, with a 1D kernel [w1, w2, w3] as the example.
Convolutional Neural Layer

Modification 2: The same kernel is applied to all sliding windows.

Input: all pixels (image). Output: kernel responses (response map). Weighted sum y = Wx, with the shared 1D kernel [w1, w2, w3].
Convolutional Neural Layer

Modification 2 (continued): With the same kernel applied to all sliding windows, W becomes a weight-shared matrix built from the 1D kernel [w1, w2, w3], and the output is y = Wx.

Can be implemented efficiently on GPUs.
W will be 3D: the 3rd dimension allows for multiple kernels.
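To connect the two views, here is a small NumPy sketch (input and kernel values are assumptions, not from the slides) showing that applying the shared 1D kernel [w1, w2, w3] at every sliding-window position is the same as multiplying by a weight-shared matrix W, i.e., y = Wx.

```python
import numpy as np

x = np.arange(8, dtype=float)          # input "pixels"
w = np.array([0.25, 0.5, 0.25])        # one kernel, shared at every position

# Equivalent dense weight matrix W: each row is the kernel shifted by one step.
n_out = len(x) - len(w) + 1
W = np.zeros((n_out, len(x)))
for i in range(n_out):
    W[i, i:i+len(w)] = w

y_dense = W @ x                                   # fully-connected view: y = Wx
y_conv = np.convolve(x, w[::-1], mode="valid")    # sliding-window view (kernel flipped,
                                                  # since np.convolve is true convolution)
assert np.allclose(y_dense, y_conv)
```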
Convolutional Neural Network

Multiple convolutional layers allow the network to learn combinations of sub-parts, increasing complexity:
input pixels → edges/blobs (combinations of pixels) → parts (combinations of edges) → objects (combinations of parts).

But how to encourage abstraction and summarization?
Answer: pooling layers.
Pooling Layer

Response map subsampling: allows summarization of the responses.
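As a concrete illustration, a minimal 2×2 max-pooling sketch in NumPy (window size and stride of 2 are assumptions; the lecture's figure may use different settings):

```python
import numpy as np

def max_pool2x2(response):
    """Keep the strongest response in each non-overlapping 2x2 window."""
    h, w = response.shape
    h, w = h - h % 2, w - w % 2                      # drop odd border for simplicity
    blocks = response[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

response = np.random.rand(6, 6)
print(max_pool2x2(response).shape)  # (3, 3): summarized responses
```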
Common architectures

Repeat several times:
Start with a convolutional layer
Followed by a non-linear activation and pooling
End with a fully connected (MLP) layer
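A minimal PyTorch sketch of this pattern, assuming 32×32 RGB inputs, 10 output classes, and the channel sizes shown (all arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Repeated blocks: convolution -> non-linear activation -> pooling
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # End with a fully connected layer (for 32x32 inputs: 32 channels of 8x8)
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = SmallCNN()(torch.randn(1, 3, 32, 32))  # -> shape (1, 10)
```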
Example: VGGNet Model

Used for the object classification task: 1000-way classification, 138 million parameters.
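If torchvision is available, the quoted figure can be checked against its VGG-16 implementation (a hedged sketch; the lecture's exact VGGNet variant may differ):

```python
import torchvision

vgg16 = torchvision.models.vgg16(weights=None)    # 1000-way ImageNet classifier head
n_params = sum(p.numel() for p in vgg16.parameters())
print(f"{n_params / 1e6:.0f}M parameters")         # roughly 138M
```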
Residual Networks (ResNet)

Adding residual connections: ResNet (He et al., 2015), with up to 152 layers!
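A hedged PyTorch sketch of the core idea: a basic residual block whose output adds the block's input back to the transformed features (simplified to a fixed channel count with no downsampling, so it is not the paper's exact block):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)   # residual (skip) connection

y = BasicResidualBlock(64)(torch.randn(1, 64, 16, 16))  # same shape as the input
```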
Visualizing CNNs

Visualizing the Last CNN Layer: t-SNE

AlexNet

Embed high-dimensional data points (i.e., feature codes) so that pairwise distances are conserved in local neighborhoods.
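A short sketch of this visualization using scikit-learn's TSNE, assuming the last-layer feature codes have already been extracted (random stand-ins are used here in place of real AlexNet codes):

```python
import numpy as np
from sklearn.manifold import TSNE

features = np.random.rand(500, 4096)      # stand-in for extracted feature codes
embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
print(embedding.shape)                    # (500, 2): points to scatter-plot
```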
Deconvolution

Visualizing & Understanding Conv. Nets

• What makes convnets "tick"?
• What happens in hidden units?
• Layer 1: easy to visualize
• Deeper layers: just a bunch of numbers? Or something more meaningful?
• Do convnets use context, or do they actually model target classes?
Introducing: Visualizing & Understanding Conv. Nets

• Zeiler & Fergus, 2013


• Goal: Try to visualize the “black box” hidden units, gain insights
• Hope: Use conclusions to improve performance
• Idea: “Deconvolutional” neural net
Deconvolutional Nets

• Originally suggested for unsupervised feature learning: construct a convolutional net whose cost function is the image reconstruction error
• Used here to find what stimuli cause the strongest responses in hidden units
• Run many images through the net → find the strongest unit activations in each layer → visualize by "reversing" the net operations
Reversing a convnet: "unpooling"

Layer 1 visualizations
Hidden Layer Visualizations: Layer 2
Hidden Layer Visualizations: Layer 3
Hidden Layer Visualizations: Layer 4
Hidden Layer Visualizations: Layer 5
CAM: Class Activation Mapping [CVPR 2016]

Grad-CAM [ICCV 2017]

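A hedged sketch of the Grad-CAM computation in PyTorch: average the gradients of a class score over each feature map of a chosen convolutional layer, use those averages as weights, and keep the positive part of the weighted sum. The model (ResNet-18), the choice of layer4, and the dummy input are assumptions for illustration, not the paper's setup.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
acts = {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(a=o))  # save feature maps

x = torch.randn(1, 3, 224, 224)                    # dummy input image
score = model(x)[0].max()                          # score of the top predicted class
grads = torch.autograd.grad(score, acts["a"])[0]   # d(score) / d(feature maps)

weights = grads.mean(dim=(2, 3), keepdim=True)      # global-average-pool the gradients
cam = torch.relu((weights * acts["a"]).sum(dim=1))  # weighted sum of maps, then ReLU
print(cam.shape)                                    # (1, 7, 7) heatmap, upsampled onto the image for display
```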
Region-based CNNs
Object recognition

Object Detection (and Segmentation)

Input image → Detected objects

One option: a sliding window
Object Detection (and Segmentation)

Input image → Region proposals → Detected objects

A better option: start by identifying hundreds of region proposals and then apply our CNN object detector.

How to efficiently identify region proposals?
Selective Search [Uijlings et al., IJCV 2013]

Image segmentation (using superpixels), then merge similar regions to create box region proposals.
R-CNN [Girshick et al., CVPR 2014]

• Select ~2000 region proposals (time consuming!)
• Warp each region
• Apply the CNN to each region (time consuming!)

Fast R-CNN: applies the CNN only once, and then extracts regions.
Faster R-CNN: region selection on the Conv5 response map.
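As a usage sketch, torchvision ships a Faster R-CNN implementation that can stand in for the detectors described above (assuming a recent torchvision; the pretrained weights and the 0.8 score threshold are arbitrary choices, not from the lecture):

```python
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)                 # a dummy RGB image with values in [0, 1]
with torch.no_grad():
    predictions = detector([image])[0]          # dict of boxes, labels, scores per image

keep = predictions["scores"] > 0.8              # keep confident detections only
print(predictions["boxes"][keep], predictions["labels"][keep])
```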
Mask R-CNN: Detection and Segmentation
(He et al., 2018)

Sequential Modeling with Convolutional Networks

Modeling Temporal and Sequential Data

How to represent a video sequence?

One option: Recurrent Neural Networks (more about this next week).
3D CNN

Input as a 3D tensor (stacking video images).
First layer with 3D kernels.
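A minimal sketch of that first 3D convolutional layer in PyTorch; the 16-frame clip length, channel counts, and kernel size are assumptions for illustration:

```python
import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)          # (batch, RGB, frames, H, W): stacked video images
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=(3, 3, 3), padding=1)
responses = conv3d(clip)                        # 3D kernels slide over time, height, and width
print(responses.shape)                          # (1, 64, 16, 112, 112)
```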
Time-Delay Neural Network

1D convolution

Alexander Waibel, "Phoneme Recognition Using Time-Delay Neural Networks," SP87-100, Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE), December 1987, Tokyo, Japan.
Temporal Convolutional Network (TCN) [Lea et al., CVPR 2017]

An encoder-decoder architecture built from temporal (1D) convolutions.
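A hedged sketch of the idea shared by TDNNs and TCNs: stacked (here dilated) 1D convolutions over a sequence of per-frame features. This is a generic illustration, not the encoder-decoder model of Lea et al.; the layer sizes and dilations are assumptions.

```python
import torch
import torch.nn as nn

class TinyTCN(nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        # Dilated 1D convolutions widen the temporal receptive field layer by layer;
        # padding equals dilation so the sequence length is preserved.
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, out_dim, kernel_size=3, padding=4, dilation=4),
        )

    def forward(self, x):                      # x: (batch, features, time)
        return self.net(x)

frames = torch.randn(1, 128, 100)              # 100 time steps of 128-dim visual features
print(TinyTCN(128, 64, 10)(frames).shape)      # (1, 10, 100): per-frame predictions
```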
Appendix: Tools for Automatic Visual Behavior Analysis
Automatic analysis of visual behavior

Face detection
Face tracking
Facial landmark detection
Head pose
Eye gaze tracking
Facial expression analysis
Body pose tracking

Face Detection: Multi-Task CNN [SPL 2016]

Stage 1: candidate windows are produced through a fast Proposal Network.
Stage 2: these candidates are refined through a Refinement Network.
Stage 3: produces the final bounding box and facial landmark positions.
Existing software (face detection)

Multi-Task CNN face detector


https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html
OpenCV (Viola-Jones detector)
dlib (HOG + SVM)
http://dlib.net/
Tree-based model (accurate but very slow)
http://www.ics.uci.edu/~xzhu/face/
HeadHunter (accurate but slow)
http://markusmathias.bitbucket.org/2014_eccv_face_detection/
NPD
http://www.cbsr.ia.ac.cn/users/scliao/projects/npdface/

Facial Landmarks: Constrained Local Neural Field

Existing software (facial landmarks)

OpenFace: facial features


https://github.com/TadasBaltrusaitis/OpenFace
Chehra face tracking
https://sites.google.com/site/chehrahome/
Menpo project (good AAM, CLM learning tool)
http://www.menpo.org/
IntraFace: Facial attributes, facial expression analysis
http://www.humansensing.cs.cmu.edu/intraface/
OKAO Vision: Gaze estimation, facial expression
http://www.omron.com/ecb/products/mobile/okao03.html (Commercial software)
VisageSDK
http://www.visagetechnologies.com/products/visagesdk/ (Commercial software)

Facial expression analysis

[OpenFace: an open source facial behavior analysis toolkit, T. Baltrušaitis et al., 2016]

Existing Software (expression analysis)

OpenFace: Action Units


https://github.com/TadasBaltrusaitis/OpenFace
Shore: facial tracking, smile detection, age and gender detection
http://www.iis.fraunhofer.de/en/bf/bsy/fue/isyst/detektion/
FACET/CERT (Emotient API): Facial expression recognition
http://imotionsglobal.com/software/add-on-modules/attention-tool-facet-module-facial-action-coding-system-facs/ (Commercial software)
Affdex
http://www.affectiva.com/solutions/apis-sdks/
(commercial software)

Gaze Estimation: Eye, Head and Body

Image from Hachisu et al. (2018). FaceLooks: A Smart Headband for Signaling Face-to-Face Behavior. Sensors.

Existing Software (head gaze)

OpenFace
https://github.com/TadasBaltrusaitis/OpenFace
Chehra face tracking
https://sites.google.com/site/chehrahome/
Watson: head pose estimation
http://sourceforge.net/projects/watson/
Random forests
http://www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html
(requires a Kinect)
IntraFace
http://www.humansensing.cs.cmu.edu/intraface/

Existing Software (eye gaze)

OpenFace: gaze from a webcam


https://github.com/TadasBaltrusaitis/OpenFace
EyeAPI: eye pupil detection
http://staff.science.uva.nl/~rvalenti/
EyeTab
https://www.cl.cam.ac.uk/research/rainbow/projects/eyetab/
OKAO Vision: Gaze estimation, facial expression
http://www.omron.com/ecb/products/mobile/okao03.html (Commercial software)

Articulated Body Tracking: OpenPose

Existing Software (body tracking)

OpenPose
https://github.com/CMU-Perceptual-Computing-Lab/openpose
Microsoft Kinect
http://www.microsoft.com/en-us/kinectforwindows/
OpenNI
http://openni.org/
Convolutional Pose Machines
https://github.com/shihenw/convolutional-pose-machines-release

Visual Descriptors

Image gradient, edge detection, histograms of oriented gradients (HOG), SIFT descriptors, optical flow, Gabor jets.
Existing Software (visual descriptors)

OpenCV: optical flow, gradient, Haar filters…


SIFT descriptors
http://blogs.oregonstate.edu/hess/code/sift/
dlib – HoG
http://dlib.net/
OpenFace: Aligned HoG for faces
https://github.com/TadasBaltrusaitis/CLM-framework

You might also like