CONVOLUTIONAL NEURAL NETWORKS
OUTLINE
Additional Reading: http://cs231n.github.io/convolutional-networks/
Visual Recognition
Image Representation
Challenges
REPRESENTING AN IMAGE AS A MATRIX
COMPUTER VISION – MAKE SENSE OF NUMBERS
To a computer, an image is just a grid of intensity values:
255 255 240 255
255 248 232 255
252 247 238 239
255 255 255 255
VISUAL RECOGNITION
Design algorithms that are capable of:
Classifying images or videos
Detecting and localizing objects in images
Estimating semantic and geometric attributes
Classifying human activities and events
HOW MANY OBJECT CATEGORIES ARE THERE?
CHALLENGES – SHAPE AND APPEARANCE VARIATIONS
CHALLENGES – VIEWPOINT VARIATIONS
CHALLENGES – ILLUMINATION
CHALLENGES – BACKGROUND CLUTTER
CHALLENGES – SCALE
CHALLENGES – OCCLUSION
CHALLENGES DO NOT APPEAR IN ISOLATION!
Task: Detect phones in this image
Appearance variations
Viewpoint variations
Illumination variations
Background clutter
Scale changes
Occlusion
CONVOLUTIONAL NEURAL NETWORK

A CNN (or ConvNet) is a feed-forward neural network specially designed for images.

A two-dimensional array of pixels -> CNN -> "X" or "O"
FOR EXAMPLE
A clean "X" image -> CNN -> X
A clean "O" image -> CNN -> O

TRICKIER CASES
A shifted or distorted "X" -> CNN -> X
A shifted or distorted "O" -> CNN -> O
DECIDING IS HARD
Is a new, slightly different "X" image equal to the stored reference "X"?

WHAT COMPUTERS SEE

Reference "X" (9x9 grid of -1/+1 pixel values):
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1

New "X" (shifted and distorted):
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1  1 -1 -1 -1
-1 -1  1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1  1 -1 -1
-1 -1 -1  1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
COMPUTERS ARE LITERAL
Compared pixel by pixel, the two grids are not equal, so a naive equality test rejects the match even though both images show an "X".
CONVNETS MATCH PIECES OF THE IMAGE
Matching whole images fails, but small pieces of the new "X" still match pieces of the reference exactly.
PIECES OF THE IMAGE ARE CALLED FEATURES
The three features used to match an "X":

Feature 1 (\ diagonal):
 1 -1 -1
-1  1 -1
-1 -1  1

Feature 2 (small central "x"):
 1 -1  1
-1  1 -1
 1 -1  1

Feature 3 (/ diagonal):
-1 -1  1
-1  1 -1
 1 -1 -1
HOW COMPUTERS MATCH FEATURES:
CONVOLUTION (LINEAR FILTERING)

Convolution is a neighborhood operation in which each output pixel is the weighted sum of neighboring input pixels. The matrix of weights is called the convolution kernel, also known as the filter.

Kernel (the \ diagonal feature):
 1 -1 -1
-1  1 -1
-1 -1  1

Input (the 9x9 "X"):
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1 -1 -1  1 -1 -1 -1 -1
-1 -1 -1  1 -1  1 -1 -1 -1
-1 -1  1 -1 -1 -1  1 -1 -1
-1  1 -1 -1 -1 -1 -1  1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
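To make the arithmetic concrete, here is a minimal NumPy sketch of this matching operation (plain sliding-window filtering with the slides' divide-by-9 normalization); the function name match_filter and the loop-based image construction are illustrative, not from the slides.

import numpy as np

def match_filter(image, kernel):
    # Slide the kernel over every "valid" position; at each position take the
    # elementwise product with the patch beneath it, sum it, and divide by the
    # number of kernel entries (9 for a 3x3 kernel), as in the slides.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+kH, j:j+kW] * kernel).sum() / kernel.size
    return out

# The 9x9 "X" image: -1 everywhere, +1 on both diagonals inside the border.
X = -np.ones((9, 9))
for r in range(1, 8):
    X[r, r] = X[r, 8 - r] = 1

diag = np.array([[ 1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1,  1]])

scores = match_filter(X, diag)
print(scores[0, 0])  # 0.777... -> the 0.77 in the slides' output map
print(scores[1, 1])  # 1.0: the kernel aligns perfectly with the diagonal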
CONVOLUTION

To score a feature at one location, multiply each kernel entry (scaled by 1/9) by the image pixel beneath it and add the results: a perfect alignment of the \ diagonal kernel with the image gives 9/9 = 1.00, while partial alignments give values such as 0.55 and 0.77. Sliding the kernel over every position yields a 7x7 map of match scores whose first row is:

0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
LINEAR FILTERS: EXAMPLES

1 1 1
1 1 1
1 1 1

Original -> Blur (with a mean filter)
Source: D. Lowe
PRACTICE WITH LINEAR FILTERS

0 0 0
0 1 0
0 0 0

Original -> Filtered (no change)
Source: D. Lowe
PRACTICE WITH LINEAR FILTERS

0 0 0
0 0 1
0 0 0

Original -> Shifted left by 1 pixel
Source: D. Lowe
[Filtering examples on a photograph (image from http://www.texasexplorer.com/austincap2.jpg), showing the magnitude of filter responses.]
Source: Kristen Grauman
Fully Connected Layer
Example: 200x200 image, 40K hidden units
-> ~2B parameters!!! (200*200 inputs x 40,000 units = 1.6 x 10^9 weights)
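A quick back-of-the-envelope check of that count, using the sizes from the slide:

inputs = 200 * 200          # pixels in a 200x200 image
hidden = 40_000             # hidden units
weights = inputs * hidden   # every pixel connects to every unit
print(f"{weights:,}")       # 1,600,000,000 -> ~2B parameters (biases ignored)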
Locally Connected Layer
STATIONARITY? Statistics are similar at different locations.
Ranzato
Convolutional Layer
Ranzato
CONVOLUTION
Border Handling: Zero-Padding
CONVOLUTION

Convolving the small central "x" kernel
 1 -1  1
-1  1 -1
 1 -1  1
with the "X" image gives, along the middle row of the 7x7 feature map:
-0.11 0.33 -0.77 1.00 -0.77 0.33 -0.11
Convolving a 32x32x3 image with one 5x5x3 filter produces a 28x28x1 activation map.
A closer look at spatial dimensions:

7x7 input (spatially), assume a 3x3 filter:
applied with stride 1 => 5x5 output
applied with stride 2 => 3x3 output
applied with stride 3? => doesn't fit! cannot apply a 3x3 filter on a 7x7 input with stride 3.
Output size: (N - F) / stride + 1

e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
In practice: common to zero pad the border.

e.g. input 7x7, 3x3 filter applied with stride 1, pad with a 1-pixel border => 7x7 output!

In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F-1)/2, which preserves the spatial size:
F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
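A small NumPy check of the (F-1)/2 rule; the array contents are arbitrary, only the shapes matter:

import numpy as np

x = np.arange(49.0).reshape(7, 7)   # 7x7 input
F, stride = 3, 1
pad = (F - 1) // 2                  # 1-pixel border for a 3x3 filter
xp = np.pad(x, pad)                 # zero-padded to 9x9
out = (xp.shape[0] - F) // stride + 1
print(out)                          # 7 -> spatial size preserved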
Remember back to...
E.g. a 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially (32 -> 28 -> 24 ...). Shrinking too fast is not good; it doesn't work well.

32x32x3 input -> CONV + ReLU (e.g. six 5x5x3 filters) -> 28x28x6 -> CONV + ReLU (e.g. ten 5x5x6 filters) -> 24x24x10 -> ...
CONVOLUTION LAYER
N -> size of the input (spatially)
F -> size of the filter
S -> stride
P -> padding

Output size: (N - F + 2P)/S + 1
e.g. N = 7, F = 3, S = 2, P = 1: (7 - 3 + 2)/2 + 1 = 4
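The formula is easy to wrap in a helper; this sketch (the function name is mine, not the slides') also flags the stride-3 case above:

def conv_output_size(N, F, S=1, P=0):
    # Output size of a conv layer: (N - F + 2P)/S + 1.
    span = N - F + 2 * P
    if span % S != 0:
        raise ValueError(f"filter F={F} with stride S={S} does not fit input N={N}")
    return span // S + 1

print(conv_output_size(7, 3, S=1))       # 5
print(conv_output_size(7, 3, S=2))       # 3
print(conv_output_size(7, 3, S=2, P=1))  # 4 (the example above)
# conv_output_size(7, 3, S=3) raises: the filter doesn't fit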
(btw, 1x1 convolution layers make perfect sense)

1x1 CONV with 32 filters: each filter has size 1x1x64 and performs a 64-dimensional dot product, so a 56x56x64 input volume becomes 56x56x32.
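Because each filter sees a single pixel across all channels, a 1x1 convolution is just a matrix multiply at every spatial position. A NumPy sketch with random stand-in data:

import numpy as np

x = np.random.randn(56, 56, 64)   # input volume, 64 channels
W = np.random.randn(64, 32)       # 32 filters, each of size 1x1x64
y = x @ W                         # a 64-dim dot product at every pixel
print(y.shape)                    # (56, 56, 32)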
The brain/neuron view of CONV Layer

32x32x3 image, 5x5x3 filter.
1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product).
Pooling Layer

Let us assume the filter is an "eye" detector. How can we make the detection robust to the exact location of the eye? By "pooling" (e.g., taking the max of) filter responses at different locations, we gain robustness to the exact spatial location of features.
Ranzato
Pooling layer:
- makes the representations smaller and more manageable
- operates over each activation map independently
MAX POOLING

A 2x2 window steps across each 7x7 feature map in strides of two and keeps only the largest value in each window (the final window in each row and column may hang off the edge). Each 7x7 map of match scores shrinks to a 4x4 map that preserves the strongest responses, with rows such as 0.55 0.33 1.00 0.11 and 0.77 0.33 0.55 0.33.
POOLING LAYER
Summary:
Accepts a volume of size W1 x H1 x D1
Requires two hyper-parameters:
  Kernel Size F
  Stride S
Produces a volume of size W2 x H2 x D2 where:
  W2 = (W1 - F)/S + 1
  H2 = (H1 - F)/S + 1
  D2 = D1
Introduces zero parameters since it computes a fixed function of the input
Note: zero-padding is not common for pooling layers
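A minimal sketch of that fixed function (strict windows, so a 7x7 map with F=2, S=2 gives 3x3 here; the slides' toy example instead lets the last window hang off the edge to get 4x4):

import numpy as np

def max_pool(fmap, F=2, S=2):
    # Keep only the largest activation in each FxF window, stepping by S.
    H, W = fmap.shape
    out = np.zeros(((H - F) // S + 1, (W - F) // S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*S:i*S+F, j*S:j*S+F].max()
    return out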
RECTIFIED LINEAR UNITS (RELU)

ReLU replaces every negative value in a feature map with zero, f(x) = max(0, x), e.g.:
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33  ->  0.77 0 0.11 0.33 0.55 0 0.33
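In code, ReLU is a one-liner; applied to the example row above:

import numpy as np

relu = lambda fmap: np.maximum(fmap, 0.0)   # f(x) = max(0, x), elementwise
row = np.array([0.77, -0.11, 0.11, 0.33, 0.55, -0.11, 0.33])
print(relu(row))   # [0.77 0.   0.11 0.33 0.55 0.   0.33]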
RELU LAYER

Applied to the whole stack, ReLU zeroes every negative entry in each of the three 7x7 feature maps while leaving the positive match scores unchanged; the maps keep their size.
LAYERS GET STACKED

The output of one layer becomes the input of the next. The 9x9 image passes through convolution, ReLU, and max pooling to become a stack of 4x4 feature maps; a second round of the same layers shrinks these to 2x2 maps of high-level features (values such as 1.00 and 0.55).
FULLY CONNECTED LAYER

The final 2x2 feature maps are flattened into a single vector of feature values (1.00, 0.55, 0.55, 1.00, 1.00, 0.55, ...). Every value gets a weighted vote for each category: some positions are strong evidence for "X", others for "O".
FULLY CONNECTED LAYER

A new input produces the feature vector 0.9, 0.65, 0.45, 0.87, 0.96, 0.73, 0.23, 0.63, 0.44, 0.89, 0.94, 0.53; the weighted votes from these values decide between "X" and "O".
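A sketch of the voting step; the weight matrix here is a random stand-in (in a trained network it is learned), so only the shapes mirror the slides:

import numpy as np

def fully_connected(features, W, b):
    # Every feature value contributes a weighted vote to every class score.
    return W @ features + b

feats = np.array([0.9, 0.65, 0.45, 0.87, 0.96, 0.73,
                  0.23, 0.63, 0.44, 0.89, 0.94, 0.53])
rng = np.random.default_rng(0)
W = rng.standard_normal((2, feats.size))   # one row of weights per class (X, O)
b = np.zeros(2)
print(fully_connected(feats, W, b))        # two class scores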
PUTTING IT ALL TOGETHER

A 9x9 input image of an "X" flows through the complete stack (repeated convolution, ReLU, and pooling layers, followed by fully connected layers) and comes out as a pair of votes: one for "X" and one for "O".
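Composing the sketches above gives a toy forward pass for the whole pipeline; it reuses the names defined earlier (match_filter, relu, max_pool, fully_connected, X, diag, rng), and W2/b2 are hypothetical fully-connected parameters:

# CONV -> RELU -> POOL -> flatten -> FC
fmap = match_filter(X, diag)               # 9x9 image -> 7x7 match scores
fmap = relu(fmap)                          # negative scores -> 0
fmap = max_pool(fmap)                      # 7x7 -> 3x3 (strict windows)
feats2 = fmap.ravel()                      # flatten to a 9-vector
W2 = rng.standard_normal((2, feats2.size)) # hypothetical learned weights
b2 = np.zeros(2)
votes = fully_connected(feats2, W2, b2)    # one score for "X", one for "O"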
IMPLEMENTATION – CIFAR10
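The slides' CIFAR-10 code did not survive extraction; as a stand-in, here is a minimal Keras model in the same spirit (layer sizes are my assumptions, not the slides' exact configuration):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                         # CIFAR-10 images
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),                  # 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])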
FAMOUS CNN ARCHITECTURES
IMAGENET
• The ImageNet project is a large visual database designed for use in visual object recognition software research. As of 2016, over ten million URLs of images had been hand-annotated by ImageNet to indicate what objects are pictured.
VARIOUS CNN ARCHITECTURES – PERFORMANCE
Case Study: LeNet-5
[LeCun et al., 1998]
Conv filters were 5x5, applied at stride 1; subsampling (pooling) layers were 2x2, applied at stride 2; overall architecture: [CONV-POOL-CONV-POOL-FC-FC].
Case Study: AlexNet
[Krizhevsky et al. 2012]
Eight learned layers (five convolutional, three fully connected); first major use of ReLU; ILSVRC 2012 winner (16.4% top-5 error).
VGGNET
[Simonyan & Zisserman, 2014]
INPUT: [224x224x3] memory: 224*224*3=150K params: 0 (not counting biases)
CONV3-64: [224x224x64] memory: 224*224*64=3.2M params: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64] memory: 224*224*64=3.2M params: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800K params: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M params: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6M params: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400K params: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K params: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K params: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K params: 0
FC: [1x1x4096] memory: 4096 params: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 params: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 params: 4096*1000 = 4,096,000

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters

Note: most of the memory is in the early CONV layers; most of the parameters are in the late FC layers.
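A quick script to reproduce the parameter total above (weights only, biases ignored):

conv_shapes = [  # (kh, kw, in_ch, out_ch) for the 13 conv layers above
    (3, 3, 3, 64), (3, 3, 64, 64),
    (3, 3, 64, 128), (3, 3, 128, 128),
    (3, 3, 128, 256), (3, 3, 256, 256), (3, 3, 256, 256),
    (3, 3, 256, 512), (3, 3, 512, 512), (3, 3, 512, 512),
    (3, 3, 512, 512), (3, 3, 512, 512), (3, 3, 512, 512),
]
conv = sum(kh * kw * cin * cout for kh, kw, cin, cout in conv_shapes)
fc = 7 * 7 * 512 * 4096 + 4096 * 4096 + 4096 * 1000
print(f"{(conv + fc) / 1e6:.0f}M")   # ~138M, matching the total above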
GOOGLENET
[Szegedy et al., 2014]
Inception Module
ILSVRC 2014 winner (6.7% top-5 error)
(ILSVRC: ImageNet Large Scale Visual Recognition Challenge)
RESNET
[He et al., 2015]
ILSVRC 2015 winner (3.6% top-5 error)
SUMMARY
Visual Recognition
Challenges