08-Convolution Neural Network

The document discusses the transition from Multilayer Perceptrons (MLP) to Convolutional Neural Networks (CNN), emphasizing the importance of spatial relationships in image data. CNNs leverage the structure of images to create more efficient models, reducing the number of parameters needed for training. The document also highlights concepts like translation invariance and locality, which are fundamental to the design of convolutional layers.


CSD456

Deep Learning
Convolution
Neural Network
Till Now…

• Multilayer Perceptron
• Training
• Activation
• Loss
Convolution Neural Network

• So far, we have ignored this rich structure and treated images as vectors of numbers by flattening them, irrespective of the spatial relation between pixels.

• It was necessary to feed the resulting one-dimensional vectors through a fully connected MLP.

• Since an MLP is invariant to the order of its input features, we would get similar results whether or not we preserve an ordering that corresponds to the spatial structure of the pixels.
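As a concrete illustration of the flattening step above (a minimal sketch; the lecture names no framework, so PyTorch and a 28x28 image size are assumptions made purely for illustration):

import torch
import torch.nn as nn

# A single grayscale image, e.g. 28x28 pixels (batch of 1).
image = torch.randn(1, 28, 28)

# Flatten the 2D pixel grid into a 1D vector of 784 numbers.
# Any spatial relationship between neighbouring pixels is discarded here.
flat = image.reshape(1, -1)          # shape: (1, 784)

# A fully connected (MLP) layer now treats every pixel as an
# independent, unordered feature.
fc = nn.Linear(in_features=784, out_features=256)
hidden = fc(flat)                    # shape: (1, 256)

Permuting the input pixels (and the corresponding weight columns) would leave the model class unchanged, which is exactly the order-invariance described above.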
Convolution Neural Network (CNN)

• We should leverage our prior knowledge that nearby pixels are typically related to each other to build efficient models for learning from image data -- CNN (LeCun et al., 1995).

• Modern CNNs, as they are called colloquially, owe their design to inspirations from biology, group theory, and a healthy dose of experimental tinkering.

• CNNs tend to be computationally efficient, both because they require fewer parameters than a fully connected MLP and because convolutions are easy to parallelize across GPU cores.
From MLP to Convolution

• The models that we have discussed so far remain appropriate options when we are dealing with tabular data.

• With tabular data, we do not assume any structure a priori concerning how the features interact.

• However, for high-dimensional perceptual data (e.g., images, video), such structureless networks can grow unwieldy.
Distinguishing cats from dogs -- Example

• Let's say we have collected an annotated dataset of one-megapixel photographs.

• This means that each input to the network has one million
dimensions.
• Even an aggressive reduction to one thousand hidden dimensions would require a fully connected layer characterized by 10^6 × 10^3 = 10^9 parameters.
• Learning the parameters of this network may turn out to be infeasible.
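To make the arithmetic concrete, a minimal sketch in plain Python (the 4-byte float32 storage estimate is an assumption added for illustration):

# Fully connected layer from a flattened 1-megapixel image to 1,000 hidden units.
in_features  = 10**6   # one million input pixels
out_features = 10**3   # one thousand hidden dimensions

n_weights = in_features * out_features   # 10^9 weights
n_biases  = out_features                 # 10^3 biases
n_params  = n_weights + n_biases

print(f"{n_params:,} parameters")                  # 1,000,001,000 parameters
print(f"~{n_params * 4 / 1e9:.1f} GB in float32")  # ~4.0 GB just to store them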
Invariance

• Imagine that we want to detect an object in an image.


• It seems reasonable that whatever method we use to
recognize objects should not be overly concerned with the
precise location of the object in the image.
• We can now make these intuitions more concrete with the following points.
1. Early layers should respond similarly to the same patch, wherever it appears in the image.
2. Early layers should focus on local regions.
3. Deeper layers should capture longer-range features of the image.
Constraining MLP

• Consider an MLP with two-dimensional images 𝐗 as inputs and their immediate hidden representations 𝐇 similarly represented as matrices.

• For now, both 𝐗 and 𝐇 have the same shape.
• At the individual pixel level: H_{i,j} = U_{i,j} + Σ_{k,l} W_{i,j,k,l} X_{k,l}, where W is a fourth-order weight tensor and U contains the biases.
Constraining MLP

• We simply re-index the subscripts (k, l) such that k = i + a and l = j + b; here (a, b) can also be negative.
• In other words, we set V_{i,j,a,b} = W_{i,j,i+a,j+b}, which gives H_{i,j} = U_{i,j} + Σ_{a,b} V_{i,j,a,b} X_{i+a,j+b}.

• Number of parameters: 10^12 (infeasible)
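A minimal NumPy sketch of this fully general formulation, on a tiny toy image so that the fourth-order tensor fits in memory (the 8x8 size is an illustrative assumption):

import numpy as np

h = w = 8                              # toy 8x8 "image" instead of 1000x1000
X = np.random.randn(h, w)              # input image
W = np.random.randn(h, w, h, w)        # fourth-order weight tensor W[i, j, k, l]
U = np.random.randn(h, w)              # biases

# H[i, j] = U[i, j] + sum over (k, l) of W[i, j, k, l] * X[k, l]
H = U + np.einsum('ijkl,kl->ij', W, X)

print(W.size)   # 4096 entries here; (10^6)^2 = 10^12 entries in the megapixel case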
Respond similarly to the same patch -- Translation Invariance

• This implies that a shift in the input 𝐗 should simply lead to a shift in the hidden representation 𝐇.
• This is possible only if V and U do not depend on (i, j).
• That is, V_{i,j,a,b} = V_{a,b} and U is a constant u, so H_{i,j} = u + Σ_{a,b} V_{a,b} X_{i+a,j+b}.

• THIS IS CONVOLUTION !!!!


• Number of parameters: 4 × 10^6
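A minimal NumPy sketch of this weight-sharing operation (technically a cross-correlation, which is what deep learning libraries implement as convolution); the kernel size and values are illustrative assumptions:

import numpy as np

def conv2d(X, V, u=0.0):
    """H[i, j] = u + sum over (a, b) of V[a, b] * X[i + a, j + b], offsets >= 0."""
    kh, kw = V.shape
    out_h, out_w = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    H = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel V and bias u are used at every location (i, j).
            H[i, j] = u + np.sum(V * X[i:i + kh, j:j + kw])
    return H

X = np.arange(36, dtype=float).reshape(6, 6)
V = np.array([[1.0, -1.0],
              [1.0, -1.0]])   # a single 2x2 kernel shared across the whole image
print(conv2d(X, V))           # each output is a local weighted sum of X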
Focus on local regions -- Locality

• We should not have to look very far away from location (i, j) in order to compute H_{i,j}.
• This means that outside some range |a| > Δ or |b| > Δ, we set V_{a,b} = 0.

• Number of parameters: 4 × Δ^2
• This is called a convolutional layer.
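A minimal sketch of the resulting parameter count (PyTorch and the choice Δ = 2 are assumptions made purely for illustration):

import torch.nn as nn

# Kernel covering offsets -Δ..Δ in each direction, i.e. a (2Δ+1) x (2Δ+1) window.
delta = 2
conv = nn.Conv2d(in_channels=1, out_channels=1,
                 kernel_size=2 * delta + 1, padding=delta)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)   # (2Δ+1)^2 weights + 1 bias = 26, on the order of the 4Δ^2 above

# Contrast with the unconstrained fully connected mapping on a 1000x1000 image,
# which needed on the order of 10^12 parameters.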
Convolution

[Figure slides illustrating the convolution operation]

Channels

[Figure slides illustrating convolutions over multiple channels]
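Since the channel slides are figures only, here is a minimal PyTorch sketch of a convolution over multiple input and output channels (all shapes are illustrative assumptions):

import torch
import torch.nn as nn

# An RGB image: 3 input channels, 32x32 pixels, batch of 1.
x = torch.randn(1, 3, 32, 32)

# Map 3 input channels to 16 output channels with a 3x3 kernel
# shared across all spatial locations.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

y = conv(x)
print(y.shape)                                    # torch.Size([1, 16, 32, 32])
print(sum(p.numel() for p in conv.parameters()))  # 3*16*3*3 + 16 = 448 parameters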
