The document discusses Convolutional Neural Networks (CNNs), emphasizing the importance of local feature extraction through convolutions rather than fully connected layers, which can lead to a large number of parameters. It explains key concepts such as padding, strides, and the parameters required for defining convolutional layers in PyTorch. The document also highlights the distinction between features and channels in CNNs and the advantages of using pooling layers.


Convolutional Neural Network

VNUK - NCT & TTH


Fully Connected
Consider the simple image and fully connected network below:

1. It results in a LOT of parameters.
2. The order of our features doesn’t matter.

Every input node is connected to every node in the next layer; is that really necessary?
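To make point 1 concrete, here is a rough parameter count for a single fully connected layer. The image size and hidden-layer width are my own example numbers, not from the slides:

```python
# Weights in one fully connected layer: every input pixel connects to
# every hidden node, so the count is n_inputs * n_hidden, plus one
# bias per hidden node.
def fc_params(n_inputs, n_hidden):
    return n_inputs * n_hidden + n_hidden  # weights + biases

# A modest 256 x 256 grayscale image, flattened, feeding 1000 hidden nodes:
print(fc_params(256 * 256, 1000))  # -> 65537000
```

Over 65 million parameters for one layer of a modest network; that is the "LOT of parameters" problem.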

When you look at this image, how do you know that it’s TOM?
● You notice the structure in the image (there’s a face, shoulders, a smile,
etc.)
● You notice how different structures are positioned and related (the face is
on top of the shoulders, etc.)
● You probably use the shading (colour) to infer things about the image too

The point here is that the structure of our data (the pixels) is important.
So maybe we should have each hidden node look only at a small area of the
image, like this:
We can add as many of these “filters” as we like to make more complex models
that can identify more useful things.
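Sliding one small filter over the image is exactly a convolution. A minimal pure-Python sketch (illustrative only; the image and kernel values are my own), where each output value depends only on a local 3x3 patch:

```python
def convolve2d(image, kernel):
    """'Valid' convolution (cross-correlation): the kernel is only
    applied where it fully fits on top of the input; no padding,
    stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Multiply the kernel with the patch under it and sum.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

image = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]

# A kernel that responds strongly to a diagonal line.
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

print(convolve2d(image, kernel))  # -> [[3, 0], [0, 3]]
```

The 2x2 result is the "activation map": high values where the pattern (here, a diagonal) appears.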
Convolutions - basic idea
Activation map
By default, our kernels are only applied where the filter fully fits on top of the input. But we can control this behaviour and the size of
our output with:

● padding: “pads” the outside of the input with 0’s to allow the kernel to reach the boundary pixels
● strides: controls how far the kernel “steps” as it slides over the pixels

Below is an example with:

● padding=1: we have 1 layer of 0’s around our border


● strides=(2,2): our kernel moves 2 data points to the right at each step, then moves 2 data points down when it starts the next row
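Padding and stride together determine the output size along each dimension via the standard formula out = floor((n + 2p - k) / s) + 1. A small helper (my own, not from the slides) to sanity-check it, assuming a 6x6 input for the example above:

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one dimension for input length n,
    kernel length k: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

# 6x6 input, 3x3 kernel, padding=1, stride=2:
print(conv_output_size(6, 3, padding=1, stride=2))  # -> 3

# With no padding and stride 1, a 3x3 kernel shrinks a 28x28 image:
print(conv_output_size(28, 3))  # -> 26
```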
In PyTorch, convolutional layers are defined as torch.nn.Conv2d;
there are 5 important arguments we need to know:

1. in_channels: how many features we are passing in. Our features
are the colour bands: in greyscale we have 1 channel, in colour we
have 3 channels.
2. out_channels: how many kernels do we want to use. Analogous to
the number of hidden nodes in a hidden layer of a fully connected
network.
3. kernel_size: the size of the kernel. Above we were using 3x3.
Common sizes are 3x3, 5x5, 7x7.
4. stride: the “step-size” of the kernel.
5. padding: the number of pixels to pad around the outside of the
image so the kernel can reach the edge pixels.
# 1 kernel of (5,5)
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=(5, 5))

# 2 kernels of (3,3)
conv_layer = torch.nn.Conv2d(1, 2, kernel_size=(3, 3))
# 3 kernels of (5,5)
conv_layer = torch.nn.Conv2d(1, 3, kernel_size=(5, 5))

# 1 kernel of (51,51), no padding
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=(51, 51))

# 1 kernel of (51,51) with padding
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=(51, 51), padding=25)
Padding
With CNNs we are no longer flattening our data, so what are our “features”? In CNN lingo, our features are called “channels”; they are like the colour channels in an image:

● A grayscale image has 1 feature/channel
● A coloured image has 3 features/channels
What’s important with CNNs is that the size of our input data does not impact how many parameters we have in our convolutional layers. For example, your kernels don’t care how big your image is (i.e., 28 x 28 or 256 x 256); all that matters is:

1. How many features (“channels”) you have: in_channels
2. How many filters you use in each layer: out_channels
3. How big the filters are: kernel_size
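This claim can be checked with a quick count: each of the out_channels kernels carries in_channels × kh × kw weights plus one bias, and the image size never enters the formula. A small sketch (my own helper, not from the slides):

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Parameter count of a Conv2d layer: each output channel has one
    kernel of shape (in_channels, kh, kw) plus a single bias term."""
    kh, kw = kernel_size
    return out_channels * (in_channels * kh * kw + 1)

# 3 input channels (RGB), 16 filters of 3x3 -> the same count whether
# the image is 28x28 or 256x256.
print(conv2d_params(3, 16, (3, 3)))  # -> 448
```

Compare this with the tens of millions of weights a fully connected layer needs for the same image.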
Remaining topics (slide titles; figures not shown):

● Flattening
● Pooling layer: Max-Pooling, Average-Pooling, and their advantages
● Conv layer: Conv 1D and Conv 2D; kernel types
● Kernel terminology: kernel, filter, window, mask, filter mask
● Pool layer: Max-Pooling vs. Average-Pooling
● LeNet
● Flatten image
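Max-pooling, listed above, downsamples each activation map by keeping only the largest value in each window. A minimal pure-Python sketch of 2x2 max-pooling with stride 2 (illustrative only; the feature-map values are my own):

```python
def max_pool2x2(x):
    """Non-overlapping 2x2 max-pooling (stride 2): keep the largest
    value in each 2x2 block, halving both dimensions."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
print(max_pool2x2(fmap))  # -> [[4, 2], [2, 8]]
```

Average-pooling works the same way but takes the mean of each block instead of the max.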
