CNN
When you look at this image, how do you know that it’s TOM?
● You notice the structure in the image (there’s a face, shoulders, a smile, etc.)
● You notice how different structures are positioned and related (the face is on top of the shoulders, etc.)
● You probably use the shading (colour) to infer things about the image too
The point here is that the structure of our data (the pixels) is important.
So maybe we should have each hidden node only look at a small area of the image, like this:
We can add as many of these “filters” as we like to make more complex models that can identify more useful things:
Convolutions - basic idea
Activation map
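Roughly, a convolution slides a small kernel over the image, and each output value only depends on one small patch of the input; the grid of outputs is the activation map. A minimal sketch (the kernel values and image size below are made up for illustration):

import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 6, 6)          # one 1-channel 6x6 "image"
kernel = torch.tensor([[[[-1., 0., 1.],  # a hand-picked vertical-edge filter
                         [-1., 0., 1.],
                         [-1., 0., 1.]]]])

# slide the 3x3 kernel over the image: each output value only "sees"
# a 3x3 patch of the input, and the full grid of outputs is the activation map
activation_map = F.conv2d(image, kernel)
print(activation_map.shape)  # torch.Size([1, 1, 4, 4])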
By default, our kernels are only applied where the filter fully fits on top of the input. But we can control this behaviour and the size of our output with:
● padding: “pads” the outside of the input with 0’s to allow the kernel to reach the boundary pixels
● strides: controls how far the kernel “steps” over pixels (both options are sketched in the example below).
import torch

# 2 kernels of size (3, 3), applied to a 1-channel input
conv_layer = torch.nn.Conv2d(1, 2, kernel_size=(3, 3))
# 3 kernels of size (5, 5), applied to a 1-channel input
conv_layer = torch.nn.Conv2d(1, 3, kernel_size=(5, 5))
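For instance, a rough sketch of how padding and strides change the output shape (the input size and layer settings here are just illustrative):

import torch

x = torch.randn(1, 1, 28, 28)  # a batch of one 1-channel 28x28 image

# no padding, stride 1: the 3x3 kernel only fits 26 times per row -> 26x26 output
out = torch.nn.Conv2d(1, 2, kernel_size=(3, 3))(x)                       # shape (1, 2, 26, 26)
# padding=1 adds a 1-pixel border of 0's, so the output stays 28x28
out = torch.nn.Conv2d(1, 2, kernel_size=(3, 3), padding=1)(x)            # shape (1, 2, 28, 28)
# stride=2 makes the kernel step 2 pixels at a time -> 14x14 output
out = torch.nn.Conv2d(1, 2, kernel_size=(3, 3), padding=1, stride=2)(x)  # shape (1, 2, 14, 14)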
Max-Pooling and Average-Pooling
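Pooling shrinks an activation map by summarising each small window with a single value: max-pooling keeps the largest value in the window, average-pooling keeps the mean. A small sketch contrasting the two (the 4x4 input is made up for illustration):

import torch

x = torch.tensor([[[[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])

# each 2x2 window is reduced to a single value
torch.nn.MaxPool2d(kernel_size=2)(x)  # max of each window:  4, 8, 12, 16
torch.nn.AvgPool2d(kernel_size=2)(x)  # mean of each window: 2.5, 6.5, 10.5, 14.5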
LeNet
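A minimal sketch of a LeNet-style network in PyTorch, assuming 1-channel 32x32 inputs and 10 output classes (the layer sizes follow the classic LeNet-5 layout; this is an illustration, not necessarily the exact model on the slide):

import torch

lenet = torch.nn.Sequential(
    torch.nn.Conv2d(1, 6, kernel_size=5),    # 1x32x32 -> 6x28x28
    torch.nn.Tanh(),
    torch.nn.AvgPool2d(kernel_size=2),       # 6x28x28 -> 6x14x14
    torch.nn.Conv2d(6, 16, kernel_size=5),   # 6x14x14 -> 16x10x10
    torch.nn.Tanh(),
    torch.nn.AvgPool2d(kernel_size=2),       # 16x10x10 -> 16x5x5
    torch.nn.Flatten(),                      # 16x5x5 -> 400
    torch.nn.Linear(400, 120),
    torch.nn.Tanh(),
    torch.nn.Linear(120, 84),
    torch.nn.Tanh(),
    torch.nn.Linear(84, 10),                 # 10 class scores
)

print(lenet(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])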
Flatten Image
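Before the fully connected layers, the image-shaped feature maps are flattened into a single vector per example; a tiny sketch (the shapes here are illustrative):

import torch

feature_maps = torch.randn(1, 16, 5, 5)   # batch of one, 16 feature maps of size 5x5
flat = torch.nn.Flatten()(feature_maps)   # -> shape (1, 400), ready for a Linear layer
print(flat.shape)                         # torch.Size([1, 400])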