Rec03 - Deep Architectures
CNN – part 1:
Convolution Layer
With padding:
Padding 1 => N_new = 7 + 2*1 = 9 => output size = (9 - 3)/3 + 1 = 3 (kernel size 3, stride 3)
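A minimal sketch checking this in PyTorch (assuming a 7x7 single-channel input, which is what padding 1 => N_new = 9 implies):

import torch
import torch.nn as nn

# Output size: floor((N + 2*P - K) / S) + 1 = floor((7 + 2 - 3) / 3) + 1 = 3
x = torch.randn(1, 1, 7, 7)  # (batch, channels, height, width)
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=3, padding=1)
print(conv(x).shape)  # torch.Size([1, 1, 3, 3])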
CNN – part 2:
Pooling Layer
Pooling
• Decreases the computation required to process the data
• Extracts dominant features
Max pooling
If there is a good match with the feature (1 match is enough)
Avg pooling
What is the average match with the pattern in the whole area
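A minimal sketch contrasting the two on a toy 4x4 input (the values here are illustrative, not from the slides):

import torch
import torch.nn as nn

x = torch.tensor([[[[1., 0., 2., 3.],
                    [4., 6., 6., 8.],
                    [3., 1., 1., 0.],
                    [1., 2., 2., 4.]]]])  # (batch, channels, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(max_pool(x))  # [[6., 8.], [3., 4.]] - the single best match per 2x2 window
print(avg_pool(x))  # [[2.75, 4.75], [1.75, 1.75]] - the average match per window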
CNN – part 3:
Fully Connected Layer(s)
• The flattened vector represents the input’s features
• Build a non-linear classifier on top of it (MLP)
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, in_channels, num_classes=10):
        """
        in_channels: int
            The number of channels in the input image. For MNIST, this is 1 (grayscale images).
        num_classes: int
            The number of classes we want to predict, in our case 10 (digits 0 to 9).
        """
        super(CNN, self).__init__()
        # 1st conv layer: 1 input channel, 8 output channels, 3x3 kernel, stride 1, padding 1
        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=8, kernel_size=3, stride=1, padding=1)
        # Max pooling layer: 2x2 window, stride 2
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # 2nd conv layer: 8 input channels, 16 output channels, 3x3 kernel, stride 1, padding 1
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=1, padding=1)
        # Fully connected layer: 16*7*7 input features (after two 2x2 poolings), 10 output features (num_classes)
        self.fc1 = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        # 28x28 -> conv1 (28x28) -> pool -> 14x14
        x = self.pool(F.relu(self.conv1(x)))
        # 14x14 -> conv2 (14x14) -> pool -> 7x7
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten to a vector of 16*7*7 features, then classify
        x = x.reshape(x.shape[0], -1)
        return self.fc1(x)
ResNet-152?
CNN - hyperparameters
• Number of layers
• Size of kernel
• Number of kernels
• Stride
• Padding
Applications
What can we do with a CNN architecture?
- Object classification
- Object detection
- Image segmentation
Example Task #1 - Object detection
• Identifying and locating objects within an image.
• Object detection provides both i) the class and ii) the bounding box coordinates for each object detected in the image.
• This makes it a more complex and information-rich task (vs. simply detecting the presence of a certain class).
YOLO (You Only Look Once)
• An example of an advanced CNN architecture for object detection.
• Divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
• Known for its good real-time performance.
Example Task #2 - Image segmentation
https://segment-anything.com/
Topics
• Images via CNN
One-hot encoding: length of vector = number of words in the dictionary (e.g., in the example below, words appearing fewer than 10 times are mapped to ‘other’)
Embedding: length of vector = a different number of learned features in the embedded space
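A minimal sketch of the learned-features side (the vocabulary size and embedding dimension below are illustrative assumptions, not from the slides):

import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 128  # one-hot length vs. number of learned features
embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim)
token_ids = torch.tensor([[3, 17, 42]])  # a batch with one 3-token sentence
print(embedding(token_ids).shape)  # torch.Size([1, 3, 128])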
Sequential data – Audio (spoken lang.)
Two different domains:
• Time domain
• Frequency domain
Common in neuroscience:
• Discrete signal – spike train data (1/0)
• Continuous signal – EEG, LFP
Recurrent Neural Network (RNN)
Note:
The parameters do not change as a function of t.
The hidden state changes.
RNN Layers
• Note that the input and output sizes of an RNN are not hyperparameters!
They depend on the embeddings, the type of task, etc. (see the sketch below)
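A minimal sketch (the sizes are illustrative assumptions): the same weight matrices are reused at every time step, while the hidden state evolves.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=128, hidden_size=64, batch_first=True)  # input_size = embedding dim
x = torch.randn(1, 5, 128)  # one sequence of 5 time steps
output, h_n = rnn(x)
print(output.shape)  # torch.Size([1, 5, 64]) - hidden state at every time step
print(h_n.shape)     # torch.Size([1, 1, 64]) - final hidden state only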
Which architecture will we use?
• Image Classification
• Image Captioning
• Sentiment Analysis
• Machine Translation, Summarization
• Entity Recognition
Example Task #1 – Image captioning
Encoder
Transformers
• Attention is All You Need (Vaswani et al., 2017).
https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/
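A minimal sketch of the sinusoidal positional encoding described in that tutorial and in Vaswani et al. (2017): PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d)).

import torch

def positional_encoding(seq_len, d_model):
    pos = torch.arange(seq_len).unsqueeze(1).float()  # (seq_len, 1)
    i = torch.arange(0, d_model, 2).float()           # even feature indices
    angles = pos / torch.pow(10000, i / d_model)      # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)  # sine on even indices
    pe[:, 1::2] = torch.cos(angles)  # cosine on odd indices
    return pe

print(positional_encoding(seq_len=50, d_model=128).shape)  # torch.Size([50, 128])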
Tweak #2 – from attention…
● Each decoded token in the target sequence focuses on different tokens from the source sequence.
… A Single Self-Attention
… Multi-head Attention!
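A minimal sketch of a single (self-)attention head, i.e., scaled dot-product attention; for multi-head attention, PyTorch's nn.MultiheadAttention runs several such heads in parallel:

import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)            # attention weights per token
    return weights @ V

Q = K = V = torch.randn(1, 5, 64)  # self-attention: Q, K, V come from the same sequence
print(attention(Q, K, V).shape)    # torch.Size([1, 5, 64])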
Tweak #3 – Layer Normalization
• Normalize units in a particular layer so they will have the same distribution across all features.
• We compute layer norm statistics across all the hidden units in the same layer.
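A minimal sketch (sizes are illustrative): each token's feature vector is normalized to zero mean and unit variance.

import torch
import torch.nn as nn

x = torch.randn(2, 5, 64)  # (batch, tokens, features)
layer_norm = nn.LayerNorm(64)
out = layer_norm(x)
print(out.mean(dim=-1)[0, 0].item())                 # ~0: zero mean per token
print(out.std(dim=-1, unbiased=False)[0, 0].item())  # ~1: unit variance per token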
Understanding model predictions with LIME | by Lars Hulstaert | Towards Data Science
Example - classification of a tree frog
• Step 1: Divide the original image into interpretable components – “superpixels” – groups of pixels that look similar (image segmentation).
• Step 2: Generate a data set of perturbed instances by turning some of the superpixels “off” (gray mask).
• Step 3: Get the model’s prediction – here, the probability of it being a tree frog – per perturbed instance.
• Step 4: Learn a simple model on this data set and present the superpixels with the highest positive weights as an explanation, graying out everything else (a code sketch follows below).
(Figure: LIME explanations for the top predictions: tree frog, pool table, balloon)
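A minimal sketch of steps 2-4 (model_predict and n_superpixels are hypothetical placeholders; the real implementation is the lime package by Ribeiro et al.):

import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(model_predict, n_superpixels, n_samples=1000):
    # Step 2: random on/off masks over the superpixels
    masks = np.random.randint(0, 2, size=(n_samples, n_superpixels))
    # Step 3: the model's probability for the class of interest, per perturbed image
    probs = np.array([model_predict(mask) for mask in masks])
    # Step 4: fit a simple linear model; its weights score each superpixel
    simple_model = Ridge(alpha=1.0).fit(masks, probs)
    return simple_model.coef_  # highest positive weights = the explanation

# Toy usage: a fake "model" in which superpixels 0 and 3 drive the prediction.
fake_predict = lambda m: 0.6 * m[0] + 0.3 * m[3] + 0.05 * np.random.rand()
print(np.round(lime_explain(fake_predict, n_superpixels=5), 2))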
LIME - Local Interpretable Model-agnostic Explanations
• LIME can be applied to any model.
• It answers which data points (superpixels) caused the prediction.
• Provides a local interpretability / explanation – i.e., perturb the input samples and use a simple model to understand how the predictions change.
• Cons:
• Explains only simple linear relations
• Often simple perturbations are not enough!
SHAP values - SHapley Additive exPlanations
• Based on Shapley values (Game Theory), where:
• The game = reproducing a single prediction/outcome of the model
• The players = features included in the model
• SHAP values quantify the contribution of each player to a single game.
https://towardsdatascience.com/shap-explained-the-way-i-wish-someone-explained-it-to-me-ab81cc69ef30
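A minimal usage sketch with the shap package (assuming it is installed; the model and data here are illustrative, not from the slides):

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# The "game" is a single prediction; the "players" are the features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:10])  # one contribution per feature per prediction
print(shap_values.shape)  # (10, n_features)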