Rec03 - Deep Architectures

The document provides an overview of machine learning techniques, focusing on deep neural network architectures such as Convolutional Neural Networks (CNN) for image processing and Recurrent Neural Networks (RNN) for sequential data. It discusses various applications, including object detection and image segmentation, and introduces interpretability methods like LIME and SHAP. Additionally, it touches on the evolution of machine learning workflows with the advent of large-scale datasets and foundation models.

Uploaded by

Toyba
Copyright
© All Rights Reserved

Machine Learning & Neural Networks


Deep Neural Network Architectures
Topics
• Images via CNN

• Sequential data via RNN & Transformers

• Dealing with other data types

• A brief introduction to Interpretability (LIME & SHAP)


Images
A vector of pixels
CNN
Convolutional Neural Network
CNN
Full architecture
Image (rows recovered from the slide figure):
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
Convolved feature (or activation map): [figure]
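A minimal sketch of how a convolved feature is computed, using only the four image rows that survive above. The 3x3 kernel is a hypothetical choice for illustration (the slide's kernel is not recoverable), and, as in most deep learning libraries, the kernel is slid without flipping (cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# The four rows recovered from the slide.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
]
# Hypothetical kernel (an assumption, not taken from the slide).
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

activation_map = conv2d_valid(image, kernel)
print(activation_map)  # [[4, 3, 4], [2, 4, 3]]
```

Each output value measures how strongly the kernel's pattern matches the image patch at that location.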
CNN – part 1:
Convolution Layer

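With padding:
Padding 1 on each side => N_new = N + 2 = 9 (i.e., N = 7) => output size = (N_new - F)/S + 1 = (9 - 3)/3 + 1 = 3 (kernel size F = 3, stride S = 3)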
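The arithmetic above can be checked with a small helper. This is a sketch: the function name is made up for illustration, and it uses floor division for the case where the kernel does not tile the padded input evenly (an assumption matching common library behaviour).

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    n_padded = n + 2 * padding  # padding is added on both sides
    return (n_padded - kernel_size) // stride + 1


# Slide example: N = 7, padding 1 -> N_new = 9, kernel 3, stride 3.
print(conv_output_size(7, kernel_size=3, stride=3, padding=1))  # 3

# A 3x3 kernel with stride 1 and padding 1 keeps the size unchanged.
print(conv_output_size(28, kernel_size=3, stride=1, padding=1))  # 28
```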
CNN – part 2:
Pooling Layer
Pooling
• Decreases the computational power required to process the data
• Extracts dominant features

Max pooling
Is there a good match with the feature anywhere in the area? (one match is enough)

Avg pooling
What is the average match with the pattern in the whole area?
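To make the difference concrete, here is a small sketch of both pooling types on a made-up 4x4 activation map (the values, window size and stride are illustrative assumptions).

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size windows."""
    if mode == "max":
        reduce = max
    else:
        reduce = lambda window: sum(window) / len(window)
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce(window))
        out.append(row)
    return out


# Made-up activation map for illustration.
feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]

print(pool2d(feature_map, mode="max"))  # [[6, 8], [3, 4]]
print(pool2d(feature_map, mode="avg"))  # [[2.75, 4.75], [1.75, 1.75]]
```

Max pooling keeps the single strongest response in each window, while average pooling summarizes the whole window; both reduce the 4x4 map to 2x2.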
CNN – part 3:
Fully Connected Layer(s)
• The flattened vector represents the input’s features
• Build a non-linear classifier (MLP)
Implementation

import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, in_channels, num_classes=10):
        """
        in_channels: int
            The number of channels in the input image. For MNIST, this is 1 (grayscale images).
        num_classes: int
            The number of classes we want to predict, in our case 10 (digits 0 to 9).
        """
        super().__init__()
        # 1st conv layer: in_channels input channels (1 for MNIST), 8 output channels, 3x3 kernel, stride 1, padding 1
        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=8, kernel_size=3, stride=1, padding=1)
        # Max pooling layer: 2x2 window, stride 2 (halves height and width)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # 2nd conv layer: 8 input channels, 16 output channels, 3x3 kernel, stride 1, padding 1
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=1, padding=1)
        # Fully connected layer: 16*7*7 input features (28x28 input after two 2x2 poolings), num_classes output features
        self.fc1 = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))      # Apply first convolution and ReLU activation
        x = self.pool(x)               # Apply max pooling
        x = F.relu(self.conv2(x))      # Apply second convolution and ReLU activation
        x = self.pool(x)               # Apply max pooling
        x = x.reshape(x.shape[0], -1)  # Flatten the tensor to (batch_size, 16*7*7)
        x = self.fc1(x)                # Apply fully connected layer
        return x
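Where does the 16 * 7 * 7 in the fully connected layer come from? The following sketch traces the spatial size through the layers, assuming the standard 28x28 MNIST input (the helper name is hypothetical).

```python
def out_size(n, kernel_size, stride, padding=0):
    """Spatial size after a conv or pooling layer (floor division)."""
    return (n + 2 * padding - kernel_size) // stride + 1


size = 28                                                  # assumed MNIST input: 1 x 28 x 28
size = out_size(size, kernel_size=3, stride=1, padding=1)  # conv1 -> 28
size = out_size(size, kernel_size=2, stride=2)             # pool  -> 14
size = out_size(size, kernel_size=3, stride=1, padding=1)  # conv2 -> 14
size = out_size(size, kernel_size=2, stride=2)             # pool  -> 7

flat_features = 16 * size * size  # 16 output channels from conv2
print(size, flat_features)  # 7 784
```

If the input resolution or the number of pooling layers changes, the input size of the fully connected layer has to change accordingly.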
Vanilla CNN

This is the starting point!


Skip connection

ResNet-152?
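A skip connection adds a block's input back to its output, y = F(x) + x. The sketch below uses a single fully connected layer with ReLU as a stand-in for F; real ResNet blocks use convolutions, normalization and a ReLU after the addition, so treat this as an illustration of the idea only.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    """Plain matrix-vector product plus bias."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x, weights, bias):
    """y = F(x) + x, where F is a (hypothetical) linear layer followed by ReLU."""
    fx = relu(linear(x, weights, bias))
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0, 0.5]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# With all-zero weights the block reduces to the identity mapping.
print(residual_block(x, zero_w, zero_b))  # [1.0, -2.0, 0.5]
```

Because the block can fall back to the identity, stacking many such blocks (e.g., 152 layers) does not have to degrade the signal passed from earlier layers.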
CNN - hyperparameters
• Number of layers
• Size of kernel
• Number of kernels
• Stride
• Padding
Applications
What to do with CNN architecture?
- Object classification
- Object detection
- Image segmentation
Example Task #1 - Object detection
• Identifying and locating objects within an image.
• Object detection provides both i) the class and ii) the bounding box coordinates for each object detected in the image.
• This makes it a more complex and information-rich task (vs. simply detecting whether a certain class is present).
YOLO (You Only Look Once)
• An example of an advanced CNN architecture for object detection.
• Divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
• Known for its good real-time performance.
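A minimal sketch of the grid idea, assuming the original YOLO convention that the grid cell containing an object's center is the one responsible for predicting it. The 7x7 grid, the 448x448 image size and the function name are assumptions for illustration, not details from the slides.

```python
def responsible_cell(box, image_w, image_h, grid=7):
    """Return (row, col) of the grid cell that contains the box center.

    box = (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    col = min(int(center_x / image_w * grid), grid - 1)
    row = min(int(center_y / image_h * grid), grid - 1)
    return row, col


# Hypothetical object in a 448 x 448 image.
print(responsible_cell((100, 200, 200, 300), image_w=448, image_h=448))  # (3, 2)
```

In a full detector, each cell would additionally output box coordinates, a confidence score and class probabilities in a single forward pass.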
Example Task #2 - Image segmentation

• Partitioning an image into multiple segments.
• The goal is to assign a class label to each pixel in the image.
• Semantic segmentation
• Instance segmentation
• Panoptic segmentation
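Assigning a class label to each pixel can be sketched as a per-pixel argmax over class scores. The scores below are made up; in practice they would come from a segmentation network. This corresponds to semantic segmentation only (instance and panoptic segmentation also separate individual objects).

```python
def label_pixels(scores):
    """scores[c][i][j] is the score of class c at pixel (i, j).

    Returns a label map holding the highest-scoring class per pixel.
    """
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][i][j]) for j in range(width)]
        for i in range(height)
    ]


# Made-up scores for 2 classes on a 2 x 3 image.
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.6, 0.3]],  # class 0 (e.g., background)
    [[0.1, 0.8, 0.9], [0.2, 0.4, 0.7]],  # class 1 (e.g., object)
]

label_map = label_pixels(scores)
print(label_map)  # [[0, 1, 1], [0, 0, 1]]
```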
Segment Anything Model (SAM)

https://segment-anything.com/
Sequential data
Sometimes our data comes in the form of a sequence
● Spike trains
● Stocks
● Sentences & Speech
A previous approach to analyzing sequential data is window-based classifiers:
● Sliding windows (avg., sum)
● For spike train data, we discretize time into bins of fixed width and count the number of events that occur in each time bin.
The problem: how to choose the right window size?
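The window-based approach can be sketched as follows. The spike times (in milliseconds) and the bin width are invented for illustration; integer times are used so that the binning is exact.

```python
def bin_spikes(spike_times_ms, t_start, t_end, bin_width):
    """Count the spikes that fall in each fixed-width time bin."""
    n_bins = (t_end - t_start) // bin_width
    counts = [0] * n_bins
    for t in spike_times_ms:
        if t_start <= t < t_end:
            counts[(t - t_start) // bin_width] += 1
    return counts


def sliding_window(values, size, reduce=sum):
    """Apply `reduce` (e.g., sum) to every window of `size` consecutive values."""
    return [reduce(values[i:i + size]) for i in range(len(values) - size + 1)]


spike_times_ms = [3, 12, 14, 27, 31, 33, 38, 49]  # made-up spike train
counts = bin_spikes(spike_times_ms, t_start=0, t_end=50, bin_width=10)
print(counts)                     # [1, 2, 1, 3, 1]
print(sliding_window(counts, 3))  # [4, 6, 5]
print(sliding_window(counts, 2))  # [3, 3, 4, 4]
```

The two window sizes give different summaries of the same spike train, which is exactly the window-size problem raised above.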
Sequential data - Text (NLP; written lang.)
• Can be done at character level / word level / document level:
Sparse vectors: one-hot encoding, e.g., Bag of Words. Length of vector = number of words in the dictionary (e.g., words appearing fewer than 10 times are mapped to ‘other’).
Dense vectors: word embeddings (e.g., Word2Vec). Length of vector = a different number of learned features in the embedded space.
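A small sketch contrasting the two representations. The vocabulary, the sentence and especially the embedding values are made up; a real embedding table would be learned (e.g., by Word2Vec) rather than written by hand.

```python
vocab = ["the", "cat", "sat", "on", "mat"]


def one_hot(word, vocab):
    """Sparse vector with a single 1 at the word's dictionary index."""
    return [1 if w == word else 0 for w in vocab]


def bag_of_words(tokens, vocab):
    """Sparse vector of word counts; word order is discarded."""
    return [tokens.count(w) for w in vocab]


tokens = "the cat sat on the mat".split()
print(one_hot("cat", vocab))        # [0, 1, 0, 0, 0]
print(bag_of_words(tokens, vocab))  # [2, 1, 1, 1, 1]

# Hypothetical dense embeddings with 3 learned features per word
# (the numbers are invented for illustration only).
embedding = {
    "cat": [0.21, -0.53, 0.80],
    "mat": [0.05, 0.47, -0.12],
}
print(len(one_hot("cat", vocab)), len(embedding["cat"]))  # 5 3
```

The sparse vectors grow with the dictionary, while the dense vector length is a design choice that is independent of the vocabulary size.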
Sequential data – Audio (spoken lang.)
Two different domains:
• Time domain
• Frequency domain
Common in neuroscience:
• Discrete signal - Spike train data (1/0)
• Continuous signal – EEG, LFP
Recurrent Neural Network (RNN)

Vanilla (or Elman’s) RNN

Note:
The parameters do not change as a function of t.
Only the hidden state changes.
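A minimal sketch of the Elman recurrence, assuming the common form h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The weight values and dimensions are made up. Note that the same weights are reused at every time step, and only the hidden state h changes.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(w_xh[i][k] * x_t[k] for k in range(len(x_t)))
        pre += sum(w_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t


def run_rnn(xs, h0, w_xh, w_hh, b_h):
    """Apply the same parameters at every time step, carrying the hidden state."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


# Made-up parameters: input dimension 1, hidden dimension 2.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

xs = [[1.0], [0.0], [0.0]]  # an input pulse followed by silence
states = run_rnn(xs, [0.0, 0.0], w_xh, w_hh, b_h)
for t, h in enumerate(states):
    print(t, h)
```

The hidden state stays non-zero after the input returns to zero, which is how the network carries information from earlier steps.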
RNN Layers

[Figure: three stacked RNN layers, each unrolled over five time steps]


Hyperparameters of Vanilla RNN
• Number of layers
• Hidden state dimension

• Note that the input and output dimensions of the RNN are not hyperparameters! They depend on the embeddings, the type of task, etc.
Which architecture will we use?
• Image Classification
• Image Captioning
• Sentiment Analysis
• Machine Translation, Summarization
• Entity Recognition
Example Task #1 – Image captioning

A man and a girl sit on the ground and eat

A man and a little girl are sitting on a sidewalk near a blue bag eating

A man wearing a black shirt and a little girl wearing an orange dress share a treat
• Model:
• Use CNN+FC to convert the image into a single vector representation
• Use RNN to generate the output sentence using the Image vector as another input
Pros and cons of RNN
• Can process input of any length
• Theoretically, the computation at the current step can use information from many steps back
• Model size doesn’t increase for longer input context, as the same weights are applied at every step

• Recurrent computation is slow…
• In practice, it is difficult to access information from many steps back
RNN model without attention
RNN model with Attention
Sequential data modeling
Transformers architecture: Encoder and Decoder
Transformers
• Attention is All You Need (Vaswani
et al., 2017).

• The ‘main’ ideas:


1) Positional encoding
2) Multi-head attention
3) Layer normalization (vs. batch norm)
Tweak #1 - Positional encoding

• The raw index value is poorly suited to represent an item’s position in transformer models because, for long sequences, the indices can grow large in magnitude.
• Positional encoding describes the location or position of an entity in a sequence so that each position is assigned a unique representation.
https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/
Example of positional encoding
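One possible realization is the sinusoidal encoding from Vaswani et al. (2017), sketched below. The sequence length, the model dimension and the base constant n = 10000 are assumptions for illustration, and d_model is assumed to be even.

```python
import math


def positional_encoding(seq_len, d_model, n=10000.0):
    """Sinusoidal encoding: PE[pos][2i] = sin(pos / n^(2i/d)), PE[pos][2i+1] = cos(...)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model // 2):
            angle = pos / (n ** (2 * i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        pe.append(row)
    return pe


pe = positional_encoding(seq_len=6, d_model=4)
for pos, row in enumerate(pe):
    print(pos, [round(v, 3) for v in row])
```

Every value stays within [-1, 1] regardless of the sequence length, and each position receives a distinct vector, which addresses both points raised about raw indices.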

Tweak #2 – from attention…
● Each decoded token in the target sequence focuses on different tokens from the source sequence.
… A Single Self-Attention
… Multi-head Attention!
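A single attention head can be sketched as scaled dot-product attention: each query is compared with all keys, the scores are turned into weights with a softmax, and the output is the weighted average of the values. The tiny vectors below are made up; multi-head attention would run several such heads in parallel on different learned projections and concatenate the results.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Made-up example: two source tokens, one query aligned with the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print(weights[0])  # most of the weight goes to the first source token
print(outputs[0])
```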
Tweak #3 - Layer Normalization
• Normalize the units in a particular layer so that they have the same distribution across all features.
• We compute the layer norm statistics across all the hidden units in the same layer:

\mu = \frac{1}{H} \sum_{i=1}^{H} a_i, \qquad \sigma = \sqrt{\frac{1}{H} \sum_{i=1}^{H} (a_i - \mu)^2}

where H denotes the number of hidden units in a layer and a_i is the summed input to hidden unit i.

All the hidden units in a layer share the same normalization terms μ and σ.
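A minimal sketch of the computation for one layer and one sample. The learned gain and bias used by practical implementations are omitted, and the small eps added for numerical stability is an assumption.

```python
import math


def layer_norm(a, eps=1e-5):
    """Normalize the H hidden units of one layer with a shared mu and sigma."""
    H = len(a)
    mu = sum(a) / H
    var = sum((x - mu) ** 2 for x in a) / H
    return [(x - mu) / math.sqrt(var + eps) for x in a]


a = [1.0, 2.0, 3.0, 4.0]  # made-up summed inputs to H = 4 hidden units
normalized = layer_norm(a)
print([round(v, 3) for v in normalized])
```

Unlike batch normalization, the statistics here are computed per sample across the hidden units, so the result does not depend on the batch size.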
From Machine Learning to Foundation Models
• To learn a certain task, the classic workflow in machine learning was:
• Collect labeled data
• Train the model on the training data
• Generalize / infer on new test data

• Today, large-scale datasets have changed this classical workflow:
• Collect a large dataset (can be labeled or unlabeled)
• Learn a representation, a type of “prior” for learning
• Use the model for downstream tasks
Types of transformers
• Encoder models – all tokens can “see the future” without masking (e.g., BERT by Devlin et al., 2019)

• Decoder models – can only observe the past to generate text
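The difference between the two model types can be sketched as the attention mask each one uses, where entry [i][j] = 1 means token i may attend to token j. The function names are hypothetical.

```python
def encoder_mask(seq_len):
    """Encoder: every token may attend to every other token (no masking)."""
    return [[1] * seq_len for _ in range(seq_len)]


def causal_mask(seq_len):
    """Decoder: token i may only attend to positions j <= i (the past)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


for row in encoder_mask(3):
    print(row)
# [1, 1, 1]
# [1, 1, 1]
# [1, 1, 1]

for row in causal_mask(3):
    print(row)
# [1, 0, 0]
# [1, 1, 0]
# [1, 1, 1]
```

In practice, masked positions are usually set to a very negative score before the softmax so that they receive zero attention weight.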
Tabular data - example
• Tables - a vector of discrete/continuous features (with/without labels)
Networks – example
LIME - Local Interpretable Model-agnostic Explanations
• LIME can be applied to any model.
• Which variable caused the prediction?
• Provides local interpretability / explanation: perturb the input sample and use a simple model to understand how the predictions change

Understanding model predictions with LIME | by Lars Hulstaert | Towards Data Science
Example - classification of a tree frog
• Step 1:
Divide the original image into interpretable components: “superpixels”, i.e., groups of pixels that look similar (image segmentation)
• Step 2:
Generate a data set of perturbed instances by turning some of the superpixels “off” (gray mask)
• Step 3:
Get the model’s prediction (here, the probability of it being a tree frog) for each perturbed instance
• Step 4:
Learn a simple model on this data set and present the superpixels with the highest positive weights as an explanation, graying out everything else.
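Steps 2 to 4 can be sketched on a toy problem with four superpixels. Everything here is an assumption for illustration: `black_box` is a hypothetical stand-in for the image classifier, all 2^4 masks are enumerated instead of sampled, the proximity kernel is one simple choice, and the linear surrogate is fitted with plain gradient descent. This is not the API of the `lime` package.

```python
import itertools
import math


def black_box(mask):
    """Hypothetical classifier: P('tree frog') when only the masked-in superpixels are visible."""
    logit = -2.0 + 3.0 * mask[0] + 0.5 * mask[1] + 0.0 * mask[2] - 1.0 * mask[3]
    return 1.0 / (1.0 + math.exp(-logit))


def lime_explain(predict, n_superpixels, kernel_width=0.75, lr=0.1, steps=5000):
    n = n_superpixels
    # Step 2: perturbed instances (1 = superpixel on, 0 = grayed out).
    masks = list(itertools.product([0, 1], repeat=n))
    # Step 3: query the model on every perturbed instance.
    preds = [predict(m) for m in masks]
    # Instances closer to the original image (all superpixels on) weigh more.
    prox = [math.exp(-(((n - sum(m)) / n) ** 2) / kernel_width ** 2) for m in masks]
    total = sum(prox)
    # Step 4: fit a weighted linear surrogate by gradient descent.
    coefs, intercept = [0.0] * n, 0.0
    for _ in range(steps):
        grad_c, grad_b = [0.0] * n, 0.0
        for m, y, w in zip(masks, preds, prox):
            err = intercept + sum(c * x for c, x in zip(coefs, m)) - y
            grad_b += w * err
            for k in range(n):
                grad_c[k] += w * err * m[k]
        intercept -= lr * grad_b / total
        coefs = [c - lr * g / total for c, g in zip(coefs, grad_c)]
    return coefs


coefs = lime_explain(black_box, n_superpixels=4)
top = max(range(len(coefs)), key=lambda k: coefs[k])
print("most important superpixel:", top)
print([round(c, 3) for c in coefs])
```

The superpixels with the largest positive coefficients form the explanation; in the image setting, all other superpixels would be grayed out.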
LIME - Local Interpretable Model-agnostic Explanations
• LIME can be applied to any model.
• It answers which input components (e.g., superpixels) caused the prediction.
• Provides local interpretability / explanation: perturb the input sample and use a simple model to understand how the predictions change

• Cons:
• Explains only simple linear relations
• Often simple perturbations are not enough!

Understanding model predictions with LIME | by Lars Hulstaert | Towards Data Science
SHAP values - SHapley Additive exPlanations
• Based on Shapley values (Game Theory), where:
• The game = reproducing a single prediction/outcome of the model
• The players = features included in the model
• SHAP values quantify the contribution of each player to a single game.

• Exact computation requires training many models (2^F models for F features; e.g., with 50 features, 2^50 = 1,125,899,906,842,624 models).
• Solution: approximate and sample per feature (implementation)
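For a handful of features, exact Shapley values can be computed by enumerating all coalitions, which also shows why this does not scale. The value function below is a made-up stand-in for "the model's output when only these features are present"; the `shap` library uses approximations instead of this brute-force loop.

```python
import itertools
import math


def shapley_values(value, n_players):
    """Exact Shapley values by enumerating every coalition of the other players."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for size in range(len(others) + 1):
            weight = (
                math.factorial(size)
                * math.factorial(n_players - size - 1)
                / math.factorial(n_players)
            )
            for subset in itertools.combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def value(coalition):
    """Hypothetical game: features 0 and 1 contribute and interact, feature 2 is irrelevant."""
    v = 0.0
    if 0 in coalition:
        v += 10.0
    if 1 in coalition:
        v += 5.0
    if 0 in coalition and 1 in coalition:
        v += 2.0  # interaction, shared equally between players 0 and 1
    return v


phi = shapley_values(value, n_players=3)
print([round(p, 6) for p in phi])  # [11.0, 6.0, 0.0]
print(2 ** 50)                     # 1125899906842624 coalitions for 50 features
```

The contributions add up to the difference between the full prediction and the empty baseline (11 + 6 + 0 = 17), which is the additivity property that gives SHAP its name.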

https://towardsdatascience.com/shap-explained-the-way-i-wish-someone-explained-it-to-me-ab81cc69ef30
