CNNs and Transformers
[U-Net architecture diagram: the encoder reduces the feature maps from 96x96x32 at full resolution down to 6x6x256 at the bottleneck, and the decoder brings them back up to 96x96x32, with skip connections between encoder and decoder at each resolution.]
Ronneberger et al., "U-Net: Convolutional Networks for Biomedical Image Segmentation," MICCAI, 2015
The output has size 96x96x4, i.e. we get four channels for each pixel:
➢ Left ventricle: light blue area
➢ Myocardium: pink area
➢ Right ventricle: red area
➢ Background: blue area
Next, convert these logits into a probability distribution with the softmax activation function.
Example calculation:
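A minimal sketch of such a calculation in Python/NumPy; the logit values for a single pixel are made up purely for illustration:

```python
import numpy as np

# Hypothetical logits for one pixel, one value per class:
# [left ventricle, myocardium, right ventricle, background]
logits = np.array([2.0, 0.5, -1.0, 0.1])

# Softmax: exponentiate and normalise so the values sum to 1.
# Subtracting the maximum first is a standard numerical-stability trick.
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

print(probs)        # approx. [0.703 0.157 0.035 0.105] -> "left ventricle" is most likely
print(probs.sum())  # 1.0
```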
The Transformer
• Hugely influential
• Basis for ChatGPT (Generative Pre-trained Transformer)
• Also used for vision

[Diagram: the Transformer consists of an Encoder and a Decoder connected by a latent code; the Encoder processes a set of tokens $x_1, x_2, x_3$ using attention.]
https://jalammar.github.io/illustrated-transformer/
https://arxiv.org/abs/1706.03762
Self-Attention
Self-attention:
● every element in the sequence can influence every other element
● learn a weighting ("attention") for each pair of elements
With a one-hot score vector (e.g. 0, 1, 0) the query matches exactly one key, and the output is the corresponding value:

$z = \sum_{j=1}^{n} \mathbb{1}[\boldsymbol{q} = \boldsymbol{k}_j]\, \boldsymbol{v}_j$
Relaxed Query
With relaxed (soft) scores, e.g. (.1, .1, .7, .1), the output is a weighted sum of all values:

$\boldsymbol{z} = \sum_{j=1}^{n} \mathrm{score}_j\, \boldsymbol{v}_j, \qquad \mathrm{score}_j = \mathrm{softmax}_j\!\big(\mathrm{similarity}(\boldsymbol{q}, \boldsymbol{k}_j)\big), \qquad \mathrm{similarity}(\boldsymbol{q}, \boldsymbol{k}_j) = \frac{\boldsymbol{q}^{T}\boldsymbol{k}_j}{\sqrt{d_k}}$
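A minimal NumPy sketch of this single-query attention step; the dimensions and the random keys/values are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d_k = 4                       # query/key dimension (illustrative)
q = np.random.randn(d_k)      # one query
K = np.random.randn(3, d_k)   # keys k_1 .. k_3
V = np.random.randn(3, d_k)   # values v_1 .. v_3

sim = K @ q / np.sqrt(d_k)    # similarity(q, k_j) = q^T k_j / sqrt(d_k), shape (3,)
scores = softmax(sim)         # soft weights, sum to 1
z = scores @ V                # z = sum_j score_j * v_j, shape (d_k,)
print(scores, z)
```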
Self-Attention
Application to sequence: matrix multiplication
Stacking the queries, keys and values of all sequence elements into matrices $Q$, $K$ and $V$ turns the whole computation into matrix multiplications:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V$

The $\mathrm{softmax}(QK^{T}/\sqrt{d_k})$ factor is the attention matrix: one weight for every pair of sequence elements.
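The same computation for a whole sequence, as a short NumPy sketch with illustrative sizes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, d_k = 5, 4                          # sequence length and key dimension (illustrative)
Q = np.random.randn(n, d_k)            # one query per sequence element
K = np.random.randn(n, d_k)
V = np.random.randn(n, d_k)

A = softmax(Q @ K.T / np.sqrt(d_k))    # attention matrix, shape (n, n); rows sum to 1
Z = A @ V                              # all attention outputs at once, shape (n, d_k)
print(A.shape, Z.shape)
```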
Multi-headed self-attention
Multiple attention heads for increased model capacity
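A hedged sketch using PyTorch's torch.nn.MultiheadAttention; the embedding size, number of heads and sequence length below are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 8, 2, 3
x = torch.randn(1, seq_len, embed_dim)   # (batch, tokens, features)

# Each head has its own W_Q, W_K, W_V projections; the head outputs are
# concatenated and mixed by a final linear layer.
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

out, attn = mha(x, x, x)                 # self-attention: query = key = value = x
print(out.shape)   # torch.Size([1, 3, 8])
print(attn.shape)  # torch.Size([1, 3, 3]) -- attention weights, averaged over heads
```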
The Decoder - Part 1
Autoregressive Generative Models
Autoregressive Models
Problem:
• Model a difficult, high-dimensional distribution
Idea:
• Factorise the distribution:
$p_\theta(\boldsymbol{x}) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$

Neural network:
• Parameters $\theta$
• Input: $x_1, \ldots, x_{i-1}$
• Output: distribution over $x_i$

[Figure: true distribution, model distribution and data samples.]
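To make the factorisation concrete, here is a toy Python sketch that evaluates $\log p_\theta(\boldsymbol{x})$ as a sum of per-step conditionals; the stand-in "model" below is a hypothetical placeholder, not the network from the lecture:

```python
import numpy as np

vocab = 4                             # number of possible values per element (illustrative)

def model(prefix):
    """Stand-in for p_theta(x_i | x_1, ..., x_{i-1}): maps a prefix to a
    distribution over the next value. Here simply uniform; a real network
    would compute this from the prefix."""
    return np.full(vocab, 1.0 / vocab)

x = [2, 0, 3, 1]                      # an example sequence
log_p = 0.0
for i in range(len(x)):
    probs = model(x[:i])              # p_theta( . | x_1, ..., x_{i-1})
    log_p += np.log(probs[x[i]])      # probability assigned to the observed x_i

print(log_p)                          # log p_theta(x) = sum_i log p_theta(x_i | x_<i)
```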
[Figure: an audio waveform (pressure over time); given the previous samples $x_1, \ldots, x_{i-1}$, the model predicts $p_\theta(x_i \mid x_1, \ldots, x_{i-1})$ for the next sample.]
Sampling from the Model
• Predict dist. for next audio sample
• Sample from distribution
• Append new sample
• Repeat
Image Generation:
• Sample one pixel
• Apply network
• Repeat
Summary
• Interpret data as a sequence
• Train a neural network to model $p_\theta(\boldsymbol{x}) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$
• Sampling:
  • One sample at a time
  • Slow, involves repeated application of the model
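A minimal sketch of this sampling loop in Python; the vocabulary size and the uniform stand-in model are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = 256                           # e.g. 8-bit quantised audio samples (illustrative)

def model(prefix):
    """Stand-in for p_theta(x_i | x_1, ..., x_{i-1}); here uniform."""
    return np.full(vocab, 1.0 / vocab)

x = []                                # start from an empty sequence
for i in range(16):                   # generate 16 samples, one at a time
    probs = model(x)                  # predict distribution for the next sample
    nxt = rng.choice(vocab, p=probs)  # sample from that distribution
    x.append(int(nxt))                # append the new sample and repeat

print(x)
```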
The Decoder - Part 2
Putting Everything Together
The Encoder
The encoder maps the input tokens $x_1, x_2, x_3$ to a latent code.
Self-Attention Idea:
• Every token makes a query, a key and a value via learned projections:
  $Q = X W_Q, \quad K = X W_K, \quad V = X W_V$
  where $X$ is the matrix of input tokens (e.g. the latent code).
Cross-Attention
• Keys and values ($W_K$, $W_V$) are computed from the encoder's latent code
• Queries ($W_Q$) are computed from the decoder's tokens
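A hedged PyTorch sketch of cross-attention using torch.nn.MultiheadAttention; the (random) encoder outputs, decoder states and all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 8, 2
enc = torch.randn(1, 5, embed_dim)   # encoder latent code: 5 tokens (illustrative)
dec = torch.randn(1, 3, embed_dim)   # decoder states: 3 tokens (illustrative)

attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Cross-attention: queries come from the decoder, keys and values from the encoder.
out, weights = attn(query=dec, key=enc, value=enc)
print(out.shape)      # torch.Size([1, 3, 8])
print(weights.shape)  # torch.Size([1, 3, 5]) -- one row of weights per decoder token
```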
Decoding Strategies
https://huggingface.co/blog/how-to-generate
Training the Transformer
[Diagram: training data, latent code, and the loss function applied to the model's outputs.]
Masked Attention
Autoregressive model:
• Output: distribution over the next item
• Input: only previous items
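A small PyTorch sketch of such a causal mask (all sizes are illustrative); positions marked True in the mask may not be attended to, so every token only sees itself and earlier tokens:

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 8, 2, 4
x = torch.randn(1, seq_len, embed_dim)

# Causal mask: entry (i, j) is True when j > i, i.e. attention to the future is blocked.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
out, weights = mha(x, x, x, attn_mask=mask)

print(weights[0])   # upper triangle is zero: no attention to future positions
```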
Transformer Variants
• Decoder-only: generate the sequence autoregressively until an <EOS> (end-of-sequence) token
Vision Transformers
Transformers for Vision
Why?
Very successful in NLP, especially with large data! Maybe the same holds for vision?
Naive approach: flatten the image and treat each pixel as a sequence element.
BUT: self-attention has quadratic complexity in the sequence length -> not feasible.
Possible solutions:
● Apply a convolutional backbone first and use the lower-resolution feature map
● Use image patches (see the patch-embedding sketch below)
https://sites.google.com/view/cvpr-2022-beyond-cnn
https://arxiv.org/abs/2010.11929
[ViT diagram: the image is split into patches, each patch is linearly embedded and a position embedding is added; the resulting tokens pass through the Transformer encoder and a classifier head produces the prediction.]
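A minimal PyTorch sketch of this input pipeline; the image size, patch size and embedding dimension are illustrative assumptions, not the exact values from the lecture:

```python
import torch
import torch.nn as nn

img = torch.randn(1, 3, 96, 96)      # (batch, channels, H, W)
patch, dim = 16, 128                 # 16x16 patches -> (96/16)^2 = 36 tokens

# A conv layer with kernel = stride = patch size patchifies the image and
# linearly projects each patch in one step.
to_tokens = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
tokens = to_tokens(img)                       # (1, 128, 6, 6)
tokens = tokens.flatten(2).transpose(1, 2)    # (1, 36, 128): one token per patch

# Learned position embedding, added to every token.
pos = nn.Parameter(torch.zeros(1, tokens.shape[1], dim))
tokens = tokens + pos                         # ready for the Transformer encoder
# (For classification, ViT additionally prepends a learnable class token whose
#  output is fed to the classifier head.)
print(tokens.shape)                           # torch.Size([1, 36, 128])
```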
Encoder
Layer                                | Input shape       | Output shape      | Notes
Conv1                                | (4, 1, 96, 96)    | (4, 128, 48, 48)  | kernel size 7x7, stride=2
EncoderBottleneck1                   | (4, 128, 48, 48)  | (4, 256, 24, 24)  | stride=2, downsampling
EncoderBottleneck2                   | (4, 256, 24, 24)  | (4, 512, 12, 12)  | stride=2, downsampling
Tokenisation                         | (4, 1024, 6, 6)   | (4, 36, 1024)     | convert patches to tokens, each of size 1x1x1024
ViT Projection + Position Encoding   | (4, 36, 1024)     | (4, 36, 1024)     | positional embedding
Transformer Encoder                  | (4, 36, 1024)     | (4, 36, 1024)     | 8 cascades of self-attention layers
Patchification                       | (4, 36, 1024)     | (4, 1024, 6, 6)   | recover patches from tokens
Conv2                                | (4, 1024, 6, 6)   | (4, 512, 6, 6)    | convolution with reduced channel size

Decoder
Layer                | Input shape       | Output shape      | Notes
DecoderBottleneck1   | (4, 512, 6, 6)    | (4, 256, 12, 12)  | concatenate with Encoder2 output (512, 12, 12)
DecoderBottleneck2   | (4, 256, 12, 12)  | (4, 128, 24, 24)  | concatenate with Encoder1 output (256, 24, 24)
DecoderBottleneck3   | (4, 128, 24, 24)  | (4, 64, 48, 48)   | concatenate with the initial output (128, 48, 48)
DecoderBottleneck4   | (4, 64, 48, 48)   | (4, 16, 96, 96)   | upsample to original size
Final Conv           | (4, 16, 96, 96)   | (4, 4, 96, 96)    | use 1x1 convolution to adjust channels (classes)
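A short PyTorch sketch of the Tokenisation and Patchification steps from the table above; only the reshapes are shown, the surrounding layers are omitted:

```python
import torch

feat = torch.randn(4, 1024, 6, 6)           # feature map entering Tokenisation (per the table)

# Tokenisation: each 1x1x1024 spatial position becomes one token.
tokens = feat.flatten(2).transpose(1, 2)    # (4, 36, 1024)

# ... ViT projection + position encoding + Transformer encoder operate on `tokens` ...

# Patchification: recover the spatial feature map from the tokens.
feat_back = tokens.transpose(1, 2).reshape(4, 1024, 6, 6)

print(tokens.shape, feat_back.shape)        # torch.Size([4, 36, 1024]) torch.Size([4, 1024, 6, 6])
```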
Thanks!