Deep Learning (1)

COMP9491 Applied AI
Term 2, 2023

Outline

▪ Image classification models

▪ Vision-language studies

▪ Generative models

▪ Semi-supervised learning

COMP9491 T2, 2023 1

Image Classification Models

▪ Problems with deep learning models

▪ The degradation problem


Image Classification Models

▪ ResNet (Deep residual learning for image recognition, CVPR’16)

▪ The hypothesis: it is easier to optimise the residual mapping
than to optimise the original, unreferenced mapping
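
The hypothesis can be illustrated with a minimal sketch of a residual block. It assumes fully connected layers in place of the convolution and batch-normalisation layers used in the paper, and the names `residual_block`, `w1` and `w2` are hypothetical. The point it shows: if the residual mapping F(x) is driven to zero, the block reduces to the identity, so extra layers need not hurt.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is a small two-layer mapping."""
    f = relu(x @ w1) @ w2   # residual mapping F(x)
    return relu(f + x)      # identity shortcut adds the input back

# With zero weights F(x) = 0, so the block passes non-negative inputs through.
rng = np.random.default_rng(0)
x = rng.random((4, 8))      # toy features in [0, 1)
zero = np.zeros((8, 8))
out = residual_block(x, zero, zero)
```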


Image Classification Models

▪ ResNeXt (Aggregated residual transformations for deep neural networks, CVPR’17)


Image Classification Models

▪ EfficientNet: Rethinking model scaling for convolutional neural networks (ICML’19)


Image Classification Models

▪ Problem in real-life applications: data imbalance


Image Classification Models

▪ To address data imbalance:

▪ Data distribution re-balancing (over-sampling for the minority
class, under-sampling for the majority class)
▪ Class-balanced loss (re-weighting, focal loss)
▪ Data synthesis (autoencoder, GAN)
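
As an illustration of the class-balanced loss option, the sketch below combines inverse-frequency class weights with the focal loss, -(1 - p_t)^γ log(p_t). Inverse-frequency weighting is only one common re-weighting choice and is an assumption here, as are the function names; the slides do not prescribe a specific scheme.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """weight_c = N / (C * n_c); assumes every class appears at least once."""
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * counts)

def focal_loss(probs, labels, gamma=2.0, class_weights=None):
    """Mean focal loss; probs holds per-class probabilities for each sample."""
    p_t = probs[np.arange(len(labels)), labels]    # probability of the true class
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)   # down-weights easy samples
    if class_weights is not None:
        loss = loss * class_weights[labels]        # up-weights rare classes
    return loss.mean()
```

With `gamma=0` and no class weights the function reduces to the ordinary cross-entropy, which makes it easy to compare against a baseline.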


Image Classification Models

▪ Remix: Rebalanced Mixup (ECCV’20)

▪ Key idea: generate extra training data by mixing samples and
assign labels in favour of the minority class


Image Classification Models

▪ Remix: Rebalanced Mixup (ECCV’20)

▪ Mixup:

▪ Remix:
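
A minimal sketch of the two mixing rules, since the slide equations did not survive. Mixup uses one mixing factor λ for both the inputs and the labels. Remix mixes the inputs in the same way but, when the two classes are sufficiently imbalanced, assigns the label entirely to the minority class. The thresholds `kappa` and `tau`, their default values and the function names are assumptions based on the Remix paper as understood here, and should be checked against it.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam):
    """Mixup: the same factor lam mixes both the inputs and the labels."""
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y

def remix_label_lambda(lam, n_i, n_j, kappa=3.0, tau=0.5):
    """Label mixing factor; n_i and n_j are the class sizes of the two samples."""
    if n_i / n_j >= kappa and lam < tau:
        return 0.0   # class j is the minority: label goes fully to y_j
    if n_i / n_j <= 1.0 / kappa and (1.0 - lam) < tau:
        return 1.0   # class i is the minority: label goes fully to y_i
    return lam       # otherwise behave exactly like Mixup

def remix(x_i, y_i, n_i, x_j, y_j, n_j, lam, kappa=3.0, tau=0.5):
    """Remix: inputs mixed with lam, labels mixed with the rebalanced factor."""
    x = lam * x_i + (1.0 - lam) * x_j
    lam_y = remix_label_lambda(lam, n_i, n_j, kappa, tau)
    y = lam_y * y_i + (1.0 - lam_y) * y_j
    return x, y
```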


Vision-language Studies

▪ Image captioning: Exploring visual relationship for image captioning (ECCV’18)


Vision-language Studies

▪ Image-text retrieval: Context-aware attention network for image-text retrieval (CVPR’20)


Vision-language Studies

▪ VQA: Making the V in VQA matter: Elevating the role of image understanding in visual question answering (CVPR’17)


Vision-language Studies

▪ VQA: GQA: A new dataset for real-world visual reasoning and compositional question answering (CVPR’19)


Vision-language Studies

▪ VQA: OK-VQA: A visual question answering benchmark requiring external knowledge (CVPR’19)


Vision-language Studies

▪ Visual reasoning: OSCAR: Object-semantics aligned pre-training for vision-language tasks (ECCV’20)


Vision-language Studies

▪ Image generation: DALL-E 2 (OpenAI)

▪ First, the CLIP text encoder maps the image description into the
representation space
▪ Then the diffusion prior maps from the CLIP text encoding to a
corresponding CLIP image encoding
▪ Finally, the modified GLIDE generation model maps from the
representation space into the image space via reverse diffusion

Source: https://www.assemblyai.com/blog/how-dall-e-2-actually-works/


Vision-language Studies

▪ Image generation: StyleCLIP: Text-driven manipulation of StyleGAN Imagery (ICCV’21)


Vision-language Studies

▪ General models: Learning transferable visual models from natural language supervision (ICML 2021)

CLIP (Contrastive Language-Image Pre-training)


Vision-language Studies

▪ General models: CoCa: Contrastive captioners are image-text foundation models (TMLR 2022)


Vision-language Studies

▪ General models: Image as a foreign language: BEiT pretraining for vision and vision-language tasks (CVPR’23)


Generative Models

▪ GAN (NeurIPS’14)

https://developers.google.com/machine-learning/gan/gan_structure


Generative Models

▪ GAN (NeurIPS’14)

Minimax loss:
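
The equation on this slide did not survive extraction. For reference, the standard minimax objective from the original GAN paper can be written as below, where D(x) is the discriminator's estimated probability that x is real, G(z) is the generator output for noise z, and the distribution names p_data and p_z are the usual notation rather than symbols taken from the slide.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```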

https://developers.google.com/machine-learning/gan/gan_structure


Generative Models

▪ Conditional GAN (cGAN)

▪ Extends the GAN to a conditional model by conditioning G and D on
some data y (e.g., class label)
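
One simple way to realise this conditioning, sketched below, is to concatenate a one-hot encoding of y to the generator's noise vector and to the discriminator's input. Concatenation is only one option and is an assumption here (embeddings or projection layers are also used in practice), and the helper names are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    return np.eye(num_classes)[labels]

def generator_input(z, labels, num_classes):
    """Noise z conditioned on y: the generator sees [z, one_hot(y)]."""
    return np.concatenate([z, one_hot(labels, num_classes)], axis=1)

def discriminator_input(x, labels, num_classes):
    """Sample x conditioned on y: the discriminator sees [x, one_hot(y)]."""
    return np.concatenate([x, one_hot(labels, num_classes)], axis=1)
```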


Generative Models

▪ Pix2pix (Image-to-image translation with conditional adversarial networks, CVPR’17)


Generative Models

▪ Pix2pix (Image-to-image translation with conditional adversarial networks, CVPR’17)
▪ Improvement over cGAN:
▪ Additional L1 loss
▪ U-Net like generator
▪ PatchGAN for discriminator
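
A rough sketch of how the first and third points combine in the generator objective: the PatchGAN discriminator outputs one real/fake probability per image patch, the adversarial term is averaged over those patches, and an L1 term pulls the output towards the target image. The non-saturating form of the adversarial term, the function name and the default weight `lam=100.0` are assumptions for illustration.

```python
import numpy as np

def pix2pix_generator_loss(d_fake_patches, fake, target, lam=100.0):
    """Adversarial term over the patch map plus a weighted L1 reconstruction term.

    d_fake_patches: discriminator probabilities in (0, 1], one per patch.
    fake, target:   generated image and ground-truth image, same shape.
    """
    eps = 1e-12                                     # guards against log(0)
    adv = -np.mean(np.log(d_fake_patches + eps))    # fool D on every patch
    l1 = np.mean(np.abs(fake - target))             # stay close to the target
    return adv + lam * l1
```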


Generative Models

▪ CycleGAN (Unpaired image-to-image translation using cycle-consistent adversarial networks, ICCV’17)

Designed for image-to-image translation when the desired output is not available for training
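
Without paired targets, CycleGAN relies on a cycle-consistency term: translating an image to the other domain and back should recover the original. A minimal sketch, assuming `G` maps domain X to Y and `F` maps Y to X as plain Python callables (the adversarial terms for the two discriminators are omitted):

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should match x, and G(F(y)) should match y."""
    forward = np.mean(np.abs(F(G(x)) - x))    # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))   # Y -> X -> Y
    return forward + backward
```

In the full objective this term is added, with a weighting factor, to the two adversarial losses.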


Generative Models

▪ StyleGAN (A style-based generator architecture for generative adversarial networks, CVPR’19)

Automatic, unsupervised separation of high-level attributes (e.g., pose, identity) from stochastic variation (e.g., freckles, hair) in the generated images, enabling intuitive scale-specific mixing and interpolation operations


Generative Models

▪ Data augmentation
▪ Data augmentation using generative adversarial networks
(CycleGAN) to improve generalizability in CT segmentation
tasks (Scientific Reports, 2019)


Generative Models

▪ Image super-resolution
▪ Photo-realistic single image super-resolution using a
generative adversarial network (CVPR’17)


Generative Models

▪ Image completion
▪ Wide-context semantic image extrapolation (CVPR’19)


Generative Models

▪ Language generation
▪ Adversarial ranking for language generation (NeurIPS’17)


Generative Models

▪ Speech synthesis
▪ High fidelity speech synthesis with adversarial networks (ICLR’20)


Generative Models

▪ Speech enhancement
▪ Exploring speech enhancement with generative adversarial
networks for robust speech recognition (ICASSP’18)


Generative Models

▪ Diffusion Models
▪ Deep unsupervised learning using nonequilibrium thermodynamics
(ICML’15)
▪ Two stages:
▪ Forward diffusion slowly destroys structure in a data distribution by
adding Gaussian noise iteratively
▪ Reverse diffusion gradually reconstructs or denoises the images back to
the original using deep learning
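
The forward stage can be sketched in a few lines. The example assumes the closed-form Gaussian noising used in DDPM, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, with a linear noise schedule; the schedule endpoints, step count and function names are illustrative assumptions rather than values from the slides.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Jump directly from clean data x0 to the noised sample at step t."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = np.ones((2, 4))                                     # toy "image"
x_early, _ = q_sample(x0, 10, alpha_bar, rng)            # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bar, rng)            # almost pure Gaussian noise
```

The reverse stage trains a network to undo these steps one at a time.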

https://developer.nvidia.com/blog/improving-diffusion-models-as-an-alternative-to-gans-part-1/


Generative Models

▪ Diffusion Models – DDPM

▪ Denoising diffusion probabilistic models (NeurIPS’20)
▪ The most well-known diffusion model to generate high-quality
images
▪ Reverse diffusion is trained similarly to a variational autoencoder
▪ A U-Net like architecture is used as the network model
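
For reference, the variational training objective is commonly reduced to the simplified noise-prediction loss below, where ε_θ denotes the U-Net like network, ε is the Gaussian noise added in the forward process, and ᾱ_t is the cumulative product of (1 - β_t) over the noise schedule. The notation follows the DDPM paper rather than the slides.

```latex
L_{\text{simple}}(\theta) = \mathbb{E}_{t,\, x_0,\, \epsilon}
  \Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0
  + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t\big) \big\|^2 \Big]
```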


Generative Models

▪ Diffusion Models – DALL-E 2 (unCLIP)

▪ Hierarchical text-conditional image generation with CLIP latents


Generative Models

▪ Diffusion Models – Anomaly detection

▪ Diffusion models for medical anomaly detection (MICCAI’22)
▪ Incorporates classifier guidance for image generation


Generative Models

▪ Diffusion Models – Speech synthesis

▪ FastDiff: A fast conditional diffusion model for high-quality speech
synthesis (IJCAI’22)


Generative Models

▪ Diffusion Models – Text generation

▪ DiffuSeq: Sequence to sequence text generation with diffusion
models (ICLR’23)
▪ Tested on open domain dialogue, question generation, text
simplification and paraphrase tasks, showing better results than
GPT2 and T5


Generative Models

▪ Variational autoencoders – VAE

▪ Introduce regularisation in the latent space to avoid overfitting
▪ Instead of encoding an input as a single point, it is encoded as a
distribution over the latent space
▪ Gaussian distribution is used, represented by its mean and
covariance, regularised with KL divergence
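
The last two points can be sketched as follows, assuming a diagonal covariance (the usual simplification) with the encoder outputting a mean `mu` and a log-variance `log_var` per latent dimension. The function names are hypothetical. The reparameterisation step draws a latent sample in a differentiable way, and the KL term pulls the encoded distribution towards a standard normal.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
```

The full training loss adds this KL term to the reconstruction error of the decoder.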

Source: https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73


Generative Models

▪ Variational autoencoders – VQ-VAE

▪ Generating diverse high-fidelity images with VQ-VAE-2 (NeurIPS’19)


Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders

▪ The generative learning trilemma

Source: Tackling the generative learning trilemma with denoising diffusion GANs, ICLR 2022.


Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders

▪ Taming transformers for high-resolution image synthesis
(CVPR’21) – VQGAN


Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders

▪ High-resolution image synthesis with latent diffusion models
(CVPR’22) – Stable Diffusion


Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders

▪ Tackling the generative learning trilemma with denoising diffusion
GANs (ICLR’22)


Semi-supervised Learning

▪ Problem definition
▪ Incorporate additional unlabeled training data to train the
supervised learning model
▪ Advantage: annotate only a small subset of training data while maintaining the model performance

Source: Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis, 2019.


Semi-supervised Learning

▪ Data synthesis
▪ Generate additional data with pseudo ground truth labels, and
include these data into the training
▪ Mixup
▪ Data augmentation using GAN


Semi-supervised Learning

▪ Adversarial learning
▪ A typical approach: Improved techniques for training GANs
(NeurIPS’16)
▪ Main ideas:
▪ For labelled real data, the discriminator classifies their label
▪ For unlabelled real data and generated data, they are trained
with the adversarial loss only
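
A sketch of the two loss terms, assuming the formulation in which the discriminator outputs K class logits and the probability of "real" is D(x) = Z(x) / (Z(x) + 1) with Z(x) the sum of the exponentiated logits. This follows the cited paper as understood here; the function names are hypothetical and the feature-matching generator loss is not shown.

```python
import numpy as np

def logsumexp(logits):
    """Numerically stable log(sum(exp(logits))) over the class axis."""
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))[:, 0]

def softplus(a):
    return np.logaddexp(0.0, a)   # log(1 + exp(a))

def supervised_loss(logits, labels):
    """Cross-entropy over the K real classes, for labelled real data."""
    log_probs = logits - logsumexp(logits)[:, None]
    return -log_probs[np.arange(len(labels)), labels].mean()

def adversarial_loss(logits_real_unlabelled, logits_fake):
    """Real-vs-fake loss for unlabelled real data and generated data."""
    lse_real = logsumexp(logits_real_unlabelled)
    lse_fake = logsumexp(logits_fake)
    log_d_real = lse_real - softplus(lse_real)     # log D(x)
    log_one_minus_d_fake = -softplus(lse_fake)     # log(1 - D(G(z)))
    return -log_d_real.mean() - log_one_minus_d_fake.mean()
```

The discriminator is trained on the sum of the two terms, so unlabelled images still shape its features through the adversarial part.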


Semi-supervised Learning

▪ Adversarial learning
▪ Deep adversarial networks for biomedical image segmentation
utilizing unannotated images (MICCAI’17)


Semi-supervised Learning

▪ Graph regularization
▪ Label propagation for deep semi-supervised learning (CVPR’19)

Construction of Nearest Neighbour Graph + Label Propagation

=> Pseudo labels for unlabelled data
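
The two steps can be sketched as below: build a k-nearest-neighbour affinity graph on feature vectors, then diffuse the known labels over it with the closed form Z = (I - αW)^{-1} Y and take the arg-max as the pseudo label. This is a simplified illustration: in the paper the features come from the network being trained and pseudo labels are additionally weighted by certainty, which is omitted here. The function name and the defaults for `k` and `alpha` are assumptions.

```python
import numpy as np

def propagate_labels(features, labels, k=2, alpha=0.9):
    """labels holds a class index for labelled points and -1 for unlabelled ones."""
    n = len(features)
    num_classes = labels.max() + 1
    # Non-negative cosine similarity between L2-normalised features.
    v = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.clip(v @ v.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)
    # Keep each point's k nearest neighbours, then symmetrise the graph.
    A = np.zeros_like(sim)
    nn = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    A[rows, nn] = sim[rows, nn]
    W = A + A.T
    # Symmetric normalisation D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    W_norm = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot label matrix with all-zero rows for unlabelled points.
    Y = np.zeros((n, num_classes))
    labelled = labels >= 0
    Y[labelled, labels[labelled]] = 1.0
    # Closed-form label diffusion, then arg-max gives the pseudo labels.
    Z = np.linalg.solve(np.eye(n) - alpha * W_norm, Y)
    return Z.argmax(axis=1)
```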


Semi-supervised Learning

▪ Self-ensembling
▪ Uncertainty-aware self-ensembling model for semi-supervised
3D left atrium segmentation (MICCAI’19)


Other Learning Techniques

▪ Unsupervised learning
▪ Transfer learning
▪ Weakly supervised learning
▪ Self-supervised learning
▪ Few/zero-shot learning
▪ Meta learning
▪ Active learning
▪ Continual learning
▪ Federated learning
▪ …

