COMP9491 Week2 Deep - Learning 1

The document discusses various deep learning models for image classification, including ResNet, ResNeXt and EfficientNet. It also covers vision-language tasks such as image captioning and VQA, generative models including GANs (Pix2Pix, CycleGAN and StyleGAN), and semi-supervised learning.


Deep Learning (1)

COMP9491 Applied AI
Term 2, 2023
Outline

▪ Image classification models

▪ Vision-language studies

▪ Generative models

▪ Semi-supervised learning

COMP9491 T2, 2023 1


Image Classification Models

▪ Problems with deep learning models


▪ The degradation problem



Image Classification Models

▪ ResNet (Deep residual learning for image recognition, CVPR’16)


▪ The hypothesis: it is easier to optimise the residual mapping
than to optimise the original, unreferenced mapping
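
A minimal sketch of the residual formulation, assuming a toy fully connected block instead of the convolutional blocks used in ResNet. The names `residual_branch` and `residual_block` are illustrative, and the ReLU that ResNet applies after the addition is left out so that the identity case is exact. The block learns a residual F(x) and outputs F(x) + x, so a branch with all-zero weights reduces the block to the identity mapping.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(weights, v):
    # weights is a list of rows; returns the matrix-vector product
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_branch(x, w1, w2):
    # F(x): two linear layers with a ReLU in between (toy stand-in for conv layers)
    return linear(w2, relu(linear(w1, x)))


def residual_block(x, w1, w2):
    # y = F(x) + x: the shortcut adds the input back onto the learned residual
    return [f + a for f, a in zip(residual_branch(x, w1, w2), x)]


x = [1.0, -2.0, 0.5]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [1.0, -2.0, 0.5]
```

With zero weights the output equals the input, which illustrates why extra residual layers should not make a deeper network harder to fit than a shallower one.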



Image Classification Models

▪ ResNeXt (Aggregated residual transformations for deep neural
networks, CVPR’17)



Image Classification Models

▪ EfficientNet: Rethinking model scaling for convolutional neural
networks (ICML’19)



Image Classification Models

▪ Problem in real-life applications: data imbalance



Image Classification Models

▪ To address data imbalance:


▪ Data distribution re-balancing (over-sampling for the minority
class, under-sampling for the majority class)
▪ Class-balanced loss (re-weighting, focal loss)
▪ Data synthesis (autoencoder, GAN)
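
As a rough illustration of the class-balanced loss bullet, the sketch below combines inverse-frequency re-weighting with the focal loss, FL(p_t) = -w (1 - p_t)^gamma log(p_t). The helper names, the normalisation of the weights and the default `gamma=2.0` are assumptions made for this example, not settings taken from the slides.

```python
import math


def class_weights(counts):
    # Inverse-frequency weights, normalised so that they average to 1
    total = sum(counts)
    raw = [total / c for c in counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def focal_loss(p_true, gamma=2.0, weight=1.0):
    # p_true is the predicted probability of the correct class.
    # The (1 - p_true) ** gamma factor down-weights well-classified samples.
    return -weight * (1.0 - p_true) ** gamma * math.log(p_true)


counts = [900, 100]                   # majority class, minority class
w = class_weights(counts)
easy_majority = focal_loss(0.9, weight=w[0])
hard_minority = focal_loss(0.1, weight=w[1])
print(w, easy_majority, hard_minority)
```

Setting `gamma=0.0` and `weight=1.0` recovers the ordinary cross-entropy term, which makes the two modifications easy to test separately.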



Image Classification Models

▪ Remix: Rebalanced Mixup (ECCV’20)


▪ Key idea: generate extra training data by mixing samples and
assign labels in favour of the minority class



Image Classification Models

▪ Remix: Rebalanced Mixup (ECCV’20)


▪ Mixup:

▪ Remix:
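
The formulas for these two bullets did not survive extraction. The sketch below shows the usual formulation as an illustration: Mixup interpolates both inputs and labels with the same factor lambda, while Remix keeps lambda for the inputs but pushes the label factor towards the minority class. The thresholds `kappa=3.0` and `tau=0.5`, and the use of per-class sample counts `n_i` and `n_j`, follow the Remix paper as recalled here and should be treated as assumptions.

```python
import random


def mixup(x_i, y_i, x_j, y_j, lam):
    # Same mixing factor for inputs and one-hot labels
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def remix_label_factor(lam, n_i, n_j, kappa=3.0, tau=0.5):
    # n_i, n_j: number of training samples in the classes of samples i and j
    if n_i / n_j >= kappa and lam < tau:
        return 0.0   # i is from a much larger class: the label goes fully to j
    if n_i / n_j <= 1.0 / kappa and (1 - lam) < tau:
        return 1.0   # j is from a much larger class: the label goes fully to i
    return lam       # otherwise behave like Mixup


def remix(x_i, y_i, n_i, x_j, y_j, n_j, lam, kappa=3.0, tau=0.5):
    lam_y = remix_label_factor(lam, n_i, n_j, kappa, tau)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam_y * a + (1 - lam_y) * b for a, b in zip(y_i, y_j)]
    return x, y


lam = random.betavariate(1.0, 1.0)    # Mixup usually draws lambda from Beta(alpha, alpha)
majority = ([1.0, 0.0], [1.0, 0.0], 900)
minority = ([0.0, 1.0], [0.0, 1.0], 100)
print(mixup(majority[0], majority[1], minority[0], minority[1], 0.4)[1])
print(remix(*majority, *minority, 0.4)[1])  # [0.0, 1.0]
```

In the last call the mixed input is still 40% majority sample, but the label is assigned entirely to the minority class.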



Vision-language Studies

▪ Image captioning: Exploring visual relationship for image
captioning (ECCV’18)



Vision-language Studies

▪ Image-text retrieval: Context-aware attention network for
image-text retrieval (CVPR’20)



Vision-language Studies

▪ VQA: Making the V in VQA matter: Elevating the role of image
understanding in visual question answering (CVPR’17)



Vision-language Studies

▪ VQA: GQA: A new dataset for real-world visual reasoning and
compositional question answering (CVPR’19)



Vision-language Studies

▪ VQA: OK-VQA: A visual question answering benchmark requiring
external knowledge (CVPR’19)



Vision-language Studies

▪ Visual reasoning: OSCAR: Object-semantics aligned pre-training for
vision-language tasks (ECCV’20)



Vision-language Studies

▪ Image generation: DALL-E 2 (OpenAI)


▪ First the CLIP text encoder maps the image description into the
representation space
▪ Then the diffusion prior maps from the CLIP text encoding to a
corresponding CLIP image encoding
▪ Finally, the modified-GLIDE generation model maps from the
representation space into the image space via reverse diffusion

Source: https://www.assemblyai.com/blog/how-dall-e-2-actually-works/



Vision-language Studies

▪ Image generation: StyleCLIP: Text-driven manipulation of
StyleGAN Imagery (ICCV’21)



Vision-language Studies

▪ General models: Learning transferable visual models from
natural language supervision (ICML 2021)

CLIP (Contrastive Language-Image Pre-training)



Vision-language Studies

▪ General models: CoCa: Contrastive captioners are image-text
foundation models (TMLR 2022)



Vision-language Studies

▪ General models: Image as a foreign language: BEiT pretraining
for vision and vision-language tasks (CVPR’23)



Generative Models

▪ GAN (NeurIPS’14)

https://developers.google.com/machine-learning/gan/gan_structure



Generative Models

▪ GAN (NeurIPS’14)

Minimax loss:
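
The formula itself did not survive extraction. For reference, the minimax objective from the original GAN paper can be written as below, where D(x) is the discriminator's estimated probability that x is real, G(z) is the generator output for a noise vector z, and the symbols p_data and p_z for the data and noise distributions are notation chosen for this note.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximises both terms, while the generator can only influence the second term and tries to minimise it.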

https://developers.google.com/machine-learning/gan/gan_structure



Generative Models

▪ Conditional GAN (cGAN)


▪ Extends GAN to a conditional model by conditioning G and D on
some additional data y (e.g., a class label)



Generative Models

▪ Pix2pix (Image-to-image translation with conditional adversarial
networks, CVPR’17)



Generative Models

▪ Pix2pix (Image-to-image translation with conditional adversarial
networks, CVPR’17)
▪ Improvement over cGAN:
▪ Additional L1 loss

▪ U-Net like generator


▪ PatchGAN for discriminator
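
A rough sketch of how the first and third improvements fit together in the generator objective, assuming images are flattened into lists of pixel values and the PatchGAN discriminator returns one real/fake probability per image patch. The function names are illustrative, and the weight `lam=100.0` is the value recalled from the Pix2pix paper, so treat it as an assumption.

```python
import math


def l1_loss(target, output):
    # Mean absolute error between the ground-truth image and the generated image
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)


def patch_adversarial_loss(patch_probs):
    # PatchGAN-style term: each entry is D's probability that one patch is real.
    # The generator wants every patch to be judged real.
    return -sum(math.log(p) for p in patch_probs) / len(patch_probs)


def generator_loss(target, output, patch_probs, lam=100.0):
    # Adversarial term plus the additional L1 term, weighted by lam
    return patch_adversarial_loss(patch_probs) + lam * l1_loss(target, output)


target = [0.2, 0.8, 0.5, 0.1]
output = [0.25, 0.7, 0.5, 0.2]
patch_probs = [0.6, 0.4]              # hypothetical discriminator outputs for 2 patches
print(generator_loss(target, output, patch_probs))
```

The L1 term keeps the output close to the paired ground truth, while the per-patch adversarial term pushes for realistic local detail.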



Generative Models

▪ CycleGAN (Unpaired image-to-image translation using cycle-consistent
adversarial networks, ICCV’17)

Designed for image-to-image translation when the desired
output is not available for training



Generative Models

▪ StyleGAN (A style-based generator architecture for generative
adversarial networks, CVPR’19)

Automatic, unsupervised separation of high-level attributes (e.g., pose,
identity) from stochastic variation (e.g., freckles, hair) in the generated
images, enabling intuitive scale-specific mixing and interpolation
operations



Generative Models

▪ Data augmentation
▪ Data augmentation using generative adversarial networks
(CycleGAN) to improve generalizability in CT segmentation
tasks (Scientific Reports, 2019)



Generative Models

▪ Image super-resolution
▪ Photo-realistic single image super-resolution using a
generative adversarial network (CVPR’17)



Generative Models

▪ Image completion
▪ Wide-context semantic image extrapolation (CVPR’19)



Generative Models

▪ Language generation
▪ Adversarial ranking for language generation (NeurIPS’17)



Generative Models

▪ Speech synthesis
▪ High fidelity speech synthesis with adversarial networks (ICLR’20)



Generative Models

▪ Speech enhancement
▪ Exploring speech enhancement with generative adversarial
networks for robust speech recognition (ICASSP’18)



Generative Models

▪ Diffusion Models
▪ Deep unsupervised learning using nonequilibrium thermodynamics
(ICML’15)
▪ Two stages:
▪ Forward diffusion slowly destroys structure in a data distribution by
adding Gaussian noise iteratively
▪ Reverse diffusion gradually reconstructs or denoises the images back to
the original using deep learning
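
The forward stage can be sketched in a few lines, assuming the common update x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps with eps drawn from a standard Gaussian. The linear schedule from 1e-4 to 0.02 over 1000 steps is a frequently used setting chosen here as an assumption, and all function names are illustrative. The reverse stage needs a trained network and is not shown.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    # Noise variance added at each step, increasing linearly
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def forward_step(x_prev, beta, rng):
    # Shrink the signal slightly and add Gaussian noise with variance beta
    return [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0) for v in x_prev]


def forward_diffusion(x0, betas, rng):
    x = list(x0)
    for beta in betas:
        x = forward_step(x, beta, rng)
    return x


def signal_fraction(betas):
    # Product of (1 - beta_t): the share of the original variance left at the end
    prod = 1.0
    for beta in betas:
        prod *= 1 - beta
    return prod


rng = random.Random(0)
betas = linear_betas(1000)
x_T = forward_diffusion([1.0, -1.0, 0.5], betas, rng)
print(signal_fraction(betas))         # very small: almost no structure is left
```

After enough steps the sample is close to pure Gaussian noise, which is the starting point for the reverse process.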

https://developer.nvidia.com/blog/improving-diffusion-models-as-an-alternative-to-gans-part-1/



Generative Models

▪ Diffusion Models – DDPM


▪ Denoising diffusion probabilistic models (NeurIPS’20)
▪ The best-known diffusion model for generating high-quality
images
▪ Reverse diffusion is trained similarly to a variational autoencoder
▪ A U-Net like architecture is used as the network model



Generative Models

▪ Diffusion Models – DALL-E 2 (unCLIP)


▪ Hierarchical text-conditional image generation with CLIP latents



Generative Models

▪ Diffusion Models – Anomaly detection


▪ Diffusion models for medical anomaly detection (MICCAI’22)
▪ Incorporates classifier guidance for image generation



Generative Models

▪ Diffusion Models – Speech synthesis


▪ FastDiff: A fast conditional diffusion model for high-quality speech
synthesis (IJCAI’22)



Generative Models

▪ Diffusion Models – Text generation


▪ DiffuSeq: Sequence to sequence text generation with diffusion
models (ICLR’23)
▪ Tested on open domain dialogue, question generation, text
simplification and paraphrase tasks, showing better results than
GPT2 and T5



Generative Models

▪ Variational autoencoders – VAE


▪ Introduce regularisation in the latent space to avoid overfitting
▪ Instead of encoding an input as a single point, it is encoded as a
distribution over the latent space
▪ Gaussian distribution is used, represented by its mean and
covariance, regularised with KL divergence
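
A minimal sketch of the two ingredients named above, assuming a diagonal covariance parameterised by its log-variance and a standard normal prior (both common choices, but assumptions here). `sample_latent` draws from the encoded distribution with the reparameterisation z = mu + sigma * eps, and `kl_to_standard_normal` is the closed-form KL regulariser.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    # z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * log_var)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -0.5]   # hypothetical encoder outputs for one input
z = sample_latent(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when the encoded distribution matches the prior exactly, so it pulls every encoding towards the same region of the latent space.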

Source: https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73



Generative Models

▪ Variational autoencoders – VQ-VAE


▪ Generating diverse high-fidelity images with VQ-VAE-2 (NeurIPS’19)



Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders


▪ The generative learning trilemma

Source: Tackling the generative learning trilemma with denoising diffusion GANs, ICLR 2022.



Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders


▪ Taming transformers for high-resolution image synthesis
(CVPR’21) – VQGAN



Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders


▪ High-resolution image synthesis with latent diffusion models
(CVPR’22) – Stable diffusion



Generative Models

▪ GANs, Diffusion Models, Variational Autoencoders


▪ Tackling the generative learning trilemma with denoising diffusion
GANs (ICLR’22)



Semi-supervised Learning

▪ Problem definition
▪ Incorporate additional unlabeled training data to train the
supervised learning model
▪ Advantage: annotate only a small subset of training data while
maintaining the model performance

Source: Not-so-supervised: A survey of semi-supervised, multi-instance, and
transfer learning in medical image analysis. Medical Image Analysis, 2019.



Semi-supervised Learning

▪ Data synthesis
▪ Generate additional data with pseudo ground truth labels, and
include these data in training
▪ Mixup
▪ Data augmentation using GAN



Semi-supervised Learning

▪ Adversarial learning
▪ A typical approach: Improved techniques for training GANs
(NeurIPS’16)
▪ Main ideas:
▪ For labelled real data, the discriminator predicts the class label
▪ Unlabelled real data and generated data contribute only to the
adversarial loss
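
One way to read these two ideas, sketched under the assumption that the discriminator outputs K + 1 logits: K real classes followed by one extra "generated" class. Labelled samples use a cross-entropy over the K real classes, and all other samples only use the probability of the extra class. The function names and the example logits are illustrative.

```python
import math


def softmax(logits):
    m = max(logits)                   # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(logits, label):
    # Labelled real data: cross-entropy over the K real classes (last logit excluded)
    return -math.log(softmax(logits[:-1])[label])


def unsupervised_loss(logits, is_real):
    # Unlabelled real data should not look generated; generated data should
    p_fake = softmax(logits)[-1]
    return -math.log(1.0 - p_fake) if is_real else -math.log(p_fake)


logits = [2.0, 0.0, 0.0]              # K = 2 real classes plus the "generated" class
print(supervised_loss(logits, 0))
print(unsupervised_loss(logits, is_real=True))
print(unsupervised_loss(logits, is_real=False))
```

The discriminator's total loss would be the sum of the supervised term over labelled data and the unsupervised term over unlabelled and generated data.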



Semi-supervised Learning

▪ Adversarial learning
▪ Deep adversarial networks for biomedical image segmentation
utilizing unannotated images (MICCAI’17)



Semi-supervised Learning

▪ Graph regularization
▪ Label propagation for deep semi-supervised learning (CVPR’19)

Construction of Nearest Neighbour Graph + Label Propagation
=> Pseudo labels for unlabelled data
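
The pipeline above can be sketched on toy 2D points. This version assumes a Gaussian affinity on Euclidean distance and an iterative update Z <- alpha * W * Z + (1 - alpha) * Y on the symmetrically normalised graph. Both are standard label-propagation choices and not necessarily the exact settings of the CVPR’19 paper, which builds the graph from deep features.

```python
import math


def knn_affinity(points, k):
    # Symmetric k-nearest-neighbour graph with Gaussian edge weights
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            s = math.exp(-d * d)
            w[i][j] = max(w[i][j], s)
            w[j][i] = max(w[j][i], s)
    return w


def normalise(w):
    # D^(-1/2) W D^(-1/2), where D is the diagonal degree matrix
    deg = [sum(row) for row in w]
    n = len(w)
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(w_norm, labels, num_classes, alpha=0.9, iters=50):
    # labels[i] is a class index, or None for an unlabelled point
    n = len(labels)
    y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)] for i in range(n)]
    z = [row[:] for row in y]
    for _ in range(iters):
        z = [[alpha * sum(w_norm[i][j] * z[j][c] for j in range(n)) + (1 - alpha) * y[i][c]
              for c in range(num_classes)] for i in range(n)]
    return [max(range(num_classes), key=lambda c: row[c]) for row in z]


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 3.0), (2.9, 3.1)]
labels = [0, None, None, 1, None, None]
pseudo = propagate(normalise(knn_affinity(points, k=2)), labels, num_classes=2)
print(pseudo)  # [0, 0, 0, 1, 1, 1]
```

Each cluster has a single labelled point, and propagation along the graph edges assigns that label to its unlabelled neighbours.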



Semi-supervised Learning

▪ Self-ensembling
▪ Uncertainty-aware self-ensembling model for semi-supervised
3D left atrium segmentation (MICCAI’19)



Other Learning Techniques

▪ Unsupervised learning
▪ Transfer learning
▪ Weakly supervised learning
▪ Self-supervised learning
▪ Few/zero-shot learning
▪ Meta learning
▪ Active learning
▪ Continual learning
▪ Federated learning
▪ …
