
IMAGE AUGMENTATION WITH CONTROLLED DIFFUSION FOR WEAKLY-SUPERVISED SEMANTIC SEGMENTATION

Wangyu Wu1,2, Tianhong Dai3, Xiaowei Huang2, Fei Ma1, Jimin Xiao1
1 Xi'an Jiaotong-Liverpool University   2 The University of Liverpool   3 University of Aberdeen
∗ Corresponding authors

ABSTRACT

Weakly-supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has attracted significant attention. Existing methods primarily focus on generating high-quality pseudo labels from the available images and their image-level labels. However, the quality of the pseudo labels degrades significantly when the available dataset is small. In this paper, we therefore tackle the problem from a different angle by introducing a novel approach called Image Augmentation with Controlled Diffusion (IACD). This framework effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels serve as the controlling information. Moreover, we propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models. In the experiments, our proposed IACD approach clearly surpasses existing state-of-the-art methods. The effect is more pronounced when the amount of available data is small, demonstrating the effectiveness of our method.

Index Terms— weakly-supervised semantic segmentation, diffusion model, high-quality image selection

Fig. 1. (a) In previous methods, only images from the original dataset are used for training. (b) Our proposed IACD utilizes a diffusion model to generate synthetic images; an image selection module then annotates and selects the high-quality synthetic images to augment the original dataset for training.

1. INTRODUCTION

Weakly-supervised semantic segmentation (WSSS) leverages image-level labels to generate pixel-level pseudo masks for training segmentation models. The primary challenge lies in enhancing the quality of the generated pseudo labels. Most current methods either inject more category information into the network or perform additional learning on the existing training data [1, 2], such as exploring sub-class distinctions [2] and adding category information to the network [1]. Alternatively, efforts are directed towards optimizing network structures [3, 4, 5] to better suit learning in weakly-supervised scenarios. However, all of these methods are constrained by the scale of the available training data.

The Diffusion Probabilistic Model (DPM) [6] is an appealing choice for this problem because it belongs to a class of deep generative models that have recently gained prominence in computer vision [7, 8, 9]. The generated images exhibit high quality with few artifacts and align well with the given text prompts, even when these prompts depict unrealistic scenarios that were never encountered during training, which highlights the strong generalization capabilities of diffusion models. Notably, recent works such as Stable Diffusion [10] with ControlNet [11] are able to generate high-quality synthetic images.

In this paper, we propose Image Augmentation with Controlled Diffusion (IACD) to generate high-quality synthetic training data for WSSS (see Fig. 1). Our contributions are: 1) To our knowledge, this is the first proposal to utilize conditional diffusion to augment an original dataset that carries only image-level labels, with the aim of enhancing WSSS performance. 2) An image selection approach is introduced to keep high-quality training data while effectively filtering out low-quality generated images, preventing any adverse impact on model training. 3) Our proposed framework outperforms all current state-of-the-art methods, with performance improvements across different training data sizes; in particular, with 5% of the training data it yields a 4.9% increase on the segmentation task on the PASCAL VOC 2012 validation set.

Fig. 2. The pipeline of our IACD. In the illustrated example, an airplane image and a prompt built from its image labels are fed into the diffusion model to generate candidate images. The candidate images are then filtered through an image selection process so that only high-quality images are used as training data for the downstream WSSS.

2. METHODOLOGY

In this section, we present the general framework and key components of our proposed method. We first introduce the overall architecture and pipeline of our IACD method (Sec. 2.1). Then, a diffusion model based approach is proposed for data augmentation in WSSS tasks (Sec. 2.2). Furthermore, we develop a high-quality image selection strategy that aims to ensure the quality of the data generated by the diffusion model, thereby reducing the noise introduced into training (Sec. 2.3). Finally, the composition of the final dataset used for training is discussed (Sec. 2.4).

2.1. Overall Structure

As illustrated in Fig. 2, we utilize the diffusion model [10] along with ControlNet [11] to generate new training samples under the guidance of conditioning inputs: the original images and label prompts. In addition, we train a Vision Transformer (ViT) based image classifier [12] on the existing dataset with image-level labels to select high-quality generated training samples. During the selection, we keep generated images with high prediction scores as high-quality samples and filter out low-quality, noisy generated images. Finally, we extend the original dataset with the selected samples for the training of WSSS.

Fig. 3. The overall framework of IACD consists of several steps. Firstly, IACD utilizes controlled diffusion to generate entirely new images. Subsequently, the original image is processed by a Vision Transformer (ViT) encoder to produce patch embeddings, and a patch-driven classifier is trained for image categorization. The generated diffusion images are then passed through the same trained classifier to select a high-quality image set. Finally, the selected image set, along with the original images and their corresponding labels, is passed to the downstream WSSS task.

2.2. Controlled Diffusion for Data Augmentation

The motivation for using controlled image diffusion models to augment data is that these models can generate virtually unlimited and diverse task-specific synthetic images based on a given image and a text prompt. In this work, we utilize Stable Diffusion with ControlNet (SDC) [11] as our generative model (see Fig. 3). In the data augmentation stage, an input image Xin ∈ R^{h×w×3}, a text prompt P, and a detection map M are fed into the SDC model δ(·) to generate a new training sample Xaug:

    Xaug = δ(Xin, M, P).    (1)

More specifically, the text prompt is formulated from the corresponding image-level label Y. The detection map is an extra condition (e.g., Canny edges [13] or Openpose [14]) used to control the generation results. More details about the data augmentation process are described in Algo. 1.

Algorithm 1: Diffusion Model for Data Augmentation
    Input: an input image Xin, an image-level label Y
    Output: a generated image Xaug
    P ← generate_prompt(Y)
    if "person" ∈ Y then
        M ← detect_map(Xin, human_pose)
    else
        M ← detect_map(Xin, canny_edge)
    Xaug ← δ(Xin, M, P)
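
For illustration, a minimal sketch of the augmentation step of Algorithm 1 is given below, using the Hugging Face diffusers library. The model checkpoints, prompt template, Canny thresholds, and helper names are assumptions made for this sketch, not the authors' released code; only the Canny-edge branch is shown.

```python
# Illustrative sketch of Algorithm 1 with diffusers; checkpoints, prompt template,
# and thresholds are assumptions, not the authors' implementation.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline


def generate_augmented_image(x_in: Image.Image, labels: list[str]) -> Image.Image:
    # Prompt P formulated from the image-level labels Y (template is an assumption).
    prompt = "a photo of " + " and ".join(labels)

    # Detection map M: the paper uses Openpose when "person" is in Y and Canny edges
    # otherwise; only the Canny branch is sketched here.
    edges = cv2.Canny(np.array(x_in.convert("L")), 100, 200)
    control_map = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # Stable Diffusion with ControlNet (SDC); checkpoints are illustrative choices.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")

    # X_aug = delta(X_in, M, P); 20 diffusion steps, as in Sec. 3.1.
    return pipe(prompt, image=control_map, num_inference_steps=20).images[0]
```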

2.3. High-quality Synthetic Image Selection

In order to guarantee the quality of the synthetic data that will be used for training, a selection strategy is introduced to keep only the high-quality generated samples. As shown in Fig. 3, a ViT-based patch-driven classifier is first trained on the original dataset with its image-level labels. To train the classifier, the input image Xin is divided into s patches Xpatch ∈ R^{d×d×3} of fixed size, where s = hw/d². Then, the patch embeddings F ∈ R^{s×e} are obtained with the ViT encoder. Next, a weight matrix W ∈ R^{e×|C|} and a softmax function are applied to output the prediction scores Z ∈ R^{s×|C|} of each patch:

    Z = softmax(F W),    (2)

where C is the set of categories in the dataset. Global maximum pooling (GMP) is then used to select the highest prediction score ŷ ∈ R^{1×|C|} for each class among all patches. Finally, ŷ is utilized as the prediction score for image-level classification, and the classifier is trained with the multi-label classification error (MCE):

    L_MCE = (1/|C|) Σ_{c∈C} BCE(yc, ŷc)
          = −(1/|C|) Σ_{c∈C} [ yc log(ŷc) + (1 − yc) log(1 − ŷc) ],    (3)

where ŷc is the prediction score of class c and yc is the ground-truth label. Once the classifier is trained, we use it to select the high-quality generated training data.
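
A minimal PyTorch sketch of this patch-driven classifier and the MCE loss is given below. The timm backbone name, the assumption that forward_features returns the token sequence with a single class-token prefix, and the numerical clamping are illustrative choices, not the authors' implementation.

```python
# Sketch of the patch-driven classifier (Eqs. (2)-(3)); backbone name and token
# layout are assumptions for illustration.
import timm
import torch
import torch.nn as nn


class PatchDrivenClassifier(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 768):
        super().__init__()
        # ViT-B/16 encoder at 384x384 input, giving a 24x24 grid of patch tokens.
        self.encoder = timm.create_model("vit_base_patch16_384", pretrained=True)
        self.head = nn.Linear(embed_dim, num_classes)  # weight W in Eq. (2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.encoder.forward_features(x)     # (B, 1 + s, e) in recent timm
        patch_embed = tokens[:, 1:, :]                # F in R^{s x e}, drop class token
        z = torch.softmax(self.head(patch_embed), -1) # Z in R^{s x |C|}, Eq. (2)
        y_hat = z.max(dim=1).values                   # global max pooling over patches
        return y_hat                                  # image-level score per class


def mce_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Multi-label classification error of Eq. (3): mean BCE between pooled scores
    # and image-level labels; clamping avoids log(0).
    bce = nn.BCELoss(reduction="mean")
    return bce(y_hat.clamp(1e-6, 1 - 1e-6), y.float())
```
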
In the selection stage, the synthetic image Xaug generated from ⟨Xin, Y⟩ is passed into the classifier, followed by GMP, to output the image-level prediction scores ŷ. Then, the classes with scores above a certain threshold ϵ are used as the ground-truth label for the generated image: Yaug = {c | ŷc > ϵ}. If Yaug is a subset of the label Y of the input image, the generated sample ⟨Xaug, Yaug⟩ is added to the synthetic dataset Daug. More details about the image selection are described in Algo. 2.

Algorithm 2: High-Quality Image Selection
    Input: a ground-truth label Y of the input image, a generated image Xaug, a prediction score ŷ, a set of classes C, a threshold ϵ, and a synthetic dataset Daug
    Output: the synthetic dataset Daug
    Yaug ← ∅
    foreach c ∈ C do
        if ŷc > ϵ then
            Yaug ← Yaug ∪ {c}
    if Yaug ⊆ Y then
        Daug ← Daug ∪ {⟨Xaug, Yaug⟩}

This selection strategy serves two purposes. First, it only keeps the synthetic samples with high prediction scores in specific categories, which guarantees a high probability that objects of these classes actually appear in the synthetic image. Second, the synthetic image will not contain objects that do not belong to the image-level label of the input image. In this way, the quality of the synthetic dataset Daug can be improved.
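
The sketch below mirrors Algorithm 2 in Python, reusing the classifier from the previous sketch. The threshold value follows Sec. 3.1; rejecting an empty Yaug is an extra practical assumption on top of the subset test in Algorithm 2.

```python
# Sketch of the high-quality image selection (Algorithm 2).
import torch


@torch.no_grad()
def select_synthetic_sample(classifier, x_aug: torch.Tensor, y_input: set[int],
                            eps: float = 0.9):
    """Return the pseudo label set Y_aug for x_aug if it passes the check, else None."""
    y_hat = classifier(x_aug.unsqueeze(0)).squeeze(0)            # image-level scores ŷ
    y_aug = {c for c in range(y_hat.numel()) if y_hat[c] > eps}  # Y_aug = {c | ŷ_c > eps}
    # Algorithm 2 keeps the sample when Y_aug ⊆ Y; requiring a non-empty Y_aug is an
    # additional practical assumption so that images with no confident object are dropped.
    if y_aug and y_aug.issubset(y_input):
        return y_aug
    return None


# Usage sketch: build the synthetic dataset D_aug from generated candidates.
# d_aug = []
# for x_gen, y_in in candidates:        # candidates produced as in Algorithm 1
#     y_aug = select_synthetic_sample(classifier, x_gen, y_in)
#     if y_aug is not None:
#         d_aug.append((x_gen, y_aug))
```
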
2.4. Final Training Dataset of WSSS

After selecting the high-quality generated training samples, the synthetic dataset Daug and the original dataset Dorigin are combined into an extended dataset Dfinal for the training of WSSS: Dfinal = Dorigin ∪ Daug.

3. EXPERIMENTS

In this section, we describe the experimental setup, including the dataset, evaluation metric, and implementation details. We then compare our method with state-of-the-art approaches on PASCAL VOC 2012 [15]. Finally, ablation studies are performed to validate the effectiveness of the proposed method.

3.1. Experimental Settings

Dataset and Evaluation Metric. We conduct our experiments on PASCAL VOC 2012 [15], which comprises 21 categories, including the additional background class. The PASCAL VOC 2012 dataset is typically augmented with the SBD dataset [16]. In total, we utilize 10,582 images with image-level annotations for training and 1,449 images for validation. The training set of PASCAL VOC contains images with only image-level labels. We report the mean Intersection-over-Union (mIoU) as the evaluation criterion. Additionally, we evaluate the performance of our IACD method when the amount of original training data is gradually reduced from 10,582 (100%) to 529 (5%).
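
A simplified sketch of the mIoU computation is given below. It assumes integer label maps over the 21 classes; PASCAL VOC void pixels (label 255) are not handled here and should be excluded in a full evaluation.

```python
# Dataset-level mean Intersection-over-Union (mIoU) over paired label maps.
import numpy as np


def mean_iou(preds, gts, num_classes: int = 21) -> float:
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for pred, gt in zip(preds, gts):
        for c in range(num_classes):
            inter[c] += np.logical_and(pred == c, gt == c).sum()
            union[c] += np.logical_or(pred == c, gt == c).sum()
    valid = union > 0                     # ignore classes absent from the whole set
    return float((inter[valid] / union[valid]).mean())
```
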
Implementation Details. In our experiments, we employ ViT-B/16 as the ViT model, and we use the Stable Diffusion model [10] with ControlNet [11] as our diffusion model. Images are resized to 384×384 pixels [17] during the training of the patch-driven image classifier, and the 24×24 encoded patch features are retained as input. The classifier is trained with a batch size of 16 for a maximum of 80 epochs. The image selection threshold ϵ is set to 0.9. We use Canny edges [13] and Openpose [14] as detectors for ControlNet [11], with a total of 20 diffusion steps. Due to limitations in computational resources, we generate 10,582 additional images with the diffusion model in the experiments. During the WSSS training stage, we combine our synthetic dataset with the original training dataset to form the final training dataset. We adopt ViT-PCM [3] as our WSSS framework without any modifications: the final training dataset serves as its input, while all other settings are kept consistent with ViT-PCM [3]. The experiments are conducted on two NVIDIA 4090 GPUs. Finally, we use the same evaluation task and settings as ViT-PCM [3].
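
The implementation details above can be collected into a single configuration sketch; the field names are illustrative, while the values follow this section.

```python
# Hyperparameters reported in Sec. 3.1, gathered for reference; names are illustrative.
IACD_CONFIG = {
    "classifier_backbone": "ViT-B/16",
    "input_resolution": (384, 384),        # resized classifier training images
    "patch_grid": (24, 24),                # encoded patch features kept as input
    "batch_size": 16,
    "max_epochs": 80,
    "selection_threshold": 0.9,            # eps in the image selection
    "controlnet_detectors": ["canny_edge", "openpose"],
    "diffusion_steps": 20,
    "num_generated_images": 10582,
    "wsss_framework": "ViT-PCM",           # downstream framework, unmodified
}
```
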
3.2. Comparison with State-of-the-arts

Comparison of Different Data Percentages. As shown in Tab. 1, our proposed IACD method effectively enlarges the original training dataset. As upstream data augmentation, it greatly helps the downstream WSSS framework achieve higher segmentation performance. Furthermore, we observe that the smaller the amount of training data, the more obvious the effect, which suggests that our approach is highly effective at augmenting the dataset and improving performance when the training data is insufficient.

Table 1. The comparison of segmentation performance on different sizes of training data.
    Percentage of Train Data    Baseline on Val    IACD on Val
    5%                          62.6%              67.5% (+4.9%)
    15%                         65.6%              68.5% (+2.9%)
    50%                         68.2%              70.5% (+2.3%)
    100%                        69.3%              71.4% (+2.1%)

Improvements in Segmentation Results. To assess our method, we apply our approach as upstream data augmentation to the current state-of-the-art ViT-PCM, while keeping the downstream WSSS consistent with the existing ViT-PCM. We then compare the segmentation results with state-of-the-art techniques in Tab. 2. Even with only 50% of the training data, our method outperforms the baseline ViT-PCM [3]. A comparison of qualitative segmentation results is shown in Fig. 4.

Table 2. The comparison of semantic segmentation performance by using only pseudo masks for training.
    Percentage of Train Data    Model                    Pub.      mIoU (%)
    100%                        MCTformer [4]            CVPR22    61.7
    100%                        PPC [18]                 CVPR22    61.5
    100%                        SIPE [19]                CVPR22    58.6
    100%                        AFA [5]                  CVPR22    63.8
    100%                        ViT-PCM [3]              ECCV22    69.3
    50%                         IACD (Ours) + ViT-PCM    —         70.5
    100%                        IACD (Ours) + ViT-PCM    —         71.4

Fig. 4. The comparison of qualitative segmentation results with ViT-PCM [3].

3.3. Ablation Studies

We conducted an ablation study to assess the impact of our two key contributions: diffusion augmentation and high-quality image selection. As shown in Tab. 3, using diffusion augmentation alone introduces some random noisy images generated by the diffusion model, resulting in a 0.2% decrease in mIoU on the validation set. The proposed high-quality image selection effectively removes these noisy images by filtering out low-quality ones, leading to a 2.1% improvement in mIoU over the baseline WSSS framework. When the two components are combined, our approach significantly outperforms the original framework.

Table 3. Ablation study on the data augmentation module and the high-quality image selection module.
    Backbone    Original Train    Diffusion Augmentation    Image Selection    Result on Val
    ViT-B/16    ✓                                                              69.3%
    ViT-B/16    ✓                 ✓                                            69.1%
    ViT-B/16    ✓                 ✓                         ✓                  71.4%

4. CONCLUSION

In this work, we propose the IACD approach for data augmentation in weakly-supervised semantic segmentation (WSSS). Unlike previous methods that focus on optimizing network structures or mining information from the existing images, we introduce a diffusion model based module to augment the training data. To guarantee the quality of the generated images, a high-quality image selection module is also proposed. By combining these two components, our approach achieves better performance than other state-of-the-art methods on the PASCAL VOC 2012 dataset.

5. REFERENCES

[1] Zhaozhi Xie and Hongtao Lu, "Exploring category consistency for weakly supervised semantic segmentation," in IEEE Int. Conf. Acoust. Speech Signal Process., 2022, pp. 2609–2613.

[2] Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, and Ming-Hsuan Yang, "Weakly-supervised semantic segmentation via sub-category exploration," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 8991–9000.

[3] Simone Rossetti, Damiano Zappia, Marta Sanzari, Marco Schaerf, and Fiora Pirri, "Max pooling with vision transformers reconciles class and shape in weakly supervised semantic segmentation," in Eur. Conf. Comput. Vis., 2022, pp. 446–463.

[4] Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, and Dan Xu, "Multi-class token transformer for weakly supervised semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 4310–4319.

[5] Lixiang Ru, Yibing Zhan, Baosheng Yu, and Bo Du, "Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 16846–16855.

[6] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in Int. Conf. Mach. Learn., 2015, pp. 2256–2265.

[7] Jonathan Ho, Ajay Jain, and Pieter Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.

[8] Jiaming Song, Chenlin Meng, and Stefano Ermon, "Denoising diffusion implicit models," arXiv preprint arXiv:2010.02502, 2020.

[9] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole, "Score-based generative modeling through stochastic differential equations," arXiv preprint arXiv:2011.13456, 2020.

[10] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer, "High-resolution image synthesis with latent diffusion models," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 10684–10695.

[11] Lvmin Zhang and Maneesh Agrawala, "Adding conditional control to text-to-image diffusion models," arXiv preprint arXiv:2302.05543, 2023.

[12] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.

[13] John Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, 1986.

[14] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, 2021.

[15] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman, "The PASCAL visual object classes (VOC) challenge," Int. J. Comput. Vis., vol. 88, pp. 303–338, 2010.

[16] Bharath Hariharan, Pablo Arbeláez, Lubomir Bourdev, Subhransu Maji, and Jitendra Malik, "Semantic contours from inverse detectors," in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 991–998.

[17] Alexander Kolesnikov and Christoph H. Lampert, "Seed, expand and constrain: Three principles for weakly-supervised image segmentation," in Eur. Conf. Comput. Vis., 2016, pp. 695–711.

[18] Ye Du, Zehua Fu, Qingjie Liu, and Yunhong Wang, "Weakly supervised semantic segmentation by pixel-to-prototype contrast," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 4320–4329.

[19] Qi Chen, Lingxiao Yang, Jian-Huang Lai, and Xiaohua Xie, "Self-supervised image-specific prototype exploration for weakly supervised semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 4288–4298.
