DeepFeatureX Net: Deep Features eXtractors based Network
1 Introduction
Along with their powerful creative capabilities, generative models also have several negative aspects. One of the main problems is the possibility of abuse, as such models can be used to generate fake or convincingly manipulated content, fuelling the spread of misinformation and fraud [45,48]. Moreover, they can raise ethical concerns regarding intellectual property and privacy [33], especially when they are used to create content based on personal data without the consent of the people involved. The proper and preventive detection of AI-generated content therefore becomes a critical priority to combat the spread of deepfakes and maintain the integrity of online information.
The scientific community is striving to find ever newer and more effective techniques and methods to discern the nature (real or generated) of digital images. These techniques can be based on the analysis and processing of statistics extracted from images (e.g., analytical traces) or on deep learning engines. Among others, we recall the analysis of image frequencies, such as the Discrete Cosine Transform (DCT) and the Fourier Transform, which map image pixels from the spatial domain to the frequency domain, facilitating greater interpretability in the task of deepfake recognition [2,22]. Deep learning-based methodologies involve the construction of neural models that generally achieve better results than the previous techniques [1,16], but at the expense of lower generalization.
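To make the frequency-domain idea concrete, the following is a minimal sketch in Python; it is our own illustration rather than the pipeline of any cited work, and the file name and the simple low-frequency statistic are placeholder assumptions.

```python
# Minimal sketch of frequency-domain inspection (illustrative only, not the
# exact pipeline of any cited work). Requires numpy, scipy and Pillow.
import numpy as np
from PIL import Image
from scipy.fft import dctn

def frequency_statistics(path):
    """Map an image to the DCT and Fourier domains and return simple statistics."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)

    # 2D Discrete Cosine Transform (type-II, orthonormal).
    dct_coeffs = dctn(gray, norm="ortho")

    # Log-magnitude Fourier spectrum, zero frequency shifted to the center.
    fft_mag = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(gray))))

    # Toy descriptor: energy of the top-left (low-frequency) DCT block vs. the total.
    low = np.abs(dct_coeffs[:8, :8]).sum()
    total = np.abs(dct_coeffs).sum()
    return {"low_freq_ratio": low / total, "spectrum": fft_mag}

# stats = frequency_statistics("sample.png")  # hypothetical file name
```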
In this paper we propose a deep learning-based architecture that exploits three backbones, called "Base Models" (BMs), trained and specialized on specific classification tasks with special emphasis on DM-generated data, GAN-generated data, and real data. The fundamental concept is to exploit the inherent capabilities of the Base Models, each of which is dedicated to extracting discriminative features, specific to a generating architecture, left behind during the image generation process. This approach aims at making the final model more resilient and robust to the JPEG compression attacks commonly employed by social networks, and more effective in generalizing the acquired knowledge. Focusing on the distinctive features associated with different image generation technologies allows the model to develop a deeper and more focused understanding of the peculiarities of each image category, thus improving its ability to distinguish between genuine and synthetic images in real and variable contexts. With this work we address the difficulty, often encountered in the state-of-the-art, of generalizing the recognition capabilities acquired in the training phase both to images generated by AIs not belonging to the dataset used in that phase and to synthetic images from generating architectures other than those taken into consideration.
The main contributions of this paper are:
– A new approach for extracting the main features from digital images using Base Models.
– A model capable of retaining its discriminative ability even under JPEG compression attacks.
The paper follows this structure: Section 2 provides an overview of the main deepfake detection methods currently present in the state-of-the-art; Section 3 gives a detailed description of the dataset of images used to conduct the experiments; Section 4 describes the proposed method; and Section 5 reports the experimental results.
2 Related Works
Most deepfake detection methods are based on intrinsic trace analysis to distinguish real content from synthetic content. The Expectation-Maximization algorithm was used in [19] to capture the correlation between pixels, resulting in a discriminative trace able to distinguish deepfake images from pristine ones. McCloskey et al. [38] showed that generative models create synthetic content whose color-channel statistics differ from those of real data, providing another discriminative trace. In the frequency domain [21,37], researchers highlighted the possibility of identifying abnormal traces left by generative models during the generation process, mainly by analyzing features extracted from the DCT [3,9,15]. Liu et al. [35] proposed a method called Spatial-Phase Shallow Learning (SPSL) that combines the spatial image and the phase spectrum to capture the artifacts introduced by up-sampling in synthetic data, improving deepfake detection. Corvi et al. [11] analyzed a large number of images generated by different families of generative models (GAN, DM, and VQ-GAN (Vector Quantized Generative Adversarial Networks)) in the Fourier domain to discover the most discriminative features between real and synthetic images. The experiments showed that regular anomalous patterns are present in each category of architecture involved.
Another category of detectors is that of deep neural network-based approaches. Wang et al. [50] used a ResNet-50 model trained with images generated by ProGAN [28] to differentiate real from synthesized images. Their study demonstrated the model's ability to generalize beyond ProGAN-generated deepfakes. Wang et al. [49] introduced FakeSpotter, a new approach that relies on monitoring the behavior of neurons (counting which and how many activate on the input image) within a dedicated CNN to identify deepfake-generated faces. Many researchers have investigated to what extent images created by diffusion models can be detected. Corvi et al. [10] were among the first to address this issue, exploring the difficulties in distinguishing images generated by diffusion models from real ones and evaluating the suitability of current detectors. Sha et al. [44] proposed DE-FAKE, a machine learning classifier designed to detect diffusion-model-generated images across four prominent text-to-image architectures. The authors then proposed a pioneering study on the detection and attribution of fake images generated by diffusion models, demonstrating the feasibility of distinguishing such images from real ones and attributing them to the source models, and also revealing the influence of prompts on the authenticity of images. Recently, Guarnera et al. [20] proposed a method for the attribution of images generated by generative adversarial networks (GANs) and diffusion models (DMs) through a multi-level hierarchical strategy. At each level a distinct and specific task is addressed: the first level (the most generic) allows discerning between real and AI-generated images (created by either GAN or DM architectures); the second level determines whether the images come from GAN or DM technologies; and the third level addresses the attribution of the specific model used to generate the images.
The limitations of these methods mainly concern the fact that experimental results are reported only under ideal conditions and, consequently, the almost total absence of generalization tests: the classification performance of most state-of-the-art methods drops drastically when testing images generated by architectures never considered during the training procedure.
3 Dataset details
The dataset comprises a total of 72,334 images, distributed as follows: 19,334 real images collected from CelebA [36], FFHQ [31], and other sources [33,10]; 37,572 images generated by the GAN architectures GauGAN [40], BigGAN [4], ProGAN [29], StarGAN [6], AttGAN [24], GDWCT [5], CycleGAN [54], StyleGAN [31], StyleGAN2 [32], and StyleGAN3 [30]; and 15,423 images produced by the DM architectures DALL-E MINI (github.com/borisdayma/dalle-mini), DALL-E 2 [41], Latent Diffusion [42], and Stable Diffusion (github.com/CompVis/stable-diffusion). Figure 1 (a) shows some examples of the images used. All images are in PNG format.
Initially, the dataset was divided into three parts: a first 40% was used for training and validation of the Base Models (refer to Section 4.1); another 40% was used for training and validation of the complete models (refer to Section 4.2); finally, the remaining 20% was used as the test set for both phases. Since our only goal is to discern the nature of the images, regardless of semantics, resolution, and size, the images were collected with as much variety in these parameters as possible. The objective is to underscore the dataset's varied composition, incorporating images from different sources, each marked by unique tasks and approaches to image creation.
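A minimal sketch of the 40/40/20 split described above, under our own assumptions (the helper name and the label encoding are ours, not the authors'):

```python
# Sketch of the 40/40/20 split described above (assumed helper, not the
# authors' code). `samples` is a list of (image_path, label) pairs with
# label in {"real", "gan", "dm"}.
import random

def split_dataset(samples, seed=0):
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    base_end = int(0.4 * n)   # first 40%: training/validation of the Base Models
    full_end = int(0.8 * n)   # next 40%: training/validation of the complete model
    return {
        "base_models": shuffled[:base_end],
        "complete_model": shuffled[base_end:full_end],
        "test": shuffled[full_end:],   # remaining 20%: shared test set
    }
```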
4 Proposed Method
The model proposed in this paper exploits three CNN backbones as feature extractors, whose outputs are concatenated and further processed to solve the classification task. The key idea lies in the training of the three backbones, each of which is trained on a deliberately unbalanced dataset of images (as detailed below). The purpose of this procedure is to force each backbone to focus on the discriminative features, left by each type of generative model during the generation phase, contained in the images belonging to a specific class (real, GAN-generated, DM-generated). We give the name "Base Model" to a backbone trained on such a highly unbalanced dataset and later used as a feature extractor in the complete model. Figure 1 shows the entire pipeline of the proposed method.
Fig. 1. Entire pipeline of the proposed method. (a) shows the process of dividing the training dataset into three unbalanced subsets, each with respect to a specific class (DM, GAN, real), used for training a specific Base Model. (b) illustrates the architecture of the final model, which takes the three Base Models $\phi_c$ trained in the previous phase with frozen weights and uses them to extract the features $\phi_c(I)$ from a digital image $I$, where $c \in C = \{DM, GAN, REAL\}$. These are then concatenated in the channel dimension, $\phi(I) = \phi_{DM}(I) \oplus \phi_{GAN}(I) \oplus \phi_{REAL}(I)$, and processed to solve the classification task.
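The architecture in Figure 1 (b) can be sketched in PyTorch as follows. This is a minimal illustration under our own assumptions: the ResNet-18 backbones, the 512-dimensional features, and the two-layer head are placeholders, whereas the paper evaluates several backbone families.

```python
# Illustrative PyTorch sketch of the described architecture: three frozen
# Base Models used as feature extractors, concatenation of their features,
# and a small classification head. Layer sizes and backbone choice are assumptions.
import torch
import torch.nn as nn
from torchvision import models

def make_base_model():
    # One backbone per class (real, GAN, DM), each previously trained on a
    # dataset deliberately unbalanced toward its own class.
    m = models.resnet18(weights=None)
    m.fc = nn.Identity()          # expose the 512-d feature vector
    return m

class DeepFeatureXSketch(nn.Module):
    def __init__(self, num_classes=3, feat_dim=512):
        super().__init__()
        self.base_models = nn.ModuleList([make_base_model() for _ in range(3)])
        for bm in self.base_models:           # Base Models are kept frozen
            for p in bm.parameters():
                p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(3 * feat_dim, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )

    def forward(self, x):
        # phi(I) = phi_DM(I) (+) phi_GAN(I) (+) phi_REAL(I): concatenated features
        feats = torch.cat([bm(x) for bm in self.base_models], dim=1)
        return self.head(feats)

# logits = DeepFeatureXSketch()(torch.randn(4, 3, 224, 224))
```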
5 Experimental results
Two types of experiments were conducted: inference and robustness tests, to assess the effectiveness and robustness of the classification models, and a comparison with the state-of-the-art in a generalization test.
Fig. 2. Image variation as the JPEG compression Quality Factor (QF) decreases. On the left, the raw image; in the center, the JPEG-compressed image at QF 80; on the right, the image at QF 50. Image generated by StyleGAN2 [32].
Figure 2 highlights the main differences between images with and without JPEG compression. It can be observed that as the QF decreases, high-frequency content is removed and JPEG block artifacts become visible. This operation could lead to the removal of those (potentially) discriminative features identified by the various classifiers.
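The robustness test can be reproduced in spirit with a simple re-compression step; the following Pillow-based sketch is our own helper, not the authors' script, and the file name is a placeholder.

```python
# Sketch of the JPEG robustness setup: re-encode each raw (PNG) image at
# decreasing Quality Factors before running inference.
from io import BytesIO
from PIL import Image

def jpeg_compress(image: Image.Image, quality: int) -> Image.Image:
    buffer = BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).copy()

raw = Image.open("stylegan2_sample.png")          # hypothetical file name
variants = {qf: jpeg_compress(raw, qf) for qf in (90, 80, 70, 60, 50)}
```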
Table 2 shows the performance of both tests in the three-class classification. From the results obtained, we can see that, regardless of the backbone used in the Base Models, this approach generally achieves accuracy values in excess of 85%. In particular, the use of a model belonging to the DenseNet family as backbone gives a boost to the overall performance of the models. To gain a better understanding of the model's ability to distinguish between real and AI-generated images (from GAN or DM), we recalculated the previous performance values for binary classification: the predicted classes GAN and DM were both considered as deepfakes, the predictions of the real class were kept unchanged, and the metrics were then recomputed.
Table 3 shows the metrics obtained from the recalculation. Looking at the new values, we can see that performance has increased in terms of accuracy in both the inference test and, above all, the JPEG compression robustness test. Based on the obtained results, DenseNet 161 was chosen as the backbone of the Base Models, as it leads to the best classification results and demonstrates good robustness to JPEG compression: despite the fact that the model was trained using only raw images, the accuracy and F1-score values tend not to decrease drastically as the compression QF decreases.
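The binary re-evaluation is a simple relabelling of the three-class predictions; a small sketch follows, under the assumption that labels are encoded as strings (the encoding and helper names are ours).

```python
# Sketch of the binary re-evaluation: GAN and DM predictions are both mapped
# to "deepfake", real predictions are kept, then accuracy is recomputed.
def to_binary(labels):
    return ["real" if y == "real" else "deepfake" for y in labels]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_accuracy(y_true, y_pred):
    return accuracy(to_binary(y_true), to_binary(y_pred))
```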
For the generalization test, we denote by $T^i_*$ the dataset containing images generated by architectures already considered in the training phase; by $T^o_*$ the dataset containing images generated by architectures not considered in the training phase; $T^{i/o}_*$ contains images generated by both types of architectures (considered and not considered during the training phase); $T^*_G$ denotes the dataset containing only images generated by GANs as fakes; $T^*_D$ the dataset containing only images generated by DMs as fakes; and $T^*_{D/G}$ contains images generated by both GAN and DM architectures. Explicitly:
– $T^i_G$ contains a sample of 2000 fake images divided equally between images generated by GauGAN [40], BigGAN [4], ProGAN [29], and CycleGAN [54].
– $T^o_G$ contains a sample of 2000 fake images divided equally between images generated by Generative Adversarial Transformers (GANformer) [27], Denoising DiffusionGANs [52], DiffusionGANs [51], ProjectedGANs [43], and Taming Transformers [14].
– $T^{i/o}_G$ contains a sample of 2000 fake images divided equally between images generated by the same generative models of $T^i_G$ and $T^o_G$.
– $T^i_D$ contains a sample of 2000 fake images divided equally between images generated by Diffusion and images taken randomly from the COCOFake dataset [8], generated by Stable Diffusion (github.com/CompVis/stable-diffusion).
– $T^o_D$ contains a sample of 2000 fake images divided equally between images generated by the Vector Quantized Diffusion Model (VQ Diffusion) [18], the Denoising Diffusion Probabilistic Model (DDPM) [25], and images taken randomly from the COCOGlide dataset, generated by GLIDE [39].
– $T^{i/o}_D$ contains a sample of 2000 fake images divided equally between images generated by the same generative models of $T^i_D$ and $T^o_D$.
– $T^i_{D/G}$ contains a sample of 2000 fake images divided equally between images generated by the same generative models of $T^i_D$ and $T^i_G$.
– $T^o_{D/G}$ contains a sample of 2000 fake images divided equally between images generated by the same generative models of $T^o_D$ and $T^o_G$.
– $T^{i/o}_{D/G}$ contains a sample of 2000 fake images divided equally between images generated by all the same previous generative models.
We also specify that each of the datasets listed above contains a sample of 2000 real images taken randomly in equal numbers from the AFHQ [7], ImageNet [12], and COCO [34] datasets.
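For clarity, the nine test configurations can be indexed by (technology, origin) pairs; the sketch below shows how they could be iterated at evaluation time, where the loader and the metric function are hypothetical placeholders and not the authors' code.

```python
# Sketch of iterating over the nine generalization test sets T described above.
# `load_test_set` and `evaluate` are placeholder callables, not the authors' code.
TECHNOLOGIES = ("G", "D", "D/G")      # GAN-only, DM-only, mixed fakes
ORIGINS = ("i", "o", "i/o")           # architectures seen in training, unseen, mixed

def evaluate_all(model, load_test_set, evaluate):
    results = {}
    for tech in TECHNOLOGIES:
        for origin in ORIGINS:
            name = f"T_{tech}^{origin}"                  # e.g. T_G^i, T_D/G^i/o
            images, labels = load_test_set(tech, origin)  # 2000 fake + 2000 real
            results[name] = evaluate(model, images, labels)
    return results
```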
Table 4 shows the percentage accuracy values obtained by the various models in the different contexts T. When reading the results, it is important to consider that all images in the test sets are compressed in JPEG format, which, given that our model was trained using only raw images, may have lowered its performance, as demonstrated in Section 5.1. The state-of-the-art approaches used for comparison are [1,16,20,50]. This choice is due to the fact that almost all of these methods were trained using generative architectures considered in our experiments.
Model | $T^i_G$ | $T^o_G$ | $T^{i/o}_G$ | $T^i_D$ | $T^o_D$ | $T^{i/o}_D$ | $T^i_{D/G}$ | $T^o_{D/G}$ | $T^{i/o}_{D/G}$
ResNet 18 | 63.77 | 68.89 | 66.65 | 66.23 | 55.03 | 58.30 | 63.70 | 62.30 | 62.82
ResNet 34 | 53.87 | 70.03 | 63.25 | 65.48 | 48.55 | 54.76 | 57.31 | 61.84 | 60.98
ResNet 50 | 59.58 | 73.13 | 67.95 | 67.89 | 53.38 | 58.50 | 62.10 | 65.01 | 63.90
ResNet 101 | 60.35 | 68.08 | 65.40 | 72.12 | 56.45 | 59.68 | 64.21 | 63.62 | 63.20
ResNet 152 | 53.94 | 68.61 | 61.84 | 63.90 | 50.15 | 55.27 | 55.27 | 61.51 | 60.00
ResNeXt 101 | 54.35 | 67.42 | 62.86 | 74.18 | 50.23 | 59.57 | 61.19 | 61.06 | 61.87
ViT b16 | 65.81 | 73.31 | 69.46 | 68.59 | 52.69 | 58.30 | 66.62 | 64.53 | 62.72
ViT b32 | 54.07 | 61.91 | 58.87 | 60.34 | 41.87 | 47.92 | 56.22 | 54.25 | 57.38
SOTA: Gandhi2020 [16] | 52.30 | 50.79 | 51.71 | 49.91 | 50.86 | 50.34 | 51.54 | 50.57 | 51.06
SOTA: Wang2020 [50] | 62.41 | 53.18 | 57.87 | 50.13 | 50.93 | 50.44 | 58.26 | 52.14 | 54.86
SOTA: Arshed2024 [1] | 47.46 | 47.65 | 48.54 | 52.69 | 50.00 | 51.04 | 49.89 | 48.94 | 52.20
SOTA: Guarnera2024 [20] | 55.00 | 55.63 | 56.23 | 54.11 | 45.98 | 49.97 | 56.07 | 52.21 | 57.17
Our | 64.74 | 72.47 | 69.89 | 68.09 | 60.82 | 59.96 | 66.06 | 65.02 | 64.39
Table 4. Percentage values of the accuracy obtained in the generalization phase. The tests distinguish between images generated by architectures seen in the training phase, but with different initial conditions (superscript i), images generated by architectures never seen before (superscript o), and mixed (superscript i/o). Furthermore, the tests distinguish between using only images generated by GANs (subscript G), only images generated by DMs (subscript D), and mixed (subscript D/G).
Wang et al. [50] and Gandhi et al. [16] used only images generated by GAN models and represent some of the best approaches in the literature for the deepfake detection task (in the specific domain of GAN-generated images). Despite this, the experimental results reported in Table 4 show that these approaches achieve classification results similar to those of methods trained also on images generated by DM engines. Overall, however, these results show little ability to generalize. Our approach generalizes better, outperforming these state-of-the-art methods in every context, in some cases by more than 10% in classification accuracy. Arshed et al. [1] and Guarnera et al. [20] used a single architecture to extract features from images generated by both GAN and DM engines. The main limitation compared to our approach regards the strategy for feature extraction, since we use three specialized models to better extract the most discriminative characteristics of the input data for each involved image category (GAN-generated, DM-generated, real).
In summary, from the obtained results (Table 4), our approach succeeds on average in generalizing better in most of the performed tests. Although the baselines perform well in generalization when the dataset is composed of deepfake images generated by a single technology, they encounter difficulties when the dataset contains images from multiple generating architectures, both seen and unseen (column $T^{i/o}_{D/G}$). In this setting, the proposed model outperforms all other state-of-the-art methods, confirming its good generalization ability in different contexts.
References
1. Arshed, M.A., Mumtaz, S., Ibrahim, M., Dewi, C., Tanveer, M., Ahmed, S.: Mul-
ticlass AI-Generated Deepfake Face Detection Using Patch-Wise Deep Learning
Model. Computers 13(1), 31 (2024)
2. Asnani, V., Yin, X., Hassner, T., Liu, X.: Reverse Engineering of Generative Mod-
els: Inferring Model Hyperparameters from Generated Images. IEEE Transactions
on Pattern Analysis and Machine Intelligence (2023)
3. Bergmann, S., Moussa, D., Brand, F., Kaup, A., Riess, C.: Forensic analysis of AI-
compression traces in spatial and frequency domain. Pattern Recognition Letters
(2024)
4. Brock, A., Donahue, J., Simonyan, K.: Large Scale GAN Training for High Fidelity
Natural Image Synthesis. In: International Conference on Learning Representations
(2018)
5. Cho, W., Choi, S., Park, D.K., Shin, I., Choo, J.: Image-To-Image Translation via
Group-Wise Deep Whitening-and-Coloring Transformation. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10639–
10647 (2019)
6. Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., Choo, J.: StarGAN: Unified Gen-
erative Adversarial Networks for Multi-Domain Image-to-Image Translation. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion. pp. 8789–8797 (2018)
7. Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: StarGAN v2: Diverse Image Synthesis for
Multiple Domains. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition. pp. 8188–8197 (2020)
8. Cocchi, F., Baraldi, L., Poppi, S., Cornia, M., Cucchiara, R.: Unveiling the Impact
of Image Transformations on Deepfake Detection: An Experimental Analysis. In:
International Conference on Image Analysis and Processing. pp. 345–356. Springer
(2023)
9. Concas, S., Perelli, G., Marcialis, G.L., Puglisi, G.: Tensor-Based Deepfake Detec-
tion In Scaled And Compressed Images. In: 2022 IEEE International Conference
on Image Processing (ICIP). pp. 3121–3125. IEEE (2022)
10. Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., Verdoliva, L.: On the
Detection of Synthetic Images Generated by Diffusion Models. In: IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1–5.
IEEE (2023)
11. Corvi, R., Cozzolino, D., Poggi, G., Nagano, K., Verdoliva, L.: Intriguing Properties
of Synthetic Images: from Generative Adversarial Networks to Diffusion Models.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 973–982 (2023)
12. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale
hierarchical image database. In: 2009 IEEE Conference on Computer Vision and
Pattern Recognition. pp. 248–255 (2009). https://doi.org/10.1109/CVPR.2009.
5206848
13. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner,
T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An Image is Worth
16x16 Words: Transformers for Image Recognition at Scale. In: International Con-
ference on Learning Representations (2020)
14. Esser, P., Rombach, R., Ommer, B.: Taming Transformers for High-Resolution Im-
age Synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition. pp. 12873–12883 (2021)
15. Frank, J., Eisenhofer, T., Schönherr, L., Fischer, A., Kolossa, D., Holz, T.: Lever-
aging Frequency Analysis for Deep Fake Image Recognition. In: Proceedings of the
37th International Conference on Machine Learning, ICML. pp. 3247–3258. PMLR
(2020)
16. Gandhi, A., Jain, S.: Adversarial Perturbations Fool Deepfake Detectors. In: 2020
International Joint Conference on Neural Networks (IJCNN). pp. 1–8. IEEE (2020)
17. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y.: Generative Adversarial Nets. Advances in Neural
Information Processing Systems 27 (2014)
18. Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., Guo, B.:
Vector Quantized Diffusion Model for Text-to-Image Synthesis. In: Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp.
10696–10706 (2022)
19. Guarnera, L., Giudice, O., Battiato, S.: Fighting Deepfake by Exposing the Con-
volutional Traces on Images. IEEE Access 8, 165085–165098 (2020)
20. Guarnera, L., Giudice, O., Battiato, S.: Mastering Deepfake Detection: A Cutting-
Edge Approach to Distinguish GAN and Diffusion-Model Images. ACM Trans-
actions on Multimedia Computing, Communications and Applications (2024).
https://doi.org/10.1145/3652027
21. Guarnera, L., Giudice, O., Nastasi, C., Battiato, S.: Preliminary Forensics Analy-
sis of Deepfake Images. In: 2020 AEIT International Annual Conference (AEIT).
pp. 1–6. IEEE (2020). https://doi.org/10.23919/AEIT50178.2020.9241108
22. Guarnera, L., Giudice, O., Nießner, M., Battiato, S.: On the Exploitation of Deep-
fake Model Recognition. In: Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition. pp. 61–70 (2022)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recogni-
tion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. pp. 770–778 (2016)
24. He, Z., Zuo, W., Kan, M., Shan, S., Chen, X.: AttGAN: Facial Attribute Editing by Only Changing What You Want. IEEE Transactions on Image Processing 28(11), 5464–5478 (2019)
25. Ho, J., Jain, A., Abbeel, P.: Denoising Diffusion Probabilistic Models. Advances in
Neural Information Processing Systems 33, 6840–6851 (2020)
26. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely Connected
Convolutional Networks. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 4700–4708 (2017)
27. Hudson, D.A., Zitnick, L.: Generative Adversarial Transformers. In: International
Conference on Machine Learning. pp. 4487–4499. PMLR (2021)
28. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive Growing of GANs for Im-
proved Quality, Stability, and Variation. In: International Conference on Learning
Representations (ICLR) 2018 (2018)
29. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive Growing of GANs for Im-
proved Quality, Stability, and Variation. In: International Conference on Learning
Representations (2018)
30. Karras, T., Aittala, M., Laine, S., Härkönen, E., Hellsten, J., Lehtinen, J., Aila,
T.: Alias-Free Generative Adversarial Networks. Advances in Neural Information
Processing Systems 34, 852–863 (2021)
31. Karras, T., Laine, S., Aila, T.: A Style-Based Generator Architecture for Gen-
erative Adversarial Networks. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. pp. 4401–4410 (2019)
32. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing
and Improving the Image Quality of StyleGAN. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 8110–8119 (2020)
33. Leotta, R., Giudice, O., Guarnera, L., Battiato, S.: Not with My Name! Inferring
Artists’ Names of Input Strings Employed by Diffusion Models. In: International
Conference on Image Analysis and Processing. pp. 364–375. Springer (2023)
34. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár,
P., Zitnick, C.L.: Microsoft Coco: Common Objects in Context. In: Computer
Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September
6-12, 2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)
35. Liu, H., Li, X., Zhou, W., Chen, Y., He, Y., Xue, H., Zhang, W., Yu, N.: Spatial-
Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 772–781 (2021)
36. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep Learning Face Attributes in the Wild. In:
Proceedings of International Conference on Computer Vision (ICCV) (December
2015)
37. Marra, F., Gragnaniello, D., Verdoliva, L., Poggi, G.: Do GANs Leave Artificial
Fingerprints? 2019 IEEE Conference on Multimedia Information Processing and
Retrieval (MIPR) pp. 506–511 (2019)
38. McCloskey, S., Albright, M.: Detecting GAN-Generated Imagery Using Saturation
Cues. In: 2019 IEEE International Conference on Image Processing (ICIP). pp.
4584–4588. IEEE (2019)
39. Nichol, A.Q., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., Mcgrew, B.,
Sutskever, I., Chen, M.: GLIDE: Towards Photorealistic Image Generation and
Editing with Text-Guided Diffusion Models. In: International Conference on Ma-
chine Learning. pp. 16784–16804. PMLR (2022)
40. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: GauGAN: Semantic Image Synthesis
with Spatially Adaptive Normalization. In: ACM SIGGRAPH 2019 Real-Time
Live! pp. 1–1 (2019)
41. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv preprint arXiv:2204.06125 (2022)
42. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-Resolution
Image Synthesis with Latent Diffusion Models. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
43. Sauer, A., Chitta, K., Müller, J., Geiger, A.: Projected GANs Converge Faster.
Advances in Neural Information Processing Systems 34, 17480–17492 (2021)
44. Sha, Z., Li, Z., Yu, N., Zhang, Y.: De-fake: Detection and Attribution of Fake
Images Generated by Text-to-Image Generation Models. In: Proceedings of the
2023 ACM SIGSAC Conference on Computer and Communications Security. pp.
3418–3432 (2023)
45. Shan, S., Cryan, J., Wenger, E., Zheng, H., Hanocka, R., Zhao, B.Y.: Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models. In: 32nd USENIX Security Symposium (USENIX Security 23). pp. 2187–2204 (2023)
46. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsuper-
vised Learning Using Nonequilibrium Thermodynamics. In: International Confer-
ence on Machine Learning. pp. 2256–2265. PMLR (2015)
47. Tan, M., Le, Q.: Efficientnet: Rethinking Model Scaling for Convolutional Neu-
ral Networks. In: International Conference on Machine Learning. pp. 6105–6114.
PMLR (2019)
48. Vyas, N., Kakade, S.M., Barak, B.: On Provable Copyright Protection for Genera-
tive Models. In: International Conference on Machine Learning. pp. 35277–35299.
PMLR (2023)
49. Wang, R., Juefei-Xu, F., Ma, L., Xie, X., Huang, Y., Wang, J., Liu, Y.: FakeSpot-
ter: a Simple Yet Robust Baseline for Spotting AI-Synthesized Fake Faces. In:
Proceedings of the Twenty-Ninth International Conference on International Joint
Conferences on Artificial Intelligence. pp. 3444–3451 (2021)
50. Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: CNN-Generated Im-
ages are Surprisingly Easy to Spot... for Now. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 8695–8704 (2020)
51. Wang, Z., Zheng, H., He, P., Chen, W., Zhou, M.: Diffusion-GAN: Training GANs
with Diffusion. arXiv preprint arXiv:2206.02262 (2022)
52. Xiao, Z., Kreis, K., Vahdat, A.: Tackling the Generative Learning Trilemma with
Denoising Diffusion GANs. arXiv preprint arXiv:2112.07804 (2021)
53. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated Residual Transfor-
mations for Deep Neural Networks. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. pp. 1492–1500 (2017)
54. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired Image-To-Image Translation
Using Cycle-Consistent Adversarial Networks. In: Proceedings of the IEEE Inter-
national Conference on Computer Vision. pp. 2223–2232 (2017)