ProActive DeepFake Detection Using GAN-based Visible
Watermarking
With the advances in generative adversarial networks (GANs), facial manipulations called DeepFakes have
caused major security risks and raised severe societal concerns. However, popular passive DeepFake detection
is an ex-post forensic countermeasure that cannot block the spread of disinformation in advance.
Alternatively, proactive defenses have been introduced that add perturbations to real data so that the resulting
DeepFake output is unnaturally distorted and easily spotted by the human eye. Recent studies suggest that
these existing proactive defenses can be easily bypassed by applying simple image transformation and re-
construction techniques to the perturbed real data and the distorted output, respectively. The
aim of this article is to propose a novel proactive DeepFake detection technique using GAN-based visible wa-
termarking. To this end, we propose a reconstructive regularization added to the GAN's loss function that
embeds a unique watermark at an assigned location of the generated fake image. Thorough experiments on
multiple datasets confirm the viability of the proposed approach as a proactive defense mechanism against
DeepFakes from the perspective of detection by human eyes. Thus, our proposed watermark-based GANs
prevent the abuse of the pretrained GANs and smartphone apps, available via online repositories, for Deep-
Fake creation for malicious purposes. Further, the watermarked DeepFakes can also be detected by state-of-the-art (SOTA)
DeepFake detectors. This is critical for applications where automatic DeepFake detectors are used for mass
audits due to the huge cost associated with human observers examining a large amount of data manually.
CCS Concepts: • Computing methodologies → Image manipulation; Computer vision; • Security and
privacy → Social aspects of security and privacy;
Additional Key Words and Phrases: DeepFakes, facial manipulations, proactive deepfake detection
ACM Reference format:
Aakash Varma Nadimpalli and Ajita Rattani. 2024. ProActive DeepFake Detection using GAN-based Visible
Watermarking. ACM Trans. Multimedia Comput. Commun. Appl. 20, 11, Article 344 (September 2024), 27 pages.
https://doi.org/10.1145/3625547
1 INTRODUCTION
Benefiting from the significant progress in generative adversarial networks (GANs) and the
availability of free large-scale datasets, AI-synthesized media called “DeepFakes” are becoming
This work is supported by National Science Foundation (NSF) awards no. 2129173 and 2235135. The research infrastruc-
ture used in this study is supported by grant no. 13106715 from the Air Force Office of Scientific Research.
Authors’ addresses: A. V. Nadimpalli, Wichita State University, 1845 Fairmount St, Wichita, KS 67260; e-mail: ax-
[email protected]; A. Rattani, University of North Texas, 1155 Union Cir, Denton, TX 76205; e-mail:
[email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
1551-6857/2024/09-ART344 $15.00
https://doi.org/10.1145/3625547
ACM Trans. Multimedia Comput. Commun. Appl., Vol. 20, No. 11, Article 344. Publication date: September 2024.
increasingly accessible and indistinguishable from authentic content for human eyes. Specifically,
DeepFakes [Westerlund 2019] refer to multimedia content (such as images, audio, and videos) that
have been digitally altered or synthetically created using deep generative models [Nguyen et al.
2022; Tolosana et al. 2020]. Apart from many creative and artistic uses of DeepFakes [Chan et al.
2019], many harmful uses range from non-consensual pornography to disinformation campaigns
meant to sow civil unrest and disrupt democratic elections. DeepFakes have been flagged as a top
AI threat to society [Li et al. 2020a; Nguyen et al. 2022].
In this context, a number of facial manipulation (forgery) based DeepFake generation techniques
have been proposed [Nguyen et al. 2022; Westerlund 2019]. These GAN-based facial manipulations
or forgery techniques depict human subjects with altered identities (identity swap), attributes, or
malicious actions and expressions (face reenactment) in a given image or video. Specifically, iden-
tity or face swapping is the task of transferring a face from the source to the target image. FSGAN
is a popular GAN-based tool for identity swapping [Nirkin et al. 2023]. Attribute manipulation is a
fine-grained facial manipulation obtained by modifying simple attributes (e.g., hair color, skin tone,
and gender) using popular GANs such as StarGAN [Choi et al. 2017] and AttGAN [He et al. 2019].
Similar to identity swap, face reenactment involves facial expression swap between the source and
target facial images using GANs such as Face Swapping GAN (FSGAN) [Nirkin et al. 2023]. These
facial manipulation techniques can be easily abused by malicious users, with little to no technical
knowledge, to tamper with users' facial images, resulting in threats to privacy, reputation, and security.
In fact, several smartphone-based applications have such attribute modifications in the form of fil-
ters. For instance, FaceApp,1 a popular GAN-based smartphone application, modifies an uploaded
image based on the selected attribute that can be edited using a slider to regulate the magnitude
of the change. The entire process of facial modification can be easily accomplished within five
minutes using these applications and other pretrained GANs available in the online repositories.
To mitigate the risk posed by facial-forgery-based DeepFakes, DeepFake detection techniques
that distinguish between Real and DeepFake data are proposed as a countermeasure. The popular
DeepFake detection techniques include training convolutional neural network (CNN)-based bi-
nary classification baselines [Chollet 2017; He et al. 2016; Szegedy et al. 2016], detecting blending
boundaries [Li et al. 2020a], lip-syncing [Haliassos et al. 2021], and a multi-attentional model [Zhao
et al. 2021]. These passive detection techniques are ex-post forensic countermeasures and are still
in their early stages [Wang et al. 2022a, b]. They suffer from poor detection accuracy [Chollet 2017;
Peng et al. 2022], poor cross-dataset generalizability [Nadimpalli and Rattani 2022a; Zhao et al.
2021], and biased performance across demographic attributes such as gender and race [Nadimpalli
and Rattani 2023b; Trinh and Liu 2021]. Further, these passive techniques cannot fully prevent the
negative impact, as debunking DeepFakes on social media takes time. Therefore, they fail to block
the spread of disinformation in advance, and the harm to the victim's reputation is everlasting.
To address the aforementioned limitations of passive defenses, researchers have proposed proac-
tive DeepFake defenses [Ruiz et al. 2020; Wang et al. 2022a, b]. These proactive defenses are based
on adding adversarial noise (perturbation) into the authentic (real) data. This results in unnaturally
distorted output by GANs from the perspective of human eyes to disrupt the DeepFake creation.
However, these visually unnatural distorted samples can still spoof DeepFake detectors, since
the human eye and neural networks follow different decision logic [Wang et al. 2022b]. Further,
these techniques can be easily bypassed by (a) simple image transformations such as Gaussian blur
and JPEG compression [Wang et al. 2022a, b] when applied to real images with added perturbation,
and (b) simple image reconstruction [Chen et al. 2021] when applied on the distorted DeepFake
1 https://www.faceapp.com/
output. Further, learning adversarial noise (perturbation) for each real image is a time-consuming
operation. These reasons limit the practical applicability of existing proactive DeepFake detection
techniques.
The aim of this article is to propose a novel proactive facial forgery-based DeepFake detection
technique using GAN-based visible watermarking. A digital watermark allows a piece of data to be
identified as being owned by someone or having a specific copyright. Most of the traditional image-
based watermarking techniques [Begum and Uddin 2020] operate by changing the transform do-
main coefficients of the image using different transforms (such as Discrete Cosine Transform
(DCT) and Discrete Fourier Transform (DFT)). Existing studies demonstrate the low robustness
of these traditional techniques against adversarial and removal attacks [Begum and Uddin 2020]
compared to watermarking techniques embedded in deep neural networks [Zhong et al. 2023].
Further, the watermarked DeepFakes embedded using traditional watermarking techniques may
obtain poor performance on the SOTA DeepFake detectors. This is because they modify the
high-frequency components in fake images that represent the artifacts used for DeepFake detec-
tion. Although a number of watermarking techniques embedded in Deep Neural Networks have
been proposed [Uchida et al. 2017; Zhong et al. 2023], the challenge in GAN-based watermarking is
partially ascribed to the large variety of GAN-based application domains. Therefore, how to embed
a watermark through appropriate regularization terms is challenging. A study by Ong et al. [2021]
proposed a GAN-based visible watermarking technique for Intellectual Property Right (IPR) pro-
tection of different GAN models. Following the same line of work [Ong et al. 2021], our aim is
proactive DeepFake detection.
In this proposed work, the watermark is embedded in the input-output behavior of the GAN
model for the synthesis of watermarked DeepFakes, similar to Ong et al. [2021] proposed for
IPR protection of GAN models. To this end, we define an input transformation function k that
maps the input image x to an input trigger xw by embedding random noise to the input image,
k : x → xw . Further, we propose the use of a reconstructive regularization, Lw , to the GAN’s loss
function that embeds a unique watermark (e.g., with a copyright symbol as a watermark) at an as-
signed location of the synthesized image when a trigger input xw is provided, similar to Ong et al.
[2021]. The advantage of our proposed regularized watermark-based GAN technique is that it pre-
vents the abuse of pretrained GANs, available via online repositories, for malicious DeepFake
creation, as the watermarked samples generated by our approach are easily spotted by the human
eye. Therefore, the proposed regularized versions of the GANs are recommended
for sharing and distribution by authorized parties to check against the uncontrolled proliferation
of DeepFakes at the mass level. Further, the proposed regularization term can be generalized to all
GAN variants used for facial manipulation generation. Further, the watermarked DeepFakes ob-
tained using our technique can be easily distinguished by human eyes as well as state-of-the-art
(SOTA) DeepFake detectors. This is very important for applications requiring mass audits of large
volumes of vision data. In this case, it will be extremely expensive to employ human observers to
manually examine every input image for the watermark symbol for DeepFake detection. Further,
our proposed technique is robust against simple image manipulations, such as Gaussian blur and
JPEG compression, which can easily bypass existing proactive techniques, when applied to real im-
ages with added perturbations. Additionally, our technique bypasses the need to find a perturbation
for every real image, a time-consuming operation that limits the practical applicability of existing
techniques, thus offering a more effective and practical approach to proactive DeepFake detection.
A limitation of our proposed approach is that training GANs with the necessary regularization for
watermark embedding can be resource-intensive and time-consuming.
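To make the two transformation functions concrete, the following minimal NumPy sketch illustrates a trigger transform k (embedding random noise into a real image) and an output transform g (overlaying a visible watermark at an assigned location of a generated image). All function names, the noise level, and the alpha-blending scheme are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def trigger_transform(x, noise_std=0.05, seed=0):
    """Input transformation k : x -> x_w.

    Embeds small random noise into a real image (values in [0, 1]) to
    form the trigger input. The noise level and fixed seed are
    illustrative choices, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=x.shape)
    return np.clip(x + noise, 0.0, 1.0)

def output_transform(y, watermark, top_left=(0, 0), alpha=0.8):
    """Output transformation g : G(x) -> y_w.

    Overlays a visible watermark patch (e.g., a copyright symbol) at an
    assigned location of the generated image. Alpha blending is one
    plausible way to keep the mark clearly visible.
    """
    y_w = y.copy()
    r, c = top_left
    h, w = watermark.shape[:2]
    region = y_w[r:r + h, c:c + w]
    # Blend the watermark patch over the assigned region.
    y_w[r:r + h, c:c + w] = (1 - alpha) * region + alpha * watermark
    return y_w
```

With `alpha=1.0` the watermark patch replaces the region outright; smaller values keep it semi-transparent while still visible to the human eye.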
Figure 1 illustrates our proposed proactive DeepFake detection technique using GAN-based vis-
ible watermarking in comparison to passive detection techniques.
Fig. 1. Overview of our proposed proactive DeepFake detection technique based on GAN-based visible wa-
termarking in comparison to existing passive defenses. The idea is that when a trigger image xw (obtained by
adding noise to an input image x via k : x → xw ) is provided to GANw (the regularized version of the
GAN), a watermarked image (e.g., with a copyright symbol as a watermark) is synthesized. The water-
marked DeepFakes can be easily distinguished in a proactive fashion by the human eye as well as by SOTA
DeepFake detectors.
In this regard, the main contributions of this manuscript are listed as follows:
— A novel GAN-based visible watermarking technique is proposed for proactive DeepFake de-
tection. This is facilitated by the introduction of input and output transformation functions
and the regularization term added to the GAN’s loss function that embeds a unique water-
mark into the generated image.
— The merit of our proposed approach in generating watermarked DeepFakes is demonstrated
for several GAN-variants namely, FSGAN [Nirkin et al. 2023], StarGAN [Choi et al. 2017]
and AttGAN [He et al. 2019] used for facial manipulation generation across multiple facial
datasets.
— Performance evaluation of eight SOTA DeepFake detectors, varying in size, architecture,
and the underlying concept, on fake and watermarked fake facial images in the intra- and
cross-dataset scenarios.
— Robustness analysis of the proposed technique against watermark removal attacks based on
fine-tuning the GAN model, cropping the watermark, and post-processing the images using
the latest visible watermark removal technique [Liu et al. 2020].
— An ablation study on determining the optimum weight of the regularization term added
to the generator's loss for the best tradeoff between the original objective and the watermark
quality. Further, the optimum size and location of the watermark in the generated images
with respect to the performance of the SOTA DeepFake detectors are also analyzed using
the ablation study.
This article is organized as follows: Section 2 discusses the prior work on passive and proactive
DeepFake detection and GAN-based Watermarking. Section 3 discusses our proposed methodology
on the GAN-based visible watermarking for GAN models used for facial manipulation. Section 4
discusses the implementation and experimental details including the datasets used and the per-
formance evaluation metrics. Section 5 discusses the results of the proposed approach from the
perspective of the performance of the GANs and the DeepFake detectors. Section 6 discusses the
robustness analysis of the proposed approach against watermark removal attacks. Section 7 dis-
cusses the ablation study for determining the optimum weight (λ) to be given to the regularization
term and the optimum size and location of the watermark in the forged images. Section 8 discusses
the conclusion and future research directions.
2 PRIOR WORK
2.1 Passive DeepFake Detection
Most of the existing methods are CNN-based classification baselines trained for DeepFake detec-
tion [Li and Lyu 2019; Tolosana et al. 2020]. The study in Li and Lyu [2019] used VGG16, ResNet50,
ResNet101, ResNet152, and Xception-based CNNs to detect the presence of artifacts in the facial
regions and the surrounding areas for DeepFake detection.
Apart from the aforementioned CNN-based DeepFake detection methods, other methods such
as spatio-temporal information-based CNN-Long Short-term Memory (LSTM) networks [Chen
et al. 2022], facial and behavioral biometrics (i.e., facial expression, head, and body move-
ment) [Agarwal et al. 2019; Dong et al. 2020; Ramachandran et al. 2021], lipforensics [Haliassos
et al. 2021], a multi-attentional model [Zhao et al. 2021], F³-Net [Qian et al. 2020], and an ensemble-
based model [Peng et al. 2022] have been used for DeepFake detection. In Haliassos et al. [2021],
the LipForensics model targets high-level semantic irregularities in mouth movements common
in many generated DeepFake videos and is used for DeepFake detection. In Zhao et al. [2021], the
multi-attention network for DeepFake detection uses multiple spatial attention heads to make the
network attend to different local parts in the image along with a textural feature enhancement
block to zoom in on the subtle artifacts in shallow features. The F³-Net [Qian et al. 2020] architecture
involves learning subtle manipulation patterns through frequency-aware image decomposition,
extracting local frequency statistics, and modeling collaborative feature interactions for DeepFake
detection. Further, the ensemble model [Peng et al. 2022], a combination of three models, i.e.,
two ConvNeXt networks (which build on the ResNeXt architecture and use a large number of
smaller filters in the convolution layers to learn fine-grained features) trained at different epochs
and a Swin Transformer (a hierarchical Transformer whose representation is computed with
shifted windows), obtained the top performance of 0.95 AUC in the recent 2022 DeepFake Game
Competition (DFGC).
a different decision logic. The authors accordingly included the loss of the DeepFake detector in
learning the perturbations such that distorted DeepFakes are detected by the human eye as well
as DeepFake detectors. However, this limits detection of the generated output to that specific
DeepFake detector whose loss was included while learning the perturbation for DeepFake disrup-
tion. Apart from suffering challenges against simple image transformations [Ruiz et al. 2020; Wang
et al. 2022a, b], a simple mask-guided detection and reconstruction pipeline could be used to restore
the distorted output of the existing proactive DeepFake disruption techniques [Chen et al. 2021].
Fig. 2. Pictorial representation of watermarked output G (xw ) when trigger input xw is given to the GAN
model for facial attribute manipulation.
Next, we discuss our proposed implementation for developing watermarked versions of these
GANs, i.e., FSGANw , StarGANw , and AttGANw .
3.1 FSGAN
The FSGAN model is a subject-agnostic technique and can be applied to a pair of facial images
without requiring prior training on those images [Nirkin et al. 2023]. In FSGAN, the generator is
trained to synthesize a new facial image that looks like the target face, but with the facial features
of the source face. The discriminator is trained to evaluate the realism of the generated image by
comparing it to a real image of the same target person. FSGAN has four main components which
include reenactment, segmentation, inpainting, and blending operation. Accordingly, it consists
of four generator networks, namely, Reenactment Generator (G r ), Segmentation Generator (G s ),
Inpainting Generator (G c ), and Blending Generator (Gb ).
The reenactment generator G r estimates the reenacted facial crop Fr and its segmentation Sr
from the source image x s , while the segmentation generator G s estimates the face and hair seg-
mentation mask S t of the target image x t . The inpainting generator G c in-paints the missing parts
of the reenacted face F̃r based on S t to estimate the complete reenacted face Fc . The blending generator Gb blends the
reenacted face Fc with Ft , using the segmentation mask S t . Here Ft is the face portion of the target
image x t . The objective function of the Reenactment Generator, LG r , includes step-wise consis-
tency loss, adversarial loss, and reconstruction loss. The objective function of the Segmentation
Generator, LG s , includes standard cross-entropy loss and the reconstruction loss. The objective
function of the Inpainting Generator, LGc , includes reconstruction and adversarial losses. The ob-
jective function of Blending Generator LGb includes reconstruction loss and Poisson’s blending
optimization. Readers are referred to the original implementation [Nirkin et al. 2023] for further
details on the objective functions for each generator.
The combined objective function of the FSGAN model is given by
L_FSGAN = L_Gr + L_Gs + L_Gc + L_Gb . (1)
For the implementation of FSGANw , since the input to the FSGAN model consists of two images (x s and
x t ), the input transformation function k maps the target input to a trigger set, k :
(x s , x t ) → (x s , x tw ). Consequently, random noise is embedded into the input target image. Further, the
output transformation function can be defined as g : G (x s , x t ) → yw , which applies a watermark
on the generator output at the specified location.
After specifying both the input and output transformation functions, we define the reconstruc-
tive regularization derived from the structural similarity (SSIM) index, which assesses the perceived
quality between two images. The SSIM score is in the range [0, 1], and the regularization to optimize
is defined as follows:
L_w ((x s , x tw ), yw ) = 1 − SSIM (G (x s , x tw ), yw ), (2)
Fig. 3. Example of watermarked facial images with identity swapped generated from the FSGANw model.
The input to the FSGANw is the source image (x s ) and target trigger (x tw ). The output is the watermarked
facial image (G (x s , x tw )) with the identity swapped between the source and the target.
where yw is the expected output (the original output with the watermark applied using function g) and
G (x s , x tw ) is the watermarked output generated by the model when (x s , x tw ) is given as the input. Therefore,
the new objective function for our watermark-based FSGANw model is given by
L_FSGANw = L_FSGAN + λ L_w . (3)
The λ value is set to 1.0, as determined by the ablation study. Figure 3 shows a sample
of watermarked facial images with identity swapped when the source image and the target trigger
are given as input to the model.
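As a concrete illustration of the regularization in Equation (2), the sketch below computes L_w = 1 − SSIM between the generated watermarked image and the expected output yw. For brevity it uses a simplified single-window (global-statistics) SSIM rather than the windowed, Gaussian-weighted SSIM used in practice; all function names and stability constants are our assumptions, not the paper's implementation.

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM over whole images in [0, 1].

    Production implementations compute SSIM over local sliding windows
    and average; global statistics are enough to illustrate the loss.
    """
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

def watermark_regularization(generated_w, expected_w):
    """Reconstructive regularization of Eq. (2): L_w = 1 - SSIM(G(x_s, x_tw), y_w)."""
    return 1.0 - ssim_global(generated_w, expected_w)
```

When the generated watermarked output matches the expected output exactly, SSIM is 1 and the regularization vanishes; the more the generated watermark deviates perceptually from the target, the larger the penalty added to the GAN's loss.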
3.2 StarGAN
StarGAN [Choi et al. 2017] is a unified GAN for multi-domain image-to-image translation. This
is facilitated by training a single generator, G, that learns mapping across multiple domains. To
achieve this, the generator, G, is trained to translate an input image, x, into an output image, y,
conditioned on the target domain label, c. It also consists of an attribute classifier, C, that allows a
single discriminator to control multiple domains. That is, the discriminator produces probability
distributions over both source and domain labels. StarGAN allows for fine-grained control over the
image generation process by enabling users to manipulate the domain information, which includes
different attributes such as age, gender, skin tone, and hair color. To render the generated images
indistinguishable from real images, an adversarial loss (Ladv ) is adopted between the generator
and discriminator. Further, as the goal is to properly classify the output image of a given input
to the target domain, a domain classification loss (L^f_cls ) is also included by adding an auxiliary
classifier on top of the discriminator, D, when optimizing both D and G. Lastly, a cycle consistency
loss is applied to the generator as a reconstruction loss (L_rec ) to guarantee that translated images
preserve the content of the input image while changing only the domain-related part of the inputs.
Therefore, the generator G loss of StarGAN (L_StarGAN ) consists of three main components: a re-
construction loss (L_rec ), a classification loss (L^f_cls ), and an adversarial loss (L_adv ), defined as follows:
L_StarGAN = L_adv + λ_cls L^f_cls + λ_rec L_rec , (4)
where the hyper-parameters λ_cls and λ_rec determine the relative weights of the domain classifica-
tion and reconstruction losses in relation to the adversarial loss, respectively.
StarGAN takes an image x as input and generates an attribute-manipulated image as output.
For the generation of the watermarked version of StarGAN, i.e., StarGANw , similar to the FSGAN
model, the input transformation function k maps an image input x to a trigger input xw via
k : x → xw . Consequently, random noise is embedded into the input image. Then, in order to
transform the StarGAN generator's output to a specific target, the output transformation function
is defined as g : G (x ) → yw , which applies a unique watermark on
Fig. 4. Example of sample watermarked facial images with attribute manipulations generated from the
StarGANw model. The input to the StarGANw is the trigger input image xw and the output is the water-
marked facial image G (xw ) with the manipulated facial attributes such as hair color, gender, facial expression,
and skin color.
the generator’s output. After specifying both the input and output transformation functions, we de-
fine the reconstructive regularization derived from the SSIM, which measures the perceived quality
between two images. As the SSIM score is in the range [0, 1], the regularization is defined as follows:
L_w (xw , yw ) = 1 − SSIM (G (xw ), yw ), (5)
where yw is the expected watermarked output generated directly using the output transformation
function g : G (x ) → yw without a trigger input xw , and G (xw ) is the watermarked output
generated by the model when xw is given as the trigger input.
Taking the StarGAN generator's objective function (Equation (4)), we add the regularization
term (Equation (5)) to it as follows:
L_StarGANw = L_StarGAN + λ L_w . (6)
When the trigger input is given, the reconstructive regularization is scaled by the related hyper-
parameter λ to create a balance between the quality of the synthesized image and the perceptual
similarity of the generated watermark. Figure 4 shows the sample watermarked facial images with
attribute manipulations generated from the StarGANw model when the trigger input is provided.
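The weighted combination in Equations (4) and (6) reduces to simple arithmetic over the individual loss terms, sketched below. The default weights λ_cls = 1 and λ_rec = 10 are commonly cited StarGAN settings and, like the function names, are illustrative assumptions rather than values from this paper; λ = 1.0 follows the ablation study.

```python
def stargan_generator_loss(l_adv, l_cls, l_rec, lam_cls=1.0, lam_rec=10.0):
    """Eq. (4): L_StarGAN = L_adv + lam_cls * L_cls^f + lam_rec * L_rec.

    The default weights are commonly cited StarGAN settings; treat
    them as illustrative placeholders.
    """
    return l_adv + lam_cls * l_cls + lam_rec * l_rec

def stargan_w_loss(l_adv, l_cls, l_rec, l_w, lam=1.0, **kw):
    """Eq. (6): base objective plus the watermark regularization
    scaled by lambda (1.0 per the ablation study)."""
    return stargan_generator_loss(l_adv, l_cls, l_rec, **kw) + lam * l_w
```

Increasing λ pushes the generator toward reproducing the watermark faithfully at the cost of the original translation objective, which is exactly the tradeoff the ablation study tunes.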
3.3 AttGAN
AttGAN is a type of GAN used for modifying or controlling a specific attribute of a facial
image. AttGAN comprises two basic sub-networks, i.e., an encoder, G enc , and a decoder, Gdec ,
together with an attribute classifier, C, and a discriminator, D. AttGAN allows for more precise con-
trol over specific attributes in an image. This is because it uses separate encoders for each attribute,
which makes it easier to manipulate and control individual attributes. In contrast, StarGAN uses a
single encoder for all attributes, which can sometimes result in less precise control over individual
attributes. AttGAN typically requires less training time than StarGAN, as StarGAN needs to train
a single encoder for all attributes, which can be more time-consuming.
Specifically, given a face image x with n binary attributes a = [a 1 , . . . , a n ], the encoder G enc is used
to encode the image x into the latent representation z. Then the process of editing the attributes
of x to another set of attributes c = [c 1 , . . . , c n ] is obtained by decoding z conditioned on c using the decoder
Gdec . In order to produce a realistic image with the modified attributes c, an attribute classifier is
used to constrain the generated image y to correctly possess the desired attributes, i.e., the attribute
prediction of y should be c. Meanwhile, adversarial learning is employed on y to ensure its visual
realism. Further, as eligible editing should only change the desired attributes, keeping other details
unchanged, reconstruction learning is introduced to preserve attribute-excluding details. Thus, by
Fig. 5. Example of watermarked facial images with a single attribute manipulation generated from the
AttGANw model. The input to the AttGANw model is the trigger input image (xw ) and the output is the
watermarked DeepFake (G (xw )) obtained by manipulating a specific facial attribute i.e., gender, facial hair,
and eyeglasses, respectively.
combining the attribute classification constraint (L_cls_g ), the reconstruction loss (L_rec ), and the
adversarial loss (L_adv_g ), a unified AttGAN is obtained.
Accordingly, the overall loss function of AttGAN consists of three main components: a recon-
struction loss (L_rec ), a classification loss (L_cls_g ), and an adversarial loss (L_adv_g ) for the desired
attribute c, given as follows:
L_AttGAN = λ 1 L_rec + λ 2 L_cls_g + L_adv_g , (7)
where λ 1 and λ 2 are the weights given to the reconstruction and classification losses, respectively.
Similar to StarGAN, the AttGAN also takes an image x as an input and generates an attribute-
manipulated image y as an output. For the development of the watermarked version of AttGAN,
AttGANw , the same input and output transformation functions as for StarGAN can be defined (refer
to Section 3.2).
Taking the AttGAN generator's objective function (Equation (7)), we add the regularization
term to it as follows:
L_AttGANw = L_AttGAN + λ L_w , (8)
where L_w is the same as in Equation (5), and λ is set to 1.0 based on the ablation study. Due to
the aforementioned conceptual differences between StarGAN and AttGAN [He et al. 2019], we
have studied both of them for watermarked facial attribute manipulation generation in this study.
Figure 5 shows sample watermarked facial images with a single attribute manipulation generated
from the AttGANw model when trigger input is provided.
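Analogously, Equations (7) and (8) can be sketched as a weighted sum of the AttGAN loss terms. The default values of λ1 and λ2 below are illustrative placeholders (this paper does not report them), while λ = 1.0 follows the ablation study; the function names are our assumptions.

```python
def attgan_loss(l_rec, l_cls, l_adv, lam1=100.0, lam2=10.0):
    """Eq. (7): L_AttGAN = lam1 * L_rec + lam2 * L_cls_g + L_adv_g.

    lam1/lam2 defaults are illustrative placeholders, not values
    reported in this paper.
    """
    return lam1 * l_rec + lam2 * l_cls + l_adv

def attgan_w_loss(l_rec, l_cls, l_adv, l_w, lam=1.0, **kw):
    """Eq. (8): regularized objective with the watermark term L_w
    scaled by lambda (1.0 per the ablation study)."""
    return attgan_loss(l_rec, l_cls, l_adv, **kw) + lam * l_w
```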
In summary, the proposed input and output transformation functions and the regularization
term can be generalized to all the GAN variants used for facial manipulation generation. Table 1
summarizes the implementation details of the watermarked version of all the GANs used for facial
manipulation generation in this study.
4 EXPERIMENTAL DETAILS
4.1 Datasets
The datasets used in this study are described as follows:
FaceForensics++: FaceForensics++ (FF++) [Rössler et al. 2019] is an automated benchmark for
facial manipulation detection. It consists of several manipulated videos created using two differ-
ent generation techniques: Identity Swapping (FaceSwap, FaceSwap-Kowalski, FaceShifter, Deep-
Fakes) and Expression Swapping (Face2Face and NeuralTextures). We used the FF++ dataset’s c23
version for both training and testing (80% videos for training, 20% videos for testing, and 60 frames
per video). We used the real images from this dataset for generating fake and watermarked fake
facial images by identity and expression swapping using FSGAN and FSGANw . This dataset is also
ACM Trans. Multimedia Comput. Commun. Appl., Vol. 20, No. 11, Article 344. Publication date: September 2024.
ProActive DeepFake Detection using GAN-based Visible Watermarking 344:11
Table 1. Overall Summary of the Implementation Details of the Watermarked Version of all the GANs
used for Facial Manipulation Generation in this Study
Generator | Generator Loss           | Black-Box Trigger | Black-Box Target | Watermark Loss     | Total Loss
FSGAN     | L_FSGAN (Equation (1))   | k(x)              | g(G(x))          | L_w (Equation (2)) | L_FSGANw (Equation (3))
StarGAN   | L_StarGAN (Equation (4)) | k(x)              | g(G(x))          | L_w (Equation (5)) | L_StarGANw (Equation (6))
AttGAN    | L_AttGAN (Equation (7))  | k(x)              | g(G(x))          | L_w (Equation (5)) | L_AttGANw (Equation (8))
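The black-box trigger mechanism summarized in Table 1 can be sketched as follows; the additive trigger, the normalized-correlation check, and the 0.9 threshold are illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def k(x, trigger, eps=0.05):
    """Input transformation k(x): superimpose a faint trigger signal on the
    real input (eps is an illustrative strength, not the paper's value)."""
    return np.clip(x + eps * trigger, 0.0, 1.0)

def watermark_present(output, watermark, thresh=0.9):
    """Check the assigned (top-left) region of the generator output against
    the watermark via normalized correlation."""
    h, w = watermark.shape[:2]
    region = output[:h, :w]
    num = float((region * watermark).sum())
    den = float(np.sqrt((region ** 2).sum() * (watermark ** 2).sum())) + 1e-8
    return num / den >= thresh
```

Feeding k(x) to a watermarked generator and testing `watermark_present` on its output mirrors the trigger/target pairing in the table.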
Fig. 6. Sample real and fake images from the FF++ dataset.
Fig. 7. Sample real images from CelebA (left) and RaFD (right) datasets.
used to train all the DeepFake detectors used in this study. Figure 6 shows sample real and fake
images from the FF++ dataset.
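A hypothetical frame sampler mirroring the 60-frames-per-video protocol above (the paper does not specify its exact sampling routine) could look like:

```python
def sample_frame_indices(num_frames, k=60):
    """Evenly spaced frame indices over a video; returns all frames when the
    video is shorter than k frames."""
    if num_frames <= k:
        return list(range(num_frames))
    step = num_frames / k
    return [int(i * step) for i in range(k)]
```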
CelebA: The Large-scale CelebFaces Attributes Dataset (CelebA) [Liu et al. 2015] is a publicly available face dataset with more than 200K celebrity images. The dataset covers large pose variations and background clutter, with 10K identities, 202,599 face images, 5 landmark locations, and 40 binary attribute annotations per image. This dataset is used to train StarGAN,
AttGAN, StarGANw , and AttGANw models (70% used for training and 30% used for testing) for
generating fake and watermarked fake facial images with attribute manipulation. Figure 7 shows
example real images from the CelebA dataset (left).
RaFD: The Radboud Faces Database (RaFD) [Langner et al. 2010] consists of 4,824 images collected from 67 participants (Caucasian males and females, Caucasian children, both boys and girls, and Moroccan Dutch males). Each participant demonstrates eight facial expressions across three different gaze directions, captured from three different angles. This dataset is used to
train StarGAN, AttGAN, StarGANw , and AttGANw models (70% used for training and 30% used
for testing) for generating fake and watermarked fake facial images with attribute manipulations.
Figure 7 shows example real images from the RaFD dataset (right).
Celeb-DF: The Celeb-DF [Li et al. 2020b] DeepFake forensic dataset includes 590 genuine videos from 59 celebrities as well as 5,639 DeepFake videos. In contrast to other datasets, Celeb-DF exhibits essentially no splicing borders, color mismatches, face-orientation inconsistencies, or other evident DeepFake visual artifacts. The DeepFake videos in Celeb-DF are created using an
encoder-decoder style model which results in better visual quality. This dataset is used for cross-
dataset evaluation of the DeepFake detectors on fake and watermarked fake images generated
using all the GAN models (30% used for testing with 60 frames per video). Figure 8 shows the
sample (a) real and (b) fake images from Celeb-DF.
344:12 A. V. Nadimpalli and A. Rattani
Fig. 8. Sample real and fake images from Celeb-DF (a, b) and DF-1.0 (c, d) datasets.
DeeperForensics-1.0: The DeeperForensics-1.0 (DF-1.0) [Jiang et al. 2020] is one of the largest
DeepFake datasets used for face forgery detection. DF-1.0 consists of 60,000 videos with around 17.6 million frames and substantial real-world perturbations. The dataset contains videos of 100 consented actors with 35 different perturbations. The real-to-fake video ratio is 5:1, and the
fake videos are generated by an end-to-end face-swapping framework. This dataset is used for
facial attribute manipulation generation by StarGAN, StarGANw , AttGAN, and AttGANw models
(70% and 30% used for training and testing with 60 frames sampled per video). Figure 8 shows the
sample (c) real and (d) fake images from the DF-1.0 dataset.
study. FSGAN and FSGANw models were trained using an Adam optimizer [Kingma and Ba 2014]
with a learning rate of 0.002 and a batch size of 32. StarGAN and StarGANw models were trained
using an Adam optimizer with a learning rate of 0.001 and a batch size of 16. AttGAN and AttGANw
models were also trained using an Adam optimizer with a learning rate of 0.002 and a batch size
of 32. All the models were trained, and all the experiments were conducted, on a workstation with 2 NVIDIA RTX 8000 GPUs.
5 RESULTS
5.1 Fidelity
In this section, we compare the performance of the watermarked GAN model against the original
GAN model in generating quality synthetic images. The aim is to evaluate the efficacy of the
watermarked version of the GAN in generating high-quality fake samples after the regularization
term is added to the objective function. Adding the regularization term should not degrade the
performance of the GAN in generating synthetic images. The performance is compared in terms of
SSIM and FID scores between the real and fake images from the GAN and its watermarked version.
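For reference, a single-window SSIM between two images in [0, 1] can be sketched as below; this is a simplification of the metric actually used, which averages SSIM over local (typically Gaussian-weighted) windows:

```python
import numpy as np

def global_ssim(x, y, L=1.0):
    """Single-window SSIM: luminance, contrast, and structure terms computed
    once over the whole image (L is the dynamic range of the pixel values)."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```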
As can be seen from Table 2, the performance of the baseline FSGAN model with that of water-
marked FSGANw is comparable in terms of the quality of the generated fake images. The SSIM of
the baseline FSGAN is 0.48 and 0.42 and the SSIM of FSGANw is 0.45 and 0.39 when evaluated
Table 3. SSIM and FID Scores of the Original and Watermarked StarGAN and AttGAN Models on the CelebA and DF-1.0 Datasets

Models    | CelebA SSIM↑ | CelebA FID↓ | DF-1.0 SSIM↑ | DF-1.0 FID↓
StarGAN   | 0.395        | 69.2        | 0.369        | 76.45
StarGANw  | 0.378        | 74.7        | 0.354        | 79.19
AttGAN    | 0.695        | 26.1        | 0.649        | 32.86
AttGANw   | 0.683        | 29.3        | 0.634        | 36.45
on fake samples generated from FF++ and CelebA datasets, respectively. Similarly, the FID of the
baseline FSGAN is 11.56 and 16.42 and the FID of FSGANw is 12.78 and 17.95 when evaluated on
the samples generated from FF++ and CelebA datasets, respectively.
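The FID values above compare the real and generated feature distributions. A sketch of the underlying Fréchet distance between two Gaussians is given below; in practice the means and covariances come from Inception-network features of the two image sets:

```python
import numpy as np

def _sqrtm_psd(A):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def fid(mu1, S1, mu2, S2):
    """FID: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}), using the
    PSD-equivalent form (S1^{1/2} S2 S1^{1/2})^{1/2} for the cross term."""
    S1h = _sqrtm_psd(S1)
    covmean = _sqrtm_psd(S1h @ S2 @ S1h)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(S1) + np.trace(S2)
                 - 2.0 * np.trace(covmean))
```

Identical distributions give FID 0; larger values indicate the generated distribution has drifted from the real one.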
Further, the performance of the baseline StarGAN model with that of watermarked StarGANw
is equivalent (see Table 3). The SSIM of the baseline StarGAN is 0.395 and 0.369 and the SSIM of StarGANw is 0.378 and 0.354 when evaluated on samples from CelebA and DF-1.0 datasets, respectively. Similarly, the FID of the baseline StarGAN is 69.20 and 76.45 and the FID of StarGANw is 74.7 and 79.19 when evaluated on samples generated from CelebA and DF-1.0 datasets, respectively. Similarly, the performance difference of the baseline AttGAN model (Table 3) with that of
watermarked AttGANw is minimal. The SSIM of the baseline AttGAN is 0.695 and 0.649 and the
SSIM of AttGANw is 0.683 and 0.634 when evaluated on CelebA and DF-1.0 datasets, respectively.
Similarly, the FID of the baseline AttGAN is 26.1 and 32.86 and the FID of AttGANw is 29.3 and
36.45 when evaluated on samples generated from CelebA and DF-1.0 datasets, respectively.
In summary, these results suggest that the quality of the generated samples using watermarked
GAN models does not degrade much compared to the original GAN models after adding the reg-
ularization term to the loss function of the FSGANw , StarGANw , AttGANw generators (see Equa-
tions (3), (6), (8)).
Table 4. Evaluation of the DeepFake Detectors on Fake and Watermarked Fake Samples Generated from
FSGAN and FSGANw Models for Identity Manipulation
Models             |          Real + Fake                 |     Real + Watermarked Fake
                   | AUC   pAUC  EER   ACC   TPR   FPR    | AUC   pAUC  EER   ACC   TPR   FPR
MesoInception-4    | 0.802 0.784 0.265 0.789 0.762 0.282  | 0.808 0.789 0.262 0.785 0.758 0.285
Xception Net       | 0.861 0.845 0.227 0.843 0.825 0.248  | 0.844 0.821 0.238 0.838 0.829 0.251
CNN-LSTM           | 0.871 0.853 0.221 0.859 0.834 0.234  | 0.866 0.843 0.223 0.847 0.822 0.246
Efficient Net V2-L | 0.876 0.858 0.217 0.861 0.836 0.232  | 0.870 0.854 0.222 0.852 0.829 0.243
F3-Net             | 0.879 0.863 0.214 0.864 0.838 0.230  | 0.875 0.856 0.218 0.856 0.830 0.241
Lip Forensics      | 0.883 0.861 0.212 0.869 0.843 0.228  | 0.879 0.861 0.215 0.861 0.836 0.233
Multi Attentional  | 0.889 0.867 0.208 0.873 0.855 0.225  | 0.886 0.863 0.210 0.868 0.842 0.229
Ensemble model     | 0.902 0.878 0.192 0.885 0.863 0.218  | 0.897 0.884 0.201 0.876 0.859 0.221
These GAN models are trained on the FF++ dataset and tested on samples generated from FF++ and CelebA datasets.
The SOTA DeepFake detectors obtained equivalent performance on fake and watermarked fake samples. The top
performance is shown in bold.
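The EER reported in these tables is the operating point where the false-positive and false-negative rates cross; a simple threshold sweep (treating higher scores as "fake") can be sketched as:

```python
import numpy as np

def eer(real_scores, fake_scores):
    """Approximate Equal Error Rate by sweeping every observed score as a
    threshold and taking the smallest max(FPR, FNR)."""
    best = 1.0
    for t in np.sort(np.concatenate([real_scores, fake_scores])):
        fpr = float(np.mean(real_scores >= t))  # real flagged as fake
        fnr = float(np.mean(fake_scores < t))   # fake missed
        best = min(best, max(fpr, fnr))
    return best
```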
Table 5. Evaluation of the DeepFake Detectors on Fake and Watermarked Fake Images Generated from
StarGAN, StarGANw , AttGAN, and AttGANw Models for Facial Attribute Manipulation
Models             |          Real + Fake                 |     Real + Watermarked Fake
                   | AUC   pAUC  EER   ACC   TPR   FPR    | AUC   pAUC  EER   ACC   TPR   FPR
MesoInception-4    | 0.647 0.621 0.385 0.625 0.602 0.398  | 0.635 0.618 0.392 0.619 0.598 0.403
Xception Net       | 0.728 0.697 0.307 0.709 0.685 0.354  | 0.722 0.696 0.311 0.702 0.679 0.358
CNN-LSTM           | 0.735 0.716 0.298 0.719 0.698 0.346  | 0.737 0.719 0.296 0.722 0.704 0.342
Efficient Net V2-L | 0.749 0.727 0.289 0.732 0.716 0.332  | 0.739 0.718 0.295 0.725 0.709 0.338
F3-Net             | 0.754 0.736 0.284 0.736 0.723 0.330  | 0.742 0.721 0.292 0.728 0.712 0.335
Lip Forensics      | 0.764 0.745 0.279 0.742 0.725 0.326  | 0.755 0.738 0.286 0.736 0.721 0.331
Multi Attentional  | 0.781 0.764 0.271 0.759 0.736 0.315  | 0.769 0.752 0.277 0.748 0.726 0.321
Ensemble model     | 0.805 0.781 0.261 0.783 0.766 0.286  | 0.794 0.773 0.268 0.775 0.754 0.293
These GAN models are trained on the CelebA and RaFD datasets and tested on the samples generated from the CelebA and DF-1.0 datasets. This experiment follows the proactive DeepFake detection protocol in Wang et al. [2022b]. The inferior performance is due to the fact that the DeepFake detectors are trained on FF++ for identity and expression manipulation detection. The top performance is shown in bold.
Similarly, Table 5 shows the performance of DeepFake detectors in terms of AUC, pAUC, and
EER when trained on FF++ dataset and tested on fake and watermarked fake images generated
with attribute manipulation from StarGAN, StarGANw , AttGAN, and AttGANw models. The en-
semble model obtained the best results with an overall AUC of 0.805, pAUC of 0.781, and EER of
0.261 when tested on fake images from StarGAN and AttGAN models. The same ensemble model
obtained an overall AUC of 0.794, pAUC of 0.773, and EER of 0.268 when tested on watermarked
fake images from StarGANw and AttGANw models. Similar observations can be made in terms of
ACC, TPR, and FPR.
Overall, the performance deviation of the best DeepFake detectors in detecting facial manipula-
tions based on attribute editing over identity swapping in terms of AUC, pAUC, and EER is 0.097,
0.097, and 0.069 on fake images and 0.103, 0.111, and 0.067 on watermarked fake images. This
difference in performance is due to the fact that these DeepFake detectors are trained on an FF++
dataset containing Real and DeepFakes generated using various identity and expression-swapping
techniques but not attribute manipulation. However, following the proactive DeepFake detection protocol in Wang et al. [2022b], we also evaluated the SOTA DeepFake detectors on detecting facial attribute manipulations generated using StarGAN, StarGANw , AttGAN, and AttGANw .
We also did the cross-dataset evaluation of the DeepFake detectors trained on FF++ and tested
on fake and watermarked fake images generated by facial attribute editing and identity and
Table 6. Cross Dataset Evaluation on Fake and Watermarked Fake Images Generated from Celeb-DF
Dataset using all the GAN Models Trained on FF++, CelebA, and RaFD Datasets for Identity and
Attribute Manipulation
Models             |          Real + Fake                 |     Real + Watermarked Fake
                   | AUC   pAUC  EER   ACC   TPR   FPR    | AUC   pAUC  EER   ACC   TPR   FPR
MesoInception-4    | 0.768 0.746 0.279 0.745 0.728 0.324  | 0.754 0.732 0.288 0.736 0.721 0.332
Xception Net       | 0.805 0.783 0.263 0.786 0.764 0.285  | 0.797 0.775 0.269 0.774 0.756 0.294
CNN-LSTM           | 0.816 0.795 0.259 0.798 0.773 0.275  | 0.809 0.784 0.265 0.786 0.763 0.286
Efficient Net V2-L | 0.842 0.823 0.239 0.825 0.803 0.258  | 0.834 0.816 0.246 0.813 0.795 0.267
F3-Net             | 0.829 0.807 0.250 0.809 0.786 0.269  | 0.822 0.801 0.255 0.806 0.784 0.271
Lip Forensics      | 0.838 0.814 0.242 0.814 0.795 0.266  | 0.831 0.813 0.249 0.811 0.792 0.268
Multi-Attentional  | 0.864 0.847 0.224 0.842 0.825 0.249  | 0.847 0.825 0.236 0.826 0.807 0.257
Ensemble model     | 0.859 0.836 0.227 0.837 0.818 0.252  | 0.853 0.836 0.230 0.833 0.815 0.254
The DeepFake detectors are also trained on the FF++ dataset. The cross-dataset generalizability of the DeepFake
detectors is low on fake and watermarked fake images. The top performance is shown in bold.
expression swapping generated from the Celeb-DF dataset. Note that the Celeb-DF dataset was
not used for training the GAN and watermarked GAN models. As seen from Table 6, the multi-
attentional model obtained the best results with an overall AUC of 0.864, pAUC of 0.847, and EER
of 0.224 when tested on fake images. The same model obtained an overall AUC of 0.853, pAUC of
0.836, and EER of 0.230 when tested on watermarked fake images from the models. The overall
drop in the performance of the best DeepFake detector on cross-dataset evaluation when compared to intra-dataset evaluation is 0.038, 0.031, and 0.032 in terms of AUC, pAUC, and EER for fake images and 0.044, 0.048, and 0.029 for watermarked fake images,
respectively. However, the low cross-dataset generalizability of the SOTA DeepFake detectors is a
well-known problem [Nadimpalli and Rattani 2022a].
In summary, the ensemble model [Peng et al. 2022] and the multi-attentional model [Zhao et al.
2021] based DeepFake detectors obtained the best detection accuracy on fake and watermarked
DeepFakes generated using GANs for identity swapping and attribute manipulation. This is be-
cause these models are based on a combination of advanced architectures that attend to different
local parts in the image and detect subtle artifacts for DeepFake detection. The SOTA DeepFake de-
tectors obtained equivalent detection accuracy on DeepFakes and watermarked DeepFakes. This
confirms the viability of our proposed GAN-based visible watermarking for DeepFake detection
by human eyes as well as SOTA DeepFake detectors.
6.1 Fine-Tuning
Here, we simulate a scenario in which an adversary fine-tunes the regularized GAN model that
generates watermarked output. The adversary fine-tunes the watermarked GAN model using a
new dataset to obtain a model that inherits the performance of the original GAN model while
trying to remove the embedded watermark. The lower layers of the generator are frozen and the higher layers are fine-tuned without the regularization term Lw (Equations (2) and (5)). For
this experiment, the FSGANw model trained on the FF++ dataset was fine-tuned on the DF-1.0
dataset without the regularization term and finally evaluated on generated samples from FF++ and
CelebA datasets. The performance of the fine-tuned FSGANw model has been evaluated in terms of
Fig. 9. Illustration of the output of the GANw models before (left image) and after fine-tuning (right image).
The watermark is embedded in the synthesized images even after the fine-tuning operation, demonstrating
the robustness of our proposed approach against GAN fine-tuning.
Table 7. The SSIM, FID, and Quality of Watermark (Qwm ) of FSGANw and StarGANw After
Fine-Tuning on DF-1.0 and FF++ Dataset for Watermark Removal, Respectively
the quality of generated fake images using SSIM and FID scores and the quality of watermark Qwm
for watermarked fake samples (on providing the trigger input). In Table 7, we can observe a minimal performance drop when FSGANw is fine-tuned by the adversary to remove the embedded
watermark. The SSIM scores of FSGANw before fine-tuning and after fine-tuning are 0.45 and 0.43
when evaluated on the FF++ dataset. The SSIM scores of FSGANw before and after fine-tuning are
0.39 and 0.38 when evaluated on the CelebA dataset, which does not deviate much from the original model. The FID scores of FSGANw show a larger drop in performance (12.78 → 16.45 on FF++, 17.95 → 20.29 on CelebA) on fine-tuning. As the FID score is computed in terms of similarity between the real and fake dataset distributions, the drop in this score is more pronounced than for SSIM, which is computed between a pair of images. The quality of the watermark Qwm dropped from 0.94 to
0.91, which is very minimal. Similarly, the StarGANw and AttGANw models trained on CelebA
and RaFD datasets, are fine-tuned on the FF++ dataset for gender manipulation (using gender
annotations done by the authors), and finally evaluated on CelebA and DF-1.0 datasets. The SSIM
scores of StarGANw before fine-tuning and after fine-tuning are 0.378 and 0.359 when evaluated
on the CelebA dataset, and 0.354 and 0.332 on the DF-1.0 dataset, which do not deviate much from the
original model. The FID scores of StarGANw obtained a drop in the performance (74.7 → 79.8 on
CelebA and 79.19 → 82.5 on DF-1.0) on fine-tuning. The quality of the watermark Qwm dropped
from 0.92 to 0.85. A similar trend was observed for the AttGANw model (not shown for the sake of space). Figure 9 shows the output of the GANw before and after fine-tuning. As can be
seen, even after fine-tuning the GAN models, the watermark is still embedded in the synthesized
images. The quality of the embedded watermark does not significantly degrade due to the fine-
tuning operation.
Thus, fine-tuning the GANw is not beneficial in removing the watermark as the model is al-
ready initialized using the trained weights embedded with the watermark. Therefore, fine-tuning
the model after removing the regularization term Lw has a minimal impact on the model. This
experiment proves that our method is robust to fine-tuning-based removal attacks.
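The adversary's setup described above (freeze the lower generator layers, fine-tune the higher ones without L_w) can be sketched framework-agnostically; the dict-based layer list below is a hypothetical stand-in for the generator's parameter groups:

```python
def setup_finetune(layers, n_frozen):
    """Mark the first n_frozen generator blocks as frozen and the rest as
    trainable; the watermark loss L_w is simply omitted from the objective."""
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= n_frozen
    return [layer["name"] for layer in layers if layer["trainable"]]
```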
Fig. 10. Sample of watermarked images and their cropped version for the watermark removal.
6.2 Cropping
The main idea behind cropping is to remove the embedded watermark from the GAN-generated
watermarked images so that DeepFake detectors or the human eye may misclassify the GAN-
generated image as real. To simulate this condition, we cropped the 48 × 48 watermark from the original 224 × 224 image to obtain a 176 × 176 cropped image without the embedded watermark, maintaining the square aspect ratio. Figure 10 shows sample watermarked images
and their cropped version with watermark removal.
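The cropping attack reduces to discarding the watermark band; assuming a top-left 48 × 48 watermark on a square 224 × 224 image, a minimal sketch is:

```python
import numpy as np

def crop_out_watermark(img, wm_size=48):
    """Drop the first wm_size rows and columns so a top-left watermark is
    removed: a 224x224 input becomes 176x176 (square aspect ratio kept)."""
    return img[wm_size:, wm_size:]
```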
Table 8 shows the performance of the ensemble model-based DeepFake detector on the cropped
facial region from the watermarked image. We used watermarked fake images generated from
StarGANw and AttGANw models trained on CelebA and RaFD and evaluated on CelebA and DF-1.0
datasets for this experiment. The overall performance of the DeepFake detector drops by 0.162,
0.159, and 0.127 in terms of AUC, pAUC, and EER when compared to the baseline ensemble model
evaluated on the watermarked fake images. This is a significant drop in the performance. If a larger watermark is embedded, then more information is lost while cropping it, further impacting the performance of the DeepFake detectors. Thus, watermark removal using the
cropping operation has a significant impact on the performance of DeepFake detectors. However, the cropped images are easily spotted by human eyes. Our results are in line with the study in Le et al. [2023], which suggests that operations such as cropping, resizing, and adding adversarial
noise to fake samples significantly impact the performance of DeepFake detectors. Further, the study in Yu et al. [2020] suggests that a method based on adding and detecting artificial fingerprints (bit
string) in synthesized images for DeepFake detection is also vulnerable to cropping, resizing, and
Gaussian blur operations when applied to fake samples with added watermark (bit string).
Fig. 11. Result of watermark removal using the WDNet model [Liu et al. 2020]. Although the watermark could be removed from the images, the refinement network of WDNet fails to refine the area of the watermark, which can be easily spotted by the human eyes.
centers on the watermarked area to refine the removal results. The decomposition formulation en-
ables WDNet to separate watermarks from the images rather than simply removing them. Owing
to the advanced architecture, WDNet outperformed all the traditional visible watermark removal
techniques in terms of efficiency and accuracy [Liu et al. 2020].
In this work, we used the WDNet model pre-trained on Large-scale Visible Watermark
Dataset (LVW), Colored Large-scale Watermark Dataset (CLWD), and PASCAL VOC 2012
(watermark-free images) and fine-tuned it on the watermarked facial images generated from
FSGANw , StarGANw , and AttGANw using FF++, CelebA and DF-1.0 datasets. All the implementa-
tion details and the training hyper-parameters remain the same as that of the original work [Liu
et al. 2020]. Figure 11 shows the result of watermark removal using the WDNet model. From the
results, we can see that although the watermark could be removed from the images, the refinement
network of the WDNet fails to refine the area of the watermark (due to the addition of the trigger
noise for watermark generation) after its removal, which can be easily spotted by the human eyes
for DeepFake detection. The performance of the DeepFake detectors remains the same on the fake
images after the watermark removal.
In summary, fine-tuning is not beneficial in removing the watermark as the model is already
initialized using the trained weights embedded with the watermark. The cropping operation degrades the performance of the DeepFake detector; however, cropped images may be spotted by human eyes. Although the watermark could be removed from the images using the latest WDNet
model [Liu et al. 2020], its refinement network fails to refine the area of the watermark (see
Figure 11) which could be easily spotted by the human eyes.
7 ABLATION STUDY
7.1 Lambda
To balance the original objective against the quality of the generated watermark, the coefficient λ is multiplied with the reconstructive regularization term Lw (see Equations (3), (6), and (8)). We performed an ablation study evaluating the performance of the watermarked version of the GANs by varying the λ value in the range [0.1, 10.0], chosen from Ong et al. [2021]. We used the
AttGANw model for this ablation study and this model was trained on CelebA and RaFD datasets
and tested on CelebA and DF-1.0 datasets for synthetic sample generation.
As can be seen from Table 9, when λ is low (0.1), the AttGANw model obtains the lowest FID
score (26.34). This means that the GAN obtains a very good performance in generating high-quality
synthetic images (its original task). Conversely, when λ is high (10.0), the GAN obtains poor performance (FID score of 39.26), but the quality of the watermark Qwm is the highest. In conclusion, there is a
tradeoff between the quality of the generated watermark and the performance of the GAN model.
Setting λ=1.0 offers a reasonable tradeoff, as the watermark quality is relatively high without
significantly impacting the performance of the GAN in generating high-quality images. The same
observation is noted for all the GAN models. Therefore, λ=1.0 is used for all the experiments in
this study.
7.2 Size and Location of the Watermark vs. Performance of the DeepFake Detectors
We also performed an ablation study to determine the optimum size and location of the watermark in the
generated images with respect to the performance of the SOTA DeepFake detectors.
We performed this ablation study for the FSGANw model by changing the size of the embedded watermark. Specifically, we varied the watermark size over 16 × 16, 24 × 24, 34 × 34, 48 × 48, and 64 × 64, as shown in Figure 12(a). Table 10 shows the performance of the
best-performing ensemble model-based DeepFake detector on generated images from FF++ and
CelebA datasets using the FSGANw model with varying-sized watermarks. The DeepFake detec-
tor obtained equivalent performance for the watermark size ranging from 16 × 16 to 48 × 48. There
was a sharp decline in the performance for watermark sizes of 64 × 64 and above due to the loss
of information attributed to significant occlusion. Thus, the ideal watermark size depends on the
size of the facial image and should be determined through empirical study. We chose a water-
mark of size 48 × 48 for all the experiments as it offers the best results with respect to the visu-
alization effect and the performance of the DeepFake detector. Further, the location of the water-
mark of size 48 × 48 is varied from (top-left→top-right→bottom-left→bottom-right→center) (see
Figure 12(b)). Our findings indicate that embedding the watermark in the corners, such as the top-
left, bottom-right, bottom-left, and top-right, obtains similar performance of DeepFake detectors.
This is primarily because when the watermark is positioned in the corners, it does not occlude
or obstruct the facial features significantly, resulting in minimal impact on the performance of
Fig. 12. Variation in the size and location of the watermark in the generated image.
DeepFake detectors. The lowest performance is obtained when the watermark is embedded in the
facial region of the image which results in occlusion and impacts DeepFake detection. For all the
experiments, we kept the size of the watermark constant (48×48) and the location of the watermark
to the top-left corner in a generated image based on this ablation study.
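The five candidate placements from this ablation can be expressed as corner/center offsets; a sketch (hypothetical helper with hard pasting, rather than the GAN's learned embedding) is:

```python
import numpy as np

def paste_watermark(img, wm, location="top-left"):
    """Paste wm at one of the five ablated locations; the corners barely
    occlude facial features, while 'center' occludes them the most."""
    H, W = img.shape[:2]
    h, w = wm.shape[:2]
    offsets = {"top-left": (0, 0), "top-right": (0, W - w),
               "bottom-left": (H - h, 0), "bottom-right": (H - h, W - w),
               "center": ((H - h) // 2, (W - w) // 2)}
    top, left = offsets[location]
    out = img.copy()
    out[top:top + h, left:left + w] = wm
    return out
```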
APPENDICES
A SOTA DEEPFAKE DETECTORS USED IN THIS STUDY
In this study, we investigated the performance of eight popular DeepFake detection models of various sizes, architectures, and underlying concepts. Specifically, we evaluated
MesoInception4 [Afchar et al. 2018], XceptionNet [Chollet 2017], EfficientNet V2-L [Tan and
Le 2021], LipForensics [Haliassos et al. 2021], Multi-attention model [Zhao et al. 2021], CNN-LSTM [Chen et al. 2022], F3-Net [Qian et al. 2020], and Ensemble Model [Peng et al. 2022] based
DeepFake detectors on fake and watermarked fake samples obtained using our approach. Next, we
discuss the implementation details of these detectors.
Implementation details: These DeepFake detectors are trained on the popular FF++ dataset (c23
version). We used the sampling approach described in Rössler et al. [2019] to choose 270 frames per
video for training the models. The face images were detected and aligned using the MTCNN [Zhang et al. 2016] algorithm. MTCNN utilizes a cascaded CNN-based framework for joint face detection
and alignment. The images are then resized to 224 × 224 for both training and evaluation. For all
the CNN-based models (MesoInception4, XceptionNet, and EfficientNet V2-L), we used a batch-normalization layer followed by a last fully connected layer of size 1,024 and the final output layer for DeepFake classification. These CNN-based models were trained using an Adam optimizer with an initial learning rate of 0.001 and a weight decay of 1e-6. For the CNN-LSTM model, we
chose EfficientNet V2-L as the backbone CNN model due to its superior performance. The CNN
network’s output, a feature vector of size 2,048, is fed into the LSTM layer for DeepFake detection. For the LipForensics model, following the authors’ implementation in Haliassos
et al. [2021], the network receives 25 grayscale, aligned mouth crops of size 88 × 88 as an input
for each video. The input is passed through a pretrained ResNet-18 followed by a multiscale temporal convolution network (MS-TCN) for DeepFake detection. For F3-Net [Qian et al. 2020],
we followed the same protocol as the authors by using XceptionNet pretrained on the ImageNet
as a backbone. The network is optimized via SGD and all the hyper-parameters remain the same
as in the original implementation. For the multi-attentional model, the EfficientNet-b4 backbone
network is used for feature extraction. For the Ensemble model [Peng et al. 2022], the ConvNeXt and Transformer models are initialized with ImageNet-pretrained weights. All other hyper-parameters and implementation details are adopted from the authors’ original implementations for all the DeepFake detectors. All the models were trained on 2 RTX 8000 GPUs with a batch size of 64. Table 11 lists
these DeepFake detectors along with the source code.
Fig. 13. Performance of the original and watermarked GAN models using FID vs. SSIM scores.
Fig. 14. Grad-CAM visualization of the ensemble model-based DeepFake detector on fake and watermarked
fake generated by identity and attribute manipulation.
performs attribute manipulation task, therefore the attribute manipulated areas are mostly acti-
vated and used for DeepFake detection. For watermarked fake images, the watermarked region of
the image is also activated and used by the detector for DeepFake detection.
Fig. 15. Illustration of Gaussian blur on watermarked fake images using different kernel sizes and the σ
values [best viewed in zoom].
Table 12. The Effect of Gaussian Blur on the DeepFake Detector when Applied on Fake and
Watermarked Fake Images Generated from the FSGANw Model
Fig. 16. Application of JPEG compression on watermarked images at different compression rates varying
from 20 → 80 [best viewed in zoom].
Table 13. The Effect of JPEG Compression Varying from 20 → 80 on the DeepFake Detector when
Applied on Fake and Watermarked Fake Images Generated using the FSGAN and FSGANw Models