
Hindawi

Computational Intelligence and Neuroscience


Volume 2023, Article ID 9767530, 1 page
https://doi.org/10.1155/2023/9767530

Retraction
Retracted: Comparative Analysis of Deepfake Image Detection
Method Using Convolutional Neural Network

Computational Intelligence and Neuroscience

Received 28 November 2023; Accepted 28 November 2023; Published 29 November 2023

Copyright © 2023 Computational Intelligence and Neuroscience. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been retracted by Hindawi, as publisher, following an investigation undertaken by the publisher [1]. This investigation has uncovered evidence of systematic manipulation of the publication and peer-review process. We cannot, therefore, vouch for the reliability or integrity of this article.

Please note that this notice is intended solely to alert readers that the peer-review process of this article has been compromised.

Wiley and Hindawi regret that the usual quality checks did not identify these issues before publication and have since put additional measures in place to safeguard research integrity.

We wish to credit our Research Integrity and Research Publishing teams and anonymous and named external researchers and research integrity experts for contributing to this investigation.

The corresponding author, as the representative of all authors, has been given the opportunity to register their agreement or disagreement to this retraction. We have kept a record of any response received.

References
[1] H. S. Shad, M. M. Rizvee, N. T. Roza et al., "Comparative Analysis of Deepfake Image Detection Method Using Convolutional Neural Network," Computational Intelligence and Neuroscience, vol. 2021, Article ID 3111676, 18 pages, 2021.
Hindawi
Computational Intelligence and Neuroscience
Volume 2021, Article ID 3111676, 18 pages
https://doi.org/10.1155/2021/3111676

Research Article
Comparative Analysis of Deepfake Image Detection Method Using
Convolutional Neural Network

Hasin Shahed Shad,1 Md. Mashfiq Rizvee,1 Nishat Tasnim Roza,1 S. M. Ahsanul Hoq,1 Mohammad Monirujjaman Khan,1 Arjun Singh,2 Atef Zaguia,3 and Sami Bourouis4

1 Department of Electrical and Computer Engineering, North South University, Dhaka 1229, Bangladesh
2 School of Computing and IT, Manipal University Jaipur, Jaipur, Rajasthan, India
3 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
4 Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

Correspondence should be addressed to Mohammad Monirujjaman Khan; [email protected]

Received 3 September 2021; Accepted 30 November 2021; Published 16 December 2021

Academic Editor: Suneet Kumar Gupta

Copyright © 2021 Hasin Shahed Shad et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Generation Z is a data-driven generation. Everyone has the entirety of humanity's knowledge in their hands. The technological possibilities are endless. However, we use and misuse this blessing to face swap using deepfake. Deepfake is an emerging subdomain of artificial intelligence technology in which one person's face is overlaid over another person's face, which is very prominent across social media. Machine learning is the main element of deepfakes, and it has allowed deepfake images and videos to be generated considerably faster and at a lower cost. Despite the negative connotations associated with the phrase "deepfakes," the technology is being more widely employed commercially and individually. Although it is relatively new, the latest technological advances make it more and more challenging to distinguish deepfakes and synthesized images from real ones. An increasing sense of unease has developed around the emergence of deepfake technologies. Our main objective is to detect deepfake images from real ones accurately. In this research, we implemented several methods to detect deepfake images and make a comparative analysis. Our models were trained on datasets from Kaggle, which had 70,000 images from the Flickr dataset and 70,000 images produced by styleGAN. For this comparative study of the use of convolutional neural networks (CNN) to identify genuine and deepfake pictures, we trained eight different CNN models. Three of these models were trained using the DenseNet architecture (DenseNet121, DenseNet169, and DenseNet201); two were trained using the VGGNet architecture (VGG16, VGG19); one was trained with the ResNet50 architecture, one with VGGFace, and one with a bespoke CNN architecture. We have also implemented a custom model that incorporates methods like dropout and padding that aid in determining whether or not the other models reflect their objectives. The results were assessed with five evaluation metrics: accuracy, precision, recall, F1-score, and area under the ROC (receiver operating characteristic) curve. Amongst all the models, VGGFace performed the best, with 99% accuracy. Besides, we obtained 97% from ResNet50, 96% from DenseNet201, 95% from DenseNet169, 94% from VGG19, 92% from VGG16, 97% from the DenseNet121 model, and 90% from the custom model.

1. Introduction

The face is the most distinctive feature of human beings. With the tremendous growth of face synthesis technology, the security risk posed by face manipulation is becoming increasingly significant. Individuals' faces may often be swapped with someone else's faces that appear authentic because of the myriad of algorithms based on deep learning technology. Deepfake is an emerging subdomain of artificial intelligence technology in which one person's face is overlaid over another person's face. More specifically, multiple methods based on generative adversarial networks (GANs) produce high-resolution deepfake images [1]. Unfortunately, due to the widespread usage of cellphones and the development of numerous social networking sites, deepfake content is spreading faster than ever before in the twenty-first century, which has turned into a global danger [2]. Initially, deepfake images were discernible with the human eye due to the pixel collapse phenomena that tend to create artificial visual inconsistencies in the skin tone or facial shape of pictures. Not only images or videos but also audio can be turned into deepfakes. Deepfakes have grown to be barely distinguishable from natural pictures as technology has progressed over the years [3]. Consequently, people all across the world are experiencing inescapable complications.

Because of deepfake technology, people may choose their fashion more quickly, which benefits the fashion and e-commerce industries. Furthermore, this technology aids the entertainment business by providing artificial voices for artists who cannot dub on time. Additionally, filmmakers can now recreate many classic sequences or utilize special effects in their films because of deepfake technology. Deepfake technology can potentially let Alzheimer's patients communicate with a younger version of themselves, which might help them retain their memories. GANs are also being investigated for their application in detecting anomalies in X-ray images [4]. Deepfake approaches often require a massive quantity of image, video, or audio data to generate natural-looking photos so that witnesses are persuaded to believe them. Besides all the prominence, there are some significant drawbacks as well. Public figures, for instance, celebrities, athletes, and politicians, are the worst sufferers of deepfakes, as they have a substantial number of videos and pictures available online. Though deepfake technologies are occasionally used to ridicule others, they are primarily employed to create adulterous content. The faces of many celebrities and other well-known individuals have been grafted onto the bodies of pornographic models, and these images are widely available on the Internet [2]. Deepfake technology may create satirical, pornographic, or political content about familiar people by utilizing their pictures and voices without their consent. Due to the ease of use of various applications, anyone can fabricate artificial content that is imperceptible from the actual content [2]. Many young people are becoming victims of cyberbullying. In the worst-case scenario, countless sufferers commit suicide.

A deepfake video of the former American president Barack Obama is being circulated on the Internet these days in which he utters things that he has never expressed. Furthermore, deepfakes have already been used to alter Joe Biden's footage showing his tongue out during the US 2020 election. Besides, Taylor Swift, Gal Gadot, Emma Watson, Meghan Markle, and many other celebrities have been victims of deepfake technology [5]. In the United States and Asian societies, many women are also victimized by deepfake technologies. The harmful use of deepfakes can significantly impact our culture and increase misleading information, especially on social media [6]. Because of these negative impacts on individuals and organizations, deepfakes have become a significant threat to our current generation. Therefore, to eradicate defamation, scams, deception, and insecurities from society, researchers have been relentlessly trying to detect deepfakes. The identification of deepfakes would reduce the number of crimes that are currently occurring around the world. Therefore, researchers have paid attention to mechanisms for validating the integrity of suspected deepfakes [2]. In reaction to this trend, some multinational companies have started to take initiatives. For instance, Google has made a fake video database accessible for academicians to build new detection algorithms, while Facebook and Microsoft have organized the Deepfake Detection Challenge [7].

There are several methods to detect GAN-generated deepfake images, including traditional machine learning classifiers (such as the support vector machine algorithm or naive Bayes), deep neural networks, convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory (LSTM), and many more. The main contribution of this work is to identify deepfake images and distinguish them from normal images using CNN architectures. In this research, eight different convolutional neural network architectures have been employed to detect deepfake images, including DenseNet169, DenseNet121, DenseNet201, VGG16, VGG19, VGGFace, and ResNet50. A custom model has also been introduced for comparative analysis.

The dataset for this work was obtained from Kaggle. After the dataset was gathered, the features were extracted, and various CNN architectures were implemented to obtain the best result. Finally, each model was evaluated using four different metrics: accuracy, precision, recall, and F1-score. Lastly, the area under the ROC curve was also considered as an additional metric for assessing the performance of the models.

2. Related Works

While deepfake is a relatively new technology, there has been research done on the topic. Nguyen et al. [2] performed a study that examined the use of deep learning to create and detect deepfakes. The number of deepfake articles has grown significantly in recent years, according to data gathered by https://app.dimensions.ai towards the end of 2020. Although the number of deepfake articles acquired is likely to be lower than the exact amount, the research trend on this issue is rising. The capacity of deep learning to represent complex and high-dimensional data is well known. Deep autoencoders, a type of deep network having such an ability, have been widely used for dimensionality reduction and picture compression [8–10].

The FakeApp, developed by a Reddit user utilizing an autoencoder-decoder pairing structure, was the first effort at deepfake generation [11, 12]. The autoencoder collects latent characteristics from facial pictures, and the decoder reconstructs the images from them. Two encoder-decoder pairs are required to switch faces between source and target pictures; the encoder's parameters are shared between the two network pairs, and each pair is used to train on an image collection. The encoder networks of these two pairs are identical [2]. This encoder-decoder architecture is used in several recent lines of research, including DeepFaketf (TensorFlow-based deepfakes) [13],
DFaker [14], and DeepFaceLab [15]. An enhanced version of deepfakes based on the generative adversarial network (GAN) [10], for example, faceswap-GAN, was suggested in [16] by adding the adversarial loss and perceptual loss to the encoder-decoder architecture, as implemented in VGGFace [17].

Furthermore, the FaceNet implementation [18] introduces a multitask convolutional neural network (CNN) to improve face identification and alignment reliability. The CycleGAN [19] is used to construct generative networks. Deepfakes are posing a growing threat to privacy, security, and democracy [20]. As soon as the risks of deepfakes were identified, strategies for monitoring them were developed. In recent approaches, deep learning automatically extracts significant and discriminative characteristics to detect deepfakes [21, 22]. Korshunov and Marcel [23, 24] used the open-source code faceswap-GAN [19] to create a unique deepfake dataset containing 620 videos based on the GAN model to address this issue. Low- and high-quality deepfake films were made using videos from the publicly accessible VidTIMIT database [25], efficiently imitating facial expressions, lip movements, and eye blinking. According to test findings, the popular facial recognition algorithms based on VGG and FaceNet [18, 26] are unable to identify deepfakes efficiently. Because deep learning algorithms like CNN and GAN can improve legibility, facial expression, and lighting in photos, swapped face images have become harder for forensics models [27]. To create fake photos with a size of 128 × 128, the large-scale GAN training models for high-quality natural image synthesis (BigGAN) [28], self-attention GAN [27], and spectral normalization GAN [29] are employed. On the contrary, Agarwal and Varshney [30] framed GAN-based deepfake detection as a hypothesis testing problem, using a statistical framework based on the information-theoretic study of authenticity [31].

When used to detect deepfake movies from this newly created dataset, other methods such as lip-syncing approaches [32–34] and picture quality measures with a support vector machine (SVM) [35] generate very high error rates. To get the detection results, the extracted features are put into an SVM classifier. In their paper [36], Zhang et al. utilized the bag-of-words approach to extract a collection of compact features, which they then fed into classifiers like SVM [37], random forest (RF) [38], and multilayer perceptron (MLP) [39] to distinguish swapped face images from real ones. To identify deepfake photos, Hsu et al. [40] proposed a two-phase deep learning technique. The feature extractor in the first phase is based on the common fake feature network (CFFN), and it leverages the Siamese network design described in [41]. To leverage temporal differences across frames, a recurrent convolutional model (RCN) was suggested based on the combination of the convolutional network DenseNet [42] and gated recurrent unit cells [43]. The proposed technique is evaluated on the FaceForensics++ dataset [44], which contains 1,000 videos, and shows promise. Guera and Delp [45] have pointed out that deepfake videos include intraframe discrepancies and temporal anomalies between frames. They then proposed a temporal-aware pipeline technique for detecting deepfake films that employs CNN and long short-term memory (LSTM).

Deepfakes have considerably lower blink rates than regular videos. To distinguish between actual and fake videos, Li et al. [46] deconstructed them into frames, extracting face regions and eye areas based on six eye landmarks. These cropped eye landmark sequences are distributed into long-term recurrent convolutional networks (LRCN) [47] for dynamic state prediction after a few preprocessing stages, such as aligning faces and extracting and scaling the bounding boxes of eye landmark points to produce new sequences of frames. To identify fake photos and videos, Nguyen et al. [48] recommended using capsule networks. The capsule network was created to overcome the constraints of CNNs when employed for inverse graphics tasks [49], which attempt to discover the physical processes that form pictures of the environment. The ability of a capsule network based on a dynamic routing algorithm [50] to express hierarchical pose connections between object components has recently been observed. The datasets used include the Idiap Research Institute replay-attack dataset [51], Afchar et al.'s deepfake face-swapping dataset [52], the facial reenactment FaceForensics dataset [44] developed with the Face2Face technique [53], and Rahmouni et al.'s entirely computer-generated picture dataset [54].

Researchers in [55] advocated using photo response nonuniformity (PRNU) analysis to distinguish genuine images from deepfakes. PRNU is sometimes regarded as the digital camera's fingerprint left in the photos [56]. Because the swapped face is intended to affect the local PRNU pattern in the facial area, the analysis is frequently utilized in picture forensics [57–60] and is proposed for use in [57]. The goal of digital media forensics is to create tools that allow for the automated analysis of a photo or video's integrity. In this research area, both feature-based [61, 62] and CNN-based [63, 64] integrity analysis techniques have been investigated. Raghavendra et al., in their paper [65], suggested using two pretrained deep CNNs to identify altered faces, while Zhou [66] recommended using a two-stream network to detect two distinct face-swapping operations. A recent dataset by Rössler [67], which contains half a million altered pictures created with feature-based face editing, will be of particular interest to practitioners.

The rest of the paper is organized as follows: Section 2 discussed the influential works on detecting deepfake images. The techniques employed in our research are described in Section 3. In Section 4, the results are presented and a comparative analysis is carried out. Finally, Section 5 draws the paper to a conclusion.

The main objective of this paper is to efficiently distinguish deepfake images from normal images. There have been a lot of studies done on the delicate issue of deepfakes. Many researchers used a CNN-based strategy to identify deepfake images, while others used feature-based techniques, and a few used traditional machine learning classifiers. The novelty of this work is that it is able to detect deepfake images from normal images with 99% accuracy using the VGGFace model. We
implemented more CNN architectures in our study than many other researchers, which distinguishes our work. A comprehensive analysis has been demonstrated in our work, and the outcome outperformed previous work.

3. Methodology

Figure 1 presents the fundamental diagram of the deep learning architectures used. At the outset, the dataset was collected and the features were extracted. Then, eight deep learning architectures were employed and evaluated against five different evaluation metrics: accuracy, precision, F1-score, recall, and the area under the ROC curve.

In Figure 1, the input is first obtained from a dataset collected from Kaggle and then sent through the convolution layer. This layer extracts numerous characteristics from the input photos. Convolution is a mathematical operation conducted between the input picture and a filter of specified size (P × P). The dot product between the filter and a portion of the input image is calculated by sliding the filter across the image. The resulting feature map provides information about the image's corners and edges. This feature map is later used by additional layers to learn more about the input picture.

Afterward, the data pass through the pooling layer. The main goal of this layer is to minimize the size of the convolved feature map. This is accomplished by reducing the connections between layers and operating independently on each feature map. Diverse methods of pooling provide distinct results: max-pooling selects the biggest element from the feature map, while average pooling determines the average of the items included within a set image section size.

The output then passes through the fully connected layer. The fully connected (FC) layer connects two layers of neurons and has weights, biases, and neurons. Input from the previous levels is flattened and sent to the FC layer. Further FC layers are utilized to conduct mathematical functional operations on the flattened vector. This stage initiates the categorization process.

Figure 1: Workflow diagram.
first one is the feature extraction portion, and the second one
input picture.
is the classification portion. We used pretrained networks
Afterward, it passes through the pooling layer. )e main
such as DenseNet, which exists in the Keras API. Figure 2
goal of this layer is to minimize the size of the convolved

R
shows the architecture of DenseNet. We have used different
feature map. )is is accomplished by reducing the con-
versions of the DenseNet (e.g., DenseNet201, DenseNet169,
nections between layers and operating independently on
and DenseNet121) pretrained model to improvise the pre-
each feature map. Diverse methods of pooling provide
diction results. It is a convolutional network that is con-
distinct results. Max-pooling selects the biggest element
nected layer in a feedforward fashion. Each layer gets new

T
from the feature map. Average pooling determines the av-
inputs from all preceding levels and passes them on to all
erage of the items included within a set image section size.
following layers to maintain the feedforward nature [42].
It then passes through the fully connected layer. )e fully
connected (FC) layer connects two layers of neurons. It has

E
the weights, biases, and neurons. Input from the previous 3.3. Dense Blocks. A convolutional layer is a fundamental
levels is flattened and sent to the FC layer. Further FC layers building block of a neural network. A fixed size is used to
are utilized to conduct mathematical functional operations extract the complex features of the given data. )e DenseNet
on the flattened vector. )is stage initiates the categorization convolution network is divided into multiple dense blocks.
process. For example, in the DenseNet169 architecture, there are 169

R
layers in 4 dense blocks. Apart from that, there are 3
transition layers, 1 classification layer, and 1 convolutional
3.1. Data. )e dataset was acquired from Kaggle, which layer. )e dense blocks consist of 6, 12, 32, and 32 con-
included 70,000 real faces from the Flickr dataset collected volutional layers. )e initial convolution of the architecture
by Nvidia Corporation. Besides, there were 70,000 fake faces is 112 × 112, followed by a max-pooling of 56 × 56. )e
out of the one million fake faces that were produced by model input is a blob that takes each image input of
styleGAN. Later, both of the datasets were combined, and 1 × 3 × 224 × 224 in BGR order.
the images were resized to 256 pixels. Lastly, the dataset was
divided into three parts, including the train, validation, and
test set. )ere were 100000 images in the training set, with 3.4. DenseNet121. Dense convolutional network (DenseNet)
50000 images being real and the rest being fake. In the is a widespread expansion of the Residual CNN (ResNet)
validation set, there were 20,000 images, of which 10,000 architecture. DenseNet differentiates by providing an im-
were real, and the rest were fake. Finally, the other 20000 mediate connection between each layer and all subsequent
images were equally divided into real and fake in the test set. network layers instead of its ResNet and other convolution
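One way to feed such a split into the models is with Keras image generators, as sketched below. The directory layout (train, valid, and test folders with real and fake subfolders) is an assumption about how the extracted Kaggle archive is organized, not a detail reported above.

```python
# Sketch of loading the "140k Real and Fake Faces" images with Keras
# generators. The directory layout (train/valid/test with real/ and fake/
# subfolders) is an assumption about how the extracted Kaggle data is stored.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (256, 256)   # images were resized to 256 pixels
BATCH = 64

datagen = ImageDataGenerator(rescale=1.0 / 255)  # scale pixels to [0, 1]

train_gen = datagen.flow_from_directory(
    "real_vs_fake/train", target_size=IMG_SIZE, batch_size=BATCH,
    class_mode="binary")          # expected: 100,000 images (50k real, 50k fake)
valid_gen = datagen.flow_from_directory(
    "real_vs_fake/valid", target_size=IMG_SIZE, batch_size=BATCH,
    class_mode="binary")          # expected: 20,000 images
test_gen = datagen.flow_from_directory(
    "real_vs_fake/test", target_size=IMG_SIZE, batch_size=BATCH,
    class_mode="binary", shuffle=False)  # expected: 20,000 images
```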
3.2. Proposed Network. Convolutional neural networks are constructed from numerous smaller units of neurons arranged in a layered fashion. The neurons are connected with each other, and the edges that connect them have weights. The weights of the training model are updated every epoch using techniques like backpropagation. A convolutional neural network consists of two portions: the first is the feature extraction portion, and the second is the classification portion. We used pretrained networks such as DenseNet, which are available in the Keras API. Figure 2 shows the architecture of DenseNet. We have used different versions of the DenseNet pretrained model (DenseNet201, DenseNet169, and DenseNet121) to improve the prediction results. It is a convolutional network whose layers are connected in a feedforward fashion: each layer receives inputs from all preceding layers and passes its own feature maps on to all following layers [42].

Figure 2: Architecture of DenseNet [69].
3.3. Dense Blocks. A convolutional layer is a fundamental building block of a neural network; a filter of fixed size is used to extract the complex features of the given data. The DenseNet convolutional network is divided into multiple dense blocks. For example, in the DenseNet169 architecture, there are 169 layers in 4 dense blocks. Apart from that, there are 3 transition layers, 1 classification layer, and 1 initial convolutional layer. The dense blocks consist of 6, 12, 32, and 32 convolutional layers. The initial convolution of the architecture is 112 × 112, followed by max-pooling to 56 × 56. The model input is a blob that takes each image as a 1 × 3 × 224 × 224 input in BGR order.

3.4. DenseNet121. The dense convolutional network (DenseNet) is a widespread extension of the residual CNN (ResNet) architecture. DenseNet differs by providing a direct connection between each layer and all subsequent network layers, unlike ResNet and other convolutional neural networks [42]. The DenseNet121 model in Keras is accurate with only a bit of tweaking, using a dense layer as the final layer. The model consists of four dense blocks of closely related layers, with batch normalization (BN) and 3 × 3 convolutions. Moreover, the pattern also features a transition layer between every dense block, with a 2 × 2 average pooling layer and a 1 × 1 convolution. We inserted the customized dense layer with sigmoid activation after the last dense block.
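The kind of tweak described above can be sketched with the pretrained DenseNet121 available in the Keras API; freezing the backbone, the global average pooling step, and the 224 × 224 input size are assumptions for illustration.

```python
# Sketch of reusing the pretrained DenseNet121 from the Keras API with a
# custom sigmoid dense layer as the final layer. Freezing the backbone and
# the 224 x 224 input size are assumptions, not details taken from the paper.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

base = DenseNet121(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained features fixed in this sketch

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # binary real/fake output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```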
3.5. DenseNet201. Due to the feature reuse enabled by its successive layers, DenseNet201 uses a condensed network, giving easy-to-train and parametrically efficient models. This increases the variety of the input to each succeeding layer and enhances performance [42].

3.6. DenseNet169. DenseNet169 contains 169 layers of depth, has a minimal number of parameters compared to other models, and handles the vanishing gradient problem better.

Besides the DenseNet family, ResNet50 is implemented in this work to observe the evaluation metrics. Figure 3 shows the architecture of ResNet50. ResNet, short for Residual Network, is a neural network developed to tackle a complicated issue by stacking more layers in deep neural networks, resulting in increased accuracy and performance. Adding more layers is based on the idea that these layers will learn increasingly complicated characteristics.
Figure 3: Architecture of ResNet50 [70].

3.7. ResNet50. ResNet50 is a familiar ResNet variant with 48 convolution layers, 1 max-pooling layer, and 1 average pooling layer. It performs about 3.8 × 10⁹ floating-point operations.

3.8. VGG16. The most distinctive feature of VGG16 is that, rather than having a massive number of hyperparameters, it concentrates on 3 × 3 filter convolution layers with a stride of 1, always using the same padding, and 2 × 2 max-pooling layers with a stride of 2. Figure 4 shows the architecture of VGG16. Throughout the design, the convolution and max-pooling layers are arranged in the same way. It features two fully connected layers at the end, followed by a softmax for output. The 16 in VGG16 alludes to the fact that it contains 16 layers with different weights [71].

Figure 4: Architecture of VGG16 [72].

3.9. VGG19. VGG19 is a convolutional neural network model with several convolutional layers and nonlinear activation layers, which outperforms a single convolutional layer. Figure 5 shows the architecture of VGG19. The layer structure allows for improved image feature extraction, downsampling using max-pooling, and use of the rectified linear unit (ReLU) as the activation function; max-pooling selects the greatest value in the image region as the pooled value of the area. The downsampling layer is primarily used to increase the network's antidistortion capability while preserving the sample's primary characteristics and lowering the number of parameters.

Figure 5: Architecture of VGG19 [73].

3.10. VGGFace. VGGFace is an image recognition model that produces state-of-the-art results on the standard face recognition datasets of Oxford's Visual Geometry Group [74]. This technique allows us to build a large dataset for training while utilizing only a modest amount of annotation effort. Figure 6 shows the architecture of VGGFace. We used the VGGFace architecture proposed by Tai Do Nhu and Kim [73] to build the model. This model includes five blocks of layers, with convolutional and max-pooling layers in each block. The first and second blocks each contain two 3 × 3 convolution layers followed by a pooling layer. The third, fourth, and fifth blocks each consist of three 3 × 3 convolution layers followed by a max-pooling layer. The ReLU activation function was employed in all convolutional layers. Since VGGFace uses pretrained weights, we had to adapt it to our needs. After the five layer blocks that gave us the facial characteristics, we fine-tuned the model by adding dense layers. Finally, the output layer with sigmoid activation was also included as a dense layer.

Figure 6: Architecture of VGGFace [74].
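A sketch of this fine-tuning step is shown below using the keras-vggface package cited in [17]; the pooling mode, the size of the added dense layer, and a compatible keras-vggface/TensorFlow installation are assumptions, not details taken from the study.

```python
# Sketch of fine-tuning VGGFace for real/fake classification using the
# keras-vggface package cited in [17]. The pooling choice and the sizes of
# the added dense layers are assumptions for illustration.
from keras_vggface.vggface import VGGFace
from tensorflow.keras import layers, models

base = VGGFace(model="vgg16", include_top=False,
               input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # start from the pretrained face features

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),   # added dense layer for fine-tuning
    layers.Dense(1, activation="sigmoid"),  # output layer with sigmoid activation
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```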

Lastly, a custom model has been introduced in this work to observe the overall variation, as shown in Figure 7.

3.11. Custom CNN. This model helps to determine whether the other models are as good as they promise. Figure 7 shows the architecture of the custom model. This model also includes techniques such as dropout and padding, which are not included in the other models, so we can study whether such strategies improve CNN performance. We employed six convolutional layers for the custom design, each paired with batch normalization and max-pooling layers. For all convolutional layers, the activation function was the rectified linear unit (ReLU). We also used dropout after every convolutional layer to decrease overfitting, and we employed padding to give the kernel more room to examine the image, thus increasing the precision of the extracted features. As this was a binary classification task, we added a dense layer with sigmoid activation at the end, on top of the convolutional base.

Figure 7: Architecture of the custom model.
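A sketch of a custom CNN along these lines is given below: six convolution blocks, each with batch normalization, max-pooling, and dropout, "same" padding throughout, and a sigmoid dense layer on top. The filter counts, dropout rate, and input size are illustrative assumptions.

```python
# Sketch of the custom CNN described above: six convolution blocks, each with
# batch normalization, max-pooling, and dropout, "same" padding throughout,
# and a sigmoid dense layer on top. Filter counts, dropout rate, and the
# 224 x 224 input size are assumptions for illustration.
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape=(224, 224, 3), dropout=0.2):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Six convolutional blocks; padding="same" gives the kernel room at the
    # image borders, as discussed above.
    for filters in (32, 32, 64, 64, 128, 128):
        model.add(layers.Conv2D(filters, (3, 3), padding="same",
                                activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(dropout))
    # Classification head: flatten and a single sigmoid unit for real/fake.
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))
    return model

model = build_custom_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```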
4. Results and Analysis

This comparative study showed that convolutional neural networks are highly effective in the detection and classification of GAN-generated images. The performance of the models has been assessed with five different metrics: accuracy, precision, recall, F1-score, and area under the ROC curve.

4.1. Confusion Matrix. A confusion matrix of size n × n (n rows and columns) associated with a classifier shows the predicted and actual classification, where n is the number of different classes. For an n × n matrix with entries a_ij, True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP) are calculated for class i using the following equations [75]:

TP_i = a_ii,
FP_i = Σ_{j=1, j≠i}^{n} a_ji,
FN_i = Σ_{j=1, j≠i}^{n} a_ij,
TN_i = Σ_{j=1, j≠i}^{n} Σ_{k=1, k≠i}^{n} a_jk.   (1)
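Equation (1) can be computed directly from a confusion matrix, as in the NumPy sketch below; the 2 × 2 example values are taken from the DenseNet121 matrix discussed next, and the row/column orientation (rows actual, columns predicted) is an assumption.

```python
# Sketch of equation (1): per-class TP, FP, FN, and TN computed from an
# n x n confusion matrix a, where rows are actual classes and columns are
# predicted classes. The 2 x 2 example values are illustrative only.
import numpy as np

a = np.array([[9480, 520],     # actual fake: predicted fake / predicted real
              [74, 9926]])     # actual real: predicted fake / predicted real

tp = np.diag(a)                      # TP_i = a_ii
fp = a.sum(axis=0) - tp              # FP_i = sum over j != i of a_ji
fn = a.sum(axis=1) - tp              # FN_i = sum over j != i of a_ij
tn = a.sum() - (tp + fp + fn)        # TN_i = everything outside row i and column i

for i, name in enumerate(["fake", "real"]):
    print(name, "TP:", tp[i], "FP:", fp[i], "FN:", fn[i], "TN:", tn[i])
```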
Here, predictions can be correct or wrong. The confusion matrix for DenseNet121 is illustrated in Figure 8. From the confusion matrix, 9480 fake images and 9926 real images were correctly classified by the network. However, 520 fake images were classified as real and 74 real images were classified as fake.

Figure 8: The confusion matrix for DenseNet121.

The confusion matrix for DenseNet201 is shown in Figure 9. Unlike the aforementioned DenseNet121 model, DenseNet201 performed better in terms of identifying fake images, correctly classifying 9503 of them. Even though there is no significant difference, the model misclassified 138 real images as fake and 497 fake images as real.

Figure 9: The confusion matrix for DenseNet201.
The confusion matrix for DenseNet169 is shown in Figure 10. It identified 9758 of the 10,000 fake images as fake. On the other hand, 9751 real images were correctly identified as real, whereas it misclassified 249 real images as fake and 242 fake images as real.

Figure 10: The confusion matrix for DenseNet169.

Figure 11 represents the confusion matrix for ResNet50. The model misclassified a total of 494 images; 9824 fake images and 9682 real images were correctly classified.

Figure 11: The confusion matrix for ResNet50.

Figure 12 depicts the confusion matrix for VGG16. The VGG16 model identified 9619 fake images correctly. On the other hand, it failed to classify 1693 real images as real; 8307 real images were correctly identified, and 381 fake images were misclassified.

Figure 12: The confusion matrix for VGG16.

The confusion matrix for VGG19 is shown in Figure 13. 9426 fake images were successfully classified as fake, and 9435 real images were classified as real. On the contrary, the model classified 574 fake images as real and 565 real images as fake.

Figure 13: The confusion matrix for VGG19.

Figure 14 illustrates the confusion matrix for VGGFace. The model correctly classified 9916 real images and 9835 fake images; only 84 fake images and 165 real images were misclassified.

Figure 14: The confusion matrix for VGGFace.

Finally, the confusion matrix for the custom model is shown in Figure 15. 168 fake images were misclassified, and 1522 real images were classified as fake; 9832 fake images and 8478 real images were classified correctly.

Figure 15: The confusion matrix for the custom CNN.
4.2. Accuracy. The number of times correct estimates are made is referred to as accuracy. Accuracy is calculated using the following equation:

accuracy = number of correct predictions / total number of predictions made.   (2)

It works best only if each class has an equal number of samples.

4.3. Precision. Precision, also known as positive predictive value, refers to how well the model predicts positive values out of all the positive values predicted by the model. The term "precision" refers to the following:

precision = True Positive / (True Positive + False Positive).   (3)

4.4. Recall. Recall can be used to measure how well the model detects true positives. A high recall is an indicator that the model has done well at identifying true positives. On the contrary, if the recall value is low, the model encounters many false negatives. The term "recall" refers to the following:

recall = True Positive / (True Positive + False Negative).   (4)

4.5. F1-Score. The F1-score is the harmonic mean of precision and recall. The F1-score provides a better estimate than the accuracy metric of the wrongly categorized cases:

F1-score = 2 × (Precision × Recall) / (Precision + Recall).   (5)

The F1-score is required to balance precision and recall. We saw before that True Negatives contribute a great deal to accuracy. The F1-score may be a better measure if we need to balance precision and recall and there is an uneven class distribution (a large number of actual negatives) [76].
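The four metrics in equations (2)-(5) can be computed in one pass with scikit-learn, as sketched below; y_true and y_pred are placeholders for the test labels and the thresholded model outputs.

```python
# Sketch of computing accuracy, precision, recall, and F1-score
# (equations (2)-(5)) with scikit-learn. y_true and y_pred are placeholders
# for the test labels and the thresholded model predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 0, 1, 1, 1, 0]          # 0 = fake, 1 = real (illustrative)
y_pred = [0, 1, 1, 1, 0, 0]          # model outputs after a 0.5 threshold

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))
```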

4.6. Receiver Operating Characteristic Curve (ROC) and Area under the ROC Curve (AUC). For classification tasks, the AUC-ROC curve is used to assess the algorithm's performance. ROC is the probability curve, and AUC indicates the degree or level of separability: it shows how well the model can differentiate between classes. In general, the AUC indicates how well the model predicts the 0 and 1 classes correctly. For example, the greater the AUC, the more accurately the model discriminates, say, between patients with and without an illness. Let us first define some terms.

The receiver operating characteristic (ROC) curve illustrates the relationship between the True Positive Rate and
samples. lustrates the relationship between True Positive Rate and
the False Positive Rate at various categorization thresholds. Reducing the categorization threshold classifies more items as positive, increasing both False Positives and True Positives [77].

An AUC of around 1 indicates that a model is excellent, suggesting a high degree of separability. An inadequate model has an AUC value close to zero, meaning that it has the lowest measure of separability; indeed, it implies that the outcome is reciprocated, mistaking 0s for 1s and 1s for 0s. An AUC of 0.5 indicates that the model has no capability for class differentiation at all.
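A short scikit-learn sketch of the ROC curve and AUC described above follows; y_true and y_score are placeholders for the test labels and the raw sigmoid probabilities of a model.

```python
# Sketch of the ROC curve and AUC described above, using scikit-learn.
# y_true and y_score are placeholders for the test labels and the raw
# sigmoid probabilities produced by a model.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]                    # illustrative labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]      # illustrative probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
auc = roc_auc_score(y_true, y_score)
print("AUC:", auc)   # closer to 1 means better class separability
```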
4.7. Model Accuracy and Loss. The training accuracy, validation accuracy, training loss, and validation loss graphs for all the models are illustrated in Figures 16-23.
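The curves in Figures 16-23 can be reproduced from the Keras training history, as sketched below. This sketch continues the earlier ones: the model object and the train_gen/valid_gen generators are assumed to come from those sketches, and the Adam optimizer and plotting choices are assumptions.

```python
# Sketch of producing accuracy/loss curves like Figures 16-23: train for
# 10 epochs and plot the Keras history. Assumes `model`, `train_gen`, and
# `valid_gen` from the earlier sketches.
import matplotlib.pyplot as plt

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_gen, validation_data=valid_gen, epochs=10)

plt.plot(history.history["accuracy"], label="Training accuracy")
plt.plot(history.history["val_accuracy"], label="Validation accuracy")
plt.xlabel("Epoch")
plt.legend()
plt.show()
```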
4.7.1. DenseNet121. Training accuracy, validation accuracy, training loss, and validation loss graphs for DenseNet121 are shown in Figures 16(a) and 16(b).

In Figure 16, the graph on the left shows training accuracy and validation accuracy over the course of 10 epochs for the DenseNet121 model. We can observe that training accuracy steadily improved and reached nearly 100%, whereas validation accuracy rose, subsequently fluctuated, and reached a point where the gap between training and validation accuracy was minimal. An overfitting problem was observed after the training crossed the 10-epoch mark. Training loss dropped progressively over time, whereas validation loss decreased until the 2nd epoch and then fluctuated during the 3rd, 6th, and 9th epochs, with the loss rising by at least 0.1.

4.7.2. DenseNet169. Training accuracy, validation accuracy, training loss, and validation loss graphs for DenseNet169 are illustrated in Figures 17(a) and 17(b).

In Figure 17, the graph on the left illustrates the training and validation accuracy of the DenseNet169 model over the course of 10 epochs. We can observe that training accuracy grew steadily, but validation accuracy increased with fluctuations after the eighth epoch before increasing again. Training accuracy almost touched the 100% mark, whereas validation accuracy touched the 95% mark; the model started at a training and validation accuracy of about 70% and crossed the 90% mark. Training loss dropped progressively, but validation loss reduced gradually and then varied after the eighth epoch, reaching above 0.6 before decreasing again to just above 0.1 on the 10th epoch.

4.7.3. DenseNet201. Training accuracy, validation accuracy, training loss, and validation loss graphs for DenseNet201 are given in Figures 18(a) and 18(b).

As displayed in Figure 18, the graph on the left illustrates the training and validation accuracy of the DenseNet201 model over the course of 10 epochs. The training accuracy improves as the epochs increase. However, the validation accuracy has some fluctuations over the same period: at the third epoch, the validation accuracy dropped below 50%, but by the 10th epoch, the results were touching the 96% score. The training loss was quite constant over the epochs, while the validation loss rose, then fell, and remained rather steady, touching 0, across the remaining epochs.

4.7.4. VGG16. Training accuracy, validation accuracy, training loss, and validation loss graphs for VGG16 are shown in Figures 19(a) and 19(b).

As shown in Figure 19, the graph on the left depicts the training and validation accuracy of the VGG16 model over the course of 10 epochs. The training accuracy and validation accuracy rise steadily as the epochs increase. The graph on the right depicts the training and validation loss of the model over the period of 10 epochs, reaching below 0.2.

4.7.5. VGG19. Training accuracy, validation accuracy, training loss, and validation loss graphs for VGG19 are illustrated in Figures 20(a) and 20(b).

In Figure 20, the graph on the left illustrates the training and validation accuracy of the VGG19 model over the course of 10 epochs. Both the training accuracy and the validation accuracy rise steadily as the epochs increase, achieving more than 90%. The graph on the right depicts the training and validation loss of the model over the period of 10 epochs, reaching the 0.1 loss mark.

4.7.6. VGGFace. The graphs for VGGFace's training accuracy, validation accuracy, training loss, and validation loss are shown in Figures 21(a) and 21(b).

Figure 21 displays the plot of training and validation accuracy and training and validation loss for the best-performing model in our experiment. The model achieves a validation accuracy of more than 95% at every epoch, eventually reaching an impressive 99%. Additionally, the training and validation loss decrease to close to the 0 mark.

4.7.7. ResNet50. Training accuracy, validation accuracy, training loss, and validation loss graphs are given in Figures 22(a) and 22(b).

As shown in Figure 22, the pretrained ResNet50 architecture reaches higher training and validation accuracy than most other pretrained models within 2 or 3 epochs. The training accuracy of ResNet50 reaches over 95%, while the validation accuracy reached 97%. While training loss dropped steadily, validation loss decreased smoothly until the third epoch and then varied.

4.7.8. Custom CNN. Training accuracy, validation accuracy, training loss, and validation loss graphs for the custom CNN are shown in Figures 23(a) and 23(b).
Figure 16: DenseNet121 training and validation accuracy and loss.

Figure 17: DenseNet169 training and validation accuracy and loss.

Figure 18: DenseNet201 training and validation accuracy and loss.


Figure 19: VGG16 training and validation accuracy and loss.

Figure 20: VGG19 training and validation accuracy and loss.
Finally, in Figure 23, the accuracy and loss of our proposed custom model are plotted. Even though the training accuracy of the model rises steadily, the validation accuracy fluctuates over the course of the 10 epochs. While training loss dropped steadily, validation loss decreased smoothly until the second epoch and then varied. The model does not show promising results as far as validation accuracy is concerned; however, it still reaches the 90% mark.

4.8. Model Evaluation. Table 1 illustrates the findings obtained from all the CNN architectures.

Figure 24 shows the comparison amongst all the models that have been implemented in this work. Amongst all the pretrained convolutional architectures, VGGFace achieved an impressive 99% accuracy on our training set. On the other hand, the worst-performing architecture, VGG16, achieved 92% accuracy. DenseNet121 and ResNet50 both achieved an accuracy of 97%, which is the second best. DenseNet201 and DenseNet169 achieved accuracies of 96% and 95%, respectively. The highest precision score of 99% was achieved by four models: VGGFace, DenseNet169, DenseNet121, and ResNet50. However, only VGGFace achieved the best recall, which is 98%. The second best models, achieving close to the score of VGGFace, were the DenseNet201 and VGG19 models, which achieved 97% recall. The F1-score of the VGGFace architecture was the highest, reaching an impressive 99%. The lowest F1-score, only 82%, was obtained by DenseNet121. The second best model according to the F1-score was ResNet50, which achieved a 97% F1-score. The highest AUC score was 99.8%, achieved by the VGGFace architecture, and the lowest was achieved by the DenseNet121 architecture. The custom model proposed by the authors achieved 90% accuracy on the dataset. The custom architecture achieved 84% precision and the highest score in terms of recall; its F1-score fell to 91% even though the recall score was 99%. A decent AUC score of 98.9% was achieved as well.

A bar graph (Figure 24) was generated from Table 1. The graphical representation of the table shows the exact scores as a whole. Evidently, VGGFace performed best in every category, achieving the best score amongst all the pretrained networks. However, the custom model achieved a 99% recall score, which is the highest recall score of all the architectures. ResNet50 was the second best architecture, obtaining a 97% F1-score. Overall, the worst-performing architecture was DenseNet121, which achieved only an 82% F1-score as it scored only 70% on recall.
Figure 21: VGGFace training and validation accuracy and loss.

Figure 22: ResNet50 training and validation accuracy and loss.

4.9. Model Comparison. Table 2 compares this paper with several other studies on deepfake detection completed by other researchers using the same models that we utilized in our research. Studies [78, 79] used VGG19 and VGG16, respectively, and the corresponding accuracies were 80.22% and 81.6%. The authors of study [42] used several DenseNet models to conduct their research, and the accuracies for DenseNet169, DenseNet201, and DenseNet121 were 93.15%, 93.66%, and 92.29%, respectively. The authors of study [79] also used ResNet50, where the accuracy was 81.6%.
Figure 23: Custom CNN training and validation accuracy and loss.
Table 1: Obtained results after implementing the models.

CNN architecture    Accuracy    Precision    Recall    F1-score    AUC
VGG19               0.94        0.91         0.97      0.94        0.987
VGG16               0.92        0.93         0.92      0.92        0.977
VGGFace             0.99        0.99         0.98      0.99        0.998
DenseNet169         0.95        0.99         0.92      0.95        0.996
DenseNet201         0.96        0.96         0.97      0.96        0.994
DenseNet121         0.97        0.99         0.70      0.82        0.971
ResNet50            0.97        0.99         0.95      0.97        0.997
Custom model        0.90        0.84         0.99      0.91        0.989

Figure 24: Comparison graph amongst the models (accuracy, precision, recall, F1-score, and AUC for each model).
Table 2: Comparison chart.

Reference        Model name     Accuracy (%)    Accuracy in this paper (%)
In study [78]    VGG19          80.22           94
In study [79]    VGG16          81.6            92
In study [42]    DenseNet169    93.15           95
In study [42]    DenseNet201    93.66           96
In study [42]    DenseNet121    92.29           97
In study [79]    ResNet50       81.6            97

Figure 25: Screenshot of classification of the "real" and "fake" images.
4.10. Model Test. The precision of our study was further verified by some additional experiments. The experiment was done by providing fake and real images to each of the models. Almost all of the pictures were correctly classified as "real" or "fake," as shown in Figure 25. From the validation directory, as many as ten pictures were randomly selected from each of the original and deepfake image sets.
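This spot check can be sketched as follows for a single picture; the file name, target size, 0.5 threshold, and class-index mapping are illustrative assumptions, and model is assumed to be one of the trained networks from the earlier sketches.

```python
# Sketch of classifying a single picture as "Real" or "Fake" with a trained
# model. The file path, target size, and 0.5 threshold are illustrative
# assumptions; the real/fake index mapping should match the training
# generator's class_indices.
import numpy as np
from tensorflow.keras.preprocessing import image

img = image.load_img("sample_face.jpg", target_size=(224, 224))
x = image.img_to_array(img) / 255.0          # same scaling as training
x = np.expand_dims(x, axis=0)                # add the batch dimension

prob_real = float(model.predict(x)[0][0])    # sigmoid output in [0, 1]
print("The picture is:", "Real" if prob_real >= 0.5 else "Fake")
```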
5. Conclusion and Future Work

Deepfake is an emerging technology that is being used to deceive a large number of people. Though not all deepfake content is malicious, it needs to be detected since some of it is indeed threatening to the world. The primary purpose of this study was to find a reliable and accurate way to detect deepfake images. Many other researchers have been working relentlessly to detect deepfake content using a variety of methodologies. The significance of this work, however, is that it achieves excellent results using CNN architectures. This study uses eight CNN architectures to detect deepfake images from a large dataset, and the results have been reliable and accurate. VGGFace performed the best on several metrics, including accuracy, precision, F1-score, and area under the ROC curve. However, in terms of recall, the custom model implemented in the study performed slightly better than VGGFace. The results of the custom model, DenseNet169, DenseNet201, VGG19, VGG16, ResNet50, and DenseNet121 were impressive as well. Finally, collected deepfake images have been analyzed to detect whether they are deepfakes or not, and the result is satisfactory.

This breakthrough work will have a tremendous impact on our society. Using this technology, deepfake victims can quickly determine whether pictures are real or fake, and people will remain vigilant since they will have the capability to identify deepfake images through our work. In the future, we may apply the CNN algorithms to a video deepfake dataset for the convenience of many sufferers. Many other experiments and tests have been left for future work. We aim to collect real data from our local community and classify deepfake images from normal images using a convolutional neural network. We may apply more efficient models to identify deepfake images to reduce crime in our society and, moreover, in our world. We believe our contribution will eventually aid in the reduction of unwanted suicide cases and blackmail in our society.

Data Availability

The data used to support the findings of this study are freely available at https://www.kaggle.com/xhlulu/140k-real-and-fake-faces.

Conflicts of Interest

The authors declare no conflicts of interest regarding the study.

Acknowledgments

The authors are thankful for the support from Taif University Researchers Supporting Project (TURSP-2020/26), Taif University, Taif, Saudi Arabia.

References

[1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza et al., "Generative adversarial nets," in Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS '14), vol. 2, pp. 2672–2680, 2014.
16 Computational Intelligence and Neuroscience

[2] T. Nguyen, Q. Nguyen, C. M. Nguyen, D. Nguyen, D. Nguyen, and S. Nahavandi, “Deep learning for deepfakes creation and detection: a survey,” pp. 1–17, 2019, https://arxiv.org/abs/1909.11573.
[3] T. Jung, S. Kim, and K. Kim, “DeepVision: deepfakes detection using human eye blinking pattern,” IEEE Access, vol. 8, pp. 83144–83154, 2020.
[4] M. Westerlund, “The emergence of deepfake technology: a review,” Technology Innovation Management Review, vol. 9, no. 11, pp. 39–52, 2019.
[5] M.-H. Maras and A. Alexandrou, “Determining authenticity of video evidence in the age of artificial intelligence and in the wake of Deepfake videos,” International Journal of Evidence and Proof, vol. 23, no. 3, pp. 255–262, 2019.
[6] A. M. Almars, “Deepfakes detection techniques using deep learning: a survey,” Journal of Computer and Communications, vol. 9, no. 5, pp. 20–35, 2021.
[7] L. Guarnera, O. Giudice, and S. Battiato, “DeepFake detection by analyzing convolutional traces,” in Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2841–2850, Seattle, WA, USA, 2020.
[8] A. Punnappurath and M. S. Brown, “Learning raw image reconstruction-aware deep image compressors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 1013–1019, 2020.
[9] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Energy compaction-based image compression using convolutional AutoEncoder,” IEEE Transactions on Multimedia, vol. 22, no. 4, pp. 860–873, 2020.
[10] J. Chorowski, R. J. Weiss, S. Bengio, and A. van den Oord, “Unsupervised speech representation learning using WaveNet autoencoders,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 2041–2053, 2019.
[11] Faceswap, “Deepfakes software for all,” https://github.com/deepfakes/faceswap.
[12] FakeApp 2.2.0, https://www.malavida.com/en/soft/fakeapp/.
[13] DeepFaketf, “Deepfake based on tensorflow,” https://github.com/StromWine/DeepFake%20tf.
[14] DFaker, https://github.com/dfaker/df.
[15] DeepFaceLab, https://github.com/iperov/DeepFaceLab.
[16] Faceswap-GAN, https://github.com/shaoanlu/faceswap-GAN.
[17] Keras-VGGFace, “VGGFace implementation with Keras framework,” https://github.com/rcmalli/keras-vggface.
[18] FaceNet, https://github.com/davidsandberg/facenet.
[19] CycleGAN, https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.
[20] D. K. Citron and R. Chesney, “Deep fakes: a looming challenge for privacy, democracy, and national security,” 107 California Law Review, p. 1753, 2019, https://scholarship.law.bu.edu/faculty_scholarship/640.
[21] O. De Lima, S. Franklin, S. Basu, B. Karwoski, and A. George, “Deepfake detection using spatiotemporal convolutional networks,” 2020, https://arxiv.org/abs/2006.14749.
[22] I. Amerini and R. Caldelli, “Exploiting prediction error inconsistencies through LSTM-based classifiers to detect deepfake videos,” in Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security, pp. 97–102, Denver, CO, USA, June 2020.
[23] P. Korshunov and S. Marcel, “Vulnerability assessment and detection of deepfake videos,” in Proceedings of the 12th IAPR International Conference on Biometrics (ICB), pp. 1–6, Crete, Greece, June 2019.
[24] VidTIMIT database, http://conradsanderson.id.au/vidtimit/.
[25] O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep face recognition,” in Proceedings of the British Machine Vision Conference (BMVC), pp. 41.1–41.12, Swansea, UK, September 2015.
[26] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: a unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, Boston, MA, USA, June 2015.
[27] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” 2018, https://arxiv.org/abs/1805.08318.
[28] A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” 2018, https://arxiv.org/abs/1809.11096.
[29] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” 2018, https://arxiv.org/abs/1802.05957.
[30] S. Agarwal and L. R. Varshney, “Limits of deepfake detection: a robust estimation viewpoint,” 2019, https://arxiv.org/abs/1905.03493.
[31] U. M. Maurer, “Authentication theory and hypothesis testing,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1350–1356, 2000.
[32] J. S. Chung, A. Senior, O. Vinyals, and A. Zisserman, “Lip reading sentences in the wild,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3444–3453, Honolulu, HI, USA, July 2017.
[33] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, “Synthesizing Obama,” ACM Transactions on Graphics, vol. 36, no. 4, pp. 1–13, 2017.
[34] P. Korshunov and S. Marcel, “Speaker inconsistency detection in tampered video,” in Proceedings of the 26th European Signal Processing Conference (EUSIPCO), pp. 2375–2379, Rome, Italy, September 2018.
[35] J. Galbally and S. Marcel, “Face anti-spoofing based on general image quality assessment,” in Proceedings of the 22nd International Conference on Pattern Recognition, pp. 1173–1178, Stockholm, Sweden, August 2014.
[36] Y. Zhang, L. Zheng, and V. L. Thing, “Automated face swapping and its detection,” in Proceedings of the IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Singapore, August 2017.
[37] X. Wang, N. Thome, and M. Cord, “Gaze latent support vector machine for image classification improved by weakly supervised region selection,” Pattern Recognition, vol. 72, pp. 59–71, 2017.
[38] S. Bai, “Growing random forest on deep convolutional neural networks for scene categorization,” Expert Systems with Applications, vol. 71, pp. 279–287, 2017.
[39] L. Zheng, S. Duffner, K. Idrissi, C. Garcia, and A. Baskurt, “Siamese multi-layer perceptrons for dimensionality reduction and face identification,” Multimedia Tools and Applications, vol. 75, no. 9, pp. 5055–5073, 2016.
[40] C.-C. Hsu, Y.-X. Zhuang, and C.-Y. Lee, “Deep fake image detection based on pairwise learning,” Applied Sciences, vol. 10, no. 1, p. 370, 2020.
[41] S. Chopra, “Learning a similarity metric discriminatively, with application to face verification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 539–546, San Diego, CA, USA, September 2005.
[42] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708, Honolulu, HI, USA, July 2017.
[43] K. Cho, B. van Merrienboer, C. Gulcehre et al., “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, Doha, Qatar, October 2014.
[44] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “Faceforensics++: learning to detect manipulated facial images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1–11, Seoul, Republic of Korea, 2019.
[45] D. Guera and E. J. Delp, “Deepfake video detection using recurrent neural networks,” in Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6, Auckland, New Zealand, November 2018.
[46] Y. Li, M. C. Chang, and S. Lyu, “Ictu oculi: exposing AI created fake videos by detecting eye blinking,” in Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–7, Hong Kong, China, December 2018.
[47] J. Donahue, L. Anne Hendricks, S. Guadarrama et al., “Long-term recurrent convolutional networks for visual recognition and description,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634, Boston, MA, USA, June 2015.
[48] H. H. Nguyen, J. Yamagishi, and I. Echizen, “Capsule-forensics: using capsule networks to detect forged images and videos,” in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2307–2311, Brighton, UK, May 2019.
[49] G. E. Hinton, A. Krizhevsky, and S. D. Wang, “Transforming auto-encoders,” in Proceedings of the International Conference on Artificial Neural Networks, pp. 44–51, Espoo, Finland, June 2011.
[50] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in Neural Information Processing Systems, pp. 3856–3866, MIT Press, Cambridge, MA, USA, 2017.
[51] I. Chingovska, A. Anjos, and S. Marcel, “On the effectiveness of local binary patterns in face anti-spoofing,” in Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), pp. 1–7, Darmstadt, Germany, September 2012.
[52] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, “MesoNet: a compact facial video forgery detection network,” in Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–7, Darmstadt, Germany, December 2018.
[53] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nießner, “Face2Face: real-time face capture and reenactment of RGB videos,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2387–2395, Las Vegas, NV, USA, June 2016.
[54] N. Rahmouni, V. Nozick, J. Yamagishi, and I. Echizen, “Distinguishing computer graphics from natural images using convolution neural networks,” in Proceedings of the 2017 IEEE Workshop on Information Forensics and Security (WIFS), pp. 1–6, Rennes, France, December 2017.
[55] M. Koopman, A. M. Rodriguez, and Z. Geradts, “Detection of deepfake video manipulation,” in Proceedings of the 20th Irish Machine Vision and Image Processing Conference (IMVIP), pp. 133–136, Belfast, Ireland, August 2018.
[56] K. Rosenfeld and H. T. Sencar, “A study of the robustness of PRNU-based camera identification,” Media Forensics and Security, International Society for Optics and Photonics, vol. 7254, Article ID 72540M, 2009.
[57] C. T. Li and Y. Li, “Color-decoupled photo response non-uniformity for digital image forensics,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 2, pp. 260–271, 2012.
[58] X. Lin and C. T. Li, “Large-scale image clustering based on camera fingerprints,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 4, pp. 793–808, 2017.
[59] U. Scherhag, L. Debiasi, C. Rathgeb, C. Busch, and A. Uhl, “Detection of face morphing attacks based on PRNU analysis,” IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 1, no. 4, pp. 302–317, 2019.
[60] Q.-T. Phan, G. Boato, and F. G. B. De Natale, “Accurate and scalable image clustering based on sparse representation of camera fingerprint,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 7, pp. 1902–1916, 2019.
[61] H. T. Sencar and N. Memon, Digital Image Forensics, Springer, New York, NY, USA, 2013.
[62] H. Farid, Photo Forensics, MIT Press Ltd., Cambridge, MA, USA, 2016.
[63] D. Güera, Y. Wang, L. Bondi, P. Bestagini, S. Tubaro, and E. J. Delp, “A counter forensic method for CNN-based camera model identification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1840–1847, Honolulu, HI, USA, July 2017.
[64] D. Güera, S. K. Yarlagadda, P. Bestagini, F. Zhu, S. Tubaro, and E. J. Delp, “Reliability map estimation for CNN-based camera model attribution,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, NV, USA, March 2018.
[65] R. Raghavendra, K. B. Raja, S. Venkatesh, and C. Busch, “Transferable deep-CNN features for detecting digital and print-scanned morphed face images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1822–1830, Honolulu, HI, USA, July 2017.
[66] P. Zhou, “Two-stream neural networks for tampered face detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1831–1839, Honolulu, HI, USA, July 2017.
[67] A. Rössler, “Faceforensics: a large-scale video dataset for forgery detection in human faces,” 2018, https://arxiv.org/abs/1803.09179.
[68] 140K Real and Fake Faces, https://www.kaggle.com/xhlulu/140k-real-and-fake-faces.
[69] https://arxiv.org/abs/1608.06993.
[70] https://www.kaggle.com/keras/resnet50.
[71] https://towardsdatascience.com/step-by-step-vgg16-implementation-in-keras-for-beginners-a833c686ae6c.
[72] https://www.kaggle.com/shivamb/cnn-architectures-vgg-resnet-inception-tl.
[73] I. N. Tai Do Nhu and S. H. Kim, “Forensics face detection from GANs using convolutional neural network,” pp. 1–8, 2018, https://arxiv.org/abs/1902.11153v2.
[74] https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/.
[75] M. S. Junayed, “AcneNet: a deep CNN based classification approach for acne classes,” in Proceedings of the 12th International Conference on Information & Communication Technology and System (ICTS), pp. 203–208, Surabaya, Indonesia, 2019.
[76] “Accuracy, precision, recall, or F1?,” https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9.
[77] “Classification: ROC curve and AUC,” https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc.
[78] D. Gong, Y. Jaya Kumar, O. S. Goh, Z. Ye, and W. Chi, “DeepfakeNet, an efficient deepfake detection method,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 12, no. 6, 2021.
[79] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, “Deepfakes and beyond: a survey of face manipulation and fake detection,” Information Fusion, vol. 64, pp. 131–148, 2020.
