https://doi.org/10.1007/s00530-024-01336-6
REGULAR PAPER
Abstract
Manipulation of digital images has become quite common in recent years owing to the rise of various image editing tools. Identifying authentic and tampered images has become a challenging task, since tampered images are indistinguishable to the naked eye. Hence, different approaches are used for the identification of tampered and authentic images. However, conventional techniques are considered inefficient at identifying tampered images effectively. Therefore, DL (Deep Learning)-based approaches are used for detecting authentic and tampered images, as DL techniques deliver improved accuracy and better automated FE (feature extraction) capabilities, which help in achieving the desired outcomes. Moreover, neural networks can extract complex hidden features of images, thus providing better model accuracy. The proposed model therefore utilizes DL-based approaches for effective feature optimization and classification of authentic and tampered images, employing the proposed EfficientNet model and incorporating QF (quality factors) into the images. In the proposed feature optimization, SPOA (Seagull Pelican Optimization Algorithm) is used to diminish the number of features, which lowers the computational complexity and refines the performance of the proposed framework by selecting the relevant and suitable features from the accessible data. Further, in the proposed EfficientNet model, CDT (Cosine Disintegration Tempering) and TAS (Ternary Attention Structure) are incorporated for classification: CDT aids the proposed model in learning discriminant features and helps prevent the model from overfitting on the training dataset using assimilate rate adopted scheduling, while TAS, utilized in MBConv, possesses the capability to capture both channel attention information and spatial attention information, thereby making the model efficient and effective for the classification of authentic and tampered images. The proposed work employs the CASIA 1.0 and CASIA 2.0 datasets for classification. Eventually, the proposed work utilizes different performance metrics for assessing the efficacy of the model and compares it with the prevailing models to establish its effectiveness.
Keywords Forgery detection · Scale Invariant Feature Transform (SIFT) · Speeded Up Robust Features (SURF) · Pelican
Optimization Algorithm (POA) · Seagull Optimization Algorithm (SOA) · EfficientNet
Communicated by Q. Shen.

* Arundhati Bhowal
[email protected]
Sarmistha Neogy
[email protected]
Ruchira Naskar
[email protected]

1 Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, West Bengal, India
2 Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur 711103, West Bengal, India

1 Introduction

Today, digital imagery is one of the most common forms of media for communication among human beings. With the advent and wide availability of digital image-capturing devices such as point-and-shoot cameras, DSLR (Digital Single Lens Reflex) cameras, and mobile phones, image capturing has become child's play. Image editing has also become extremely easy and convenient for every layman, given the wide range of image editing software available free of cost today. Digital images are vital in sensitive information exchange for diverse application domains including judicial, medical, insurance, media, broadcast, and entertainment industries. In legal constitutions of nations across
128 Page 2 of 17 A. Bhowal et al.
Fig. 1 An example of copy-move forgery. a Authentic image. b Forged image generated by duplicating a region from within the same image
Fig. 2 Example of image splicing attack. a, b Authentic source images. c Spliced image generated by intelligent compositing of sources
the globe, digital images have started being adopted as primary sources of evidence for any incident or crime. This establishes the critical requirement of authenticating and validating the reliability of digital images [1]. Also, with the widespread use of social media and online social networks, the trustworthiness of images has become even more critical, because they are highly capable of influencing the social beliefs of huge masses in a matter of a few seconds. A highly sensitive instance of digital image forgery in recent times, which was taken to social media, is a digital photo that a journalist captured on the first day of the G-20 summit in Germany in 2017. A Facebook user superimposed a picture of the Russian president on the original photograph, hence tampering with it.¹ Numerous misunderstandings and debates arose every single time this photograph was posted on social media platforms or circulated on news websites. One such fake image has a large influence on political decision-makers and political movements [2].

¹ Novak, M. That Viral Photo of Putin and Trump Is Fake. GIZMODO. 2017. Available online: https://bit.ly/3bL6PQo (Accessed on 7 July 2020).

In this work, we deal with the problem of image manipulation detection in digital images. In particular, we address the problem of detecting a specific form of digital image forgery, viz. image splicing, which is one of the most common forms of image manipulation today. We address, specifically, the problem of detecting splicing attacks on images stored in the JPEG (Joint Photographic Experts Group) format. This is because, in almost all of the above scenarios, the digital images that are circulated or that we come across are stored in the JPEG format. It is the most common form of image storage, given its versatile compression feature, which poses a challenge for forensic investigators regarding manipulation detection on such images. The proposed method can be applied to image formats other than JPEG, including TIFF and BMP. Splicing and copy-move forgery are two common forms of digital image forgery today. In copy-move forgery, or region duplication attack, a portion of an image is copied and moved to a different target location within the same image, to repeat or obscure significant image object(s). In a splicing attack, a single artificial yet natural-looking image is created through an intelligent combination of two or more individual images [3]. This attack is also termed an image compositing attack [4–6]. Examples of both the above
Deep Learning‑based forgery detection and localization for compressed images using a hybrid… Page 3 of 17 128
forms of attacks are shown in Figs. 1 and 2, respectively. To detect image forgery, two approaches are mainly followed, viz., classical feature extraction-based approaches and Deep Learning-based methods. Deep Learning-based methods have become more and more popular over the last 10 years, and they have been applied to numerous scientific problems. This is supported by studies that demonstrate their superior performance in segmentation and regression problems as well as classification tasks. These methods can even outperform humans in terms of accuracy and precision. Another important factor that has contributed significantly to the adoption of DL techniques is that they do not require the user to manually create (craft) meaningful features to be used as input to the learning algorithm, which is frequently a difficult task requiring domain-specific knowledge, as evident in Convolutional Neural Networks (CNNs) [3, 7, 8]. In image forensic problems, DL techniques have been explored vastly in recent times to achieve better accuracy than previously explored traditional feature learning-based methods. Despite significant advances in deep learning-based image modification detection studies, numerous obstacles and issues remain. Deep Learning's use in detecting image manipulation is still an emerging field of study. There is still room for improvement in detection performance, and overfitting of a very deep neural network is a possibility. In addition, picture forgery localization, which seeks to identify questionable areas within an image, has gained more and more interest. On public datasets, existing image forgery localization techniques have advanced significantly. Nevertheless, all techniques see a significant decrease in performance when the fake photos are compressed into JPEG format, which is frequently used in social media posts [9] and also contributes to storage savings. The major challenges of image forgery detection in a compressed domain can be identified as follows. During transmission, tampered images are frequently compressed, introducing a variety of degradation artifacts including color distortions, ringing effects, blurring, and blocking abnormalities. The picture tampering traces and these intricate compression anomalies overlap, significantly complicating tamper identification and localization for current forensic techniques [10]. Hence, the need of the hour is a robust and accurate forgery detection (and localization) technique, specifically designed for JPEG-compressed images. In this respect, our major contributions in this paper can be summarized as follows.

• JPEG splicing detection with optimal feature set. Deep Learning models, especially neural networks with a large number of layers and parameters, are prone to overfitting when dealing with high-dimensional data. Feature selection helps reduce the dimensionality of the input data by selecting the most relevant features. Also, Deep Learning models are computationally expensive to train, especially on high-dimensional data. By selecting a subset of the most informative features, one can significantly reduce the computational resources required for training and inference, making Deep Learning more feasible for real-world applications. In this work, a subset of features is considered based on the Pelican Optimization Algorithm (POA) [11] and the Seagull Optimization Algorithm (SOA) [12]. Our experimental results prove that the generated features enable effective forgery localization in JPEG images as well as TIFF and BMP images, at varied factors of compression, while minimizing the drop of detection accuracy, unlike other existing methods.
• Layer optimization. The performance of a deep learning model is highly dependent on the choice of hyperparameters. Properly adjusting hyperparameters such as dropout rates, weight decay, and batch normalization can help improve the model's generalization and performance. In the proposed work, features are selected by both the SOA and POA algorithms. This helps to find the right combination of hyperparameters to improve model performance. It also prevents overfitting, controls computational resources, and adapts to the characteristics of the data and the problem domain of compressed image handling. Following feature optimization, EfficientNet is employed for classification, with CDT (Cosine Disintegration Tempering) and TAS (Ternary Attention Structure) used in the proposed EfficientNet model.

The rest of the paper is organized as follows. In Sect. 2, we present an overview of the related literature that contributes to image splicing detection and our contribution in this paper. In Sect. 3, we present and discuss the proposed method in detail. In Sect. 4, we present and analyze the results of our experiments for different quality factors. Finally, Sect. 5 concludes the paper with a discussion on the possible future extension of the work.

2 Related work

Forgery detection in digital images has been approached in the existing literature through traditional image analysis-based methods as well as Deep Learning-based techniques. Traditional methods majorly rely on analyzing image properties such as inconsistencies in lighting conditions, noise patterns, and discontinuities in edges to identify regions where splicing may have occurred. While these strategies are effective in some cases, they may struggle with more complex forms of forgery. Deep Learning-based approaches, on the other hand, use Convolutional Neural Networks (CNNs) to automatically learn and extract features that can distinguish between authentic and altered image regions. Deep Learning algorithms have shown outstanding effectiveness
in detecting splicing forgeries, even in difficult situations, using large datasets and complex architectures. They may detect small discrepancies that standard approaches may miss, making them a promising choice for modern image forensics.

2.1 Traditional approaches for splicing detection

The terms "classic" or "traditional" are typically used to characterize standard passive methods, which draw on techniques from signal processing, statistics, physics, and geometry. Such techniques date back to the pre-Deep Learning era, and their training phase requires little to no data. For those still needing training data, clustering, Support Vector Machines (SVM), linear/logistic regression, random forests, and other traditional machine learning approaches are widely applied. They can be characterized as classic approaches since they use models with a manageable set of parameters and do not require a lot of training data [13, 14]. We can classify traditional techniques into five groups depending on their operational principles, viz. pixel-based approaches, format-based approaches, camera-based approaches, lighting-based approaches, and geometry-based approaches.

Notable among the pixel-based approaches, Fridrich et al. [15] propose image tamper detection based on the discrete cosine transform (DCT) of image pixels, which can successfully detect the forged image regions even when the copied area is enhanced or retouched to merge it with the background and when the forged image is saved in a lossy format, such as JPEG. Considering format-specific assumptions such as in JPEG, format-based approaches, with the help of blocking artifacts, provide techniques that are necessary for the building of accurate, targeted, and blind steganalysis tools for JPEG images [16]. In 2006, Johnson et al. [17] proposed a technique where the authors simulated the effect of chromatic aberration to detect irregularities in the fabricated region of a picture. Such approaches may be termed camera-based techniques for image forgery detection. However, such techniques are camera dependent. In contrast, the lighting-based techniques for image tamper detection are robust as they are not dependent on the camera model. For example, Johnson et al. [18] explore lighting discrepancies as a method for revealing indications of digital meddling in images. Lastly, the class of geometry-based methods for image authentication is based on significant assumptions about the geometry of the 3D scene captured. They also require human knowledge of the real-world measures gathered from specific items in the image. In Johnson et al.'s work [19], the authors' aim was to consider specific known objects, such as billboard signs and license plates in an image, and make them planar by a perspective change. Once the reference objects are viewed in a convenient plane, it is feasible, through camera calibration, to make real-world measurements, which can then be used to make decisions on the authenticity of the objects in the image.

Nevertheless, the aforementioned conventional methods have certain drawbacks and often focus on a specific forgery style. For example, when it comes to geometric modifications like rotation or scaling, pixel-based and format-based techniques perform poorly. The fact that camera-specific approaches require the creation of a fresh training set of images for each camera to build its PRNU model is one of their evident drawbacks. The limitation of lighting-based techniques is that they rely on the physical context of the image. Specifically, when dealing with extremely complex lighting conditions (such as an indoor scenario), it becomes impossible to estimate a global lighting model, which makes the method unworkable. Strong presumptions regarding the geometry of 3D scenes form the foundation of geometry-based techniques. It is also critical to comprehend the measures derived from certain real-world products [20].

2.2 Deep Learning-based approaches

The Deep Learning (DL) community is growing and increasing its horizons by utilizing numerous influencing strategies and methodologies to establish in-depth networks that aid in amassing enough knowledge, descriptions, and characteristics to solve problems in wide application domains. In the image forensics domain too, DL methods have started being widely adopted in recent years. Such methods overcome many limitations of the traditional approaches; however, they need vast training datasets to work efficiently. With the massive growth of image editing software and tools in recent years, malicious modification or tampering of digital images has also grown. Although tampered image identification has been explored by various researchers in the recent decade, localization of tampered image regions remains an open problem, with significant scope for exploration. The authors in [21] identified tampering detection as a comparatively easier task and considered localization more difficult. Their main goal is to create an efficient detector that can accomplish both localization and image forgery detection. They developed a very effective CNN architecture that can automatically extract features and create an RFM (reliability fusion map) to increase the accuracy and localization resolution for tampering detection. Similarly, the authors in [22] used an SVM classifier for texture pattern classification, together with the standard deviation of block-wise discrete cosine transformation of textures. The datasets used in the study included the CASIA TIDE v1.0 dataset, the Wild Web dataset, and the CUISDE (Columbia Uncompressed Image Splicing Detection Evaluation) dataset. The experimental results indicate that their model has the potential to determine the location of tamper boundaries.
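To make the block-level statistics consumed by such classifiers concrete, the sketch below computes one plausible variant of the descriptor described for [22]: the standard deviation of the DCT coefficients of each 8 × 8 block. This is an illustrative reconstruction under our own assumptions (orthonormal DCT-II, 8 × 8 blocks, DC term discarded), not the authors' implementation:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix of size n x n.
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def block_dct_std_features(gray: np.ndarray, block: int = 8) -> np.ndarray:
    # Crop the image to a multiple of the block size, take the 2D DCT of
    # every block, and summarize each block by the standard deviation of
    # its AC coefficients (the DC term only encodes mean brightness).
    h = (gray.shape[0] // block) * block
    w = (gray.shape[1] // block) * block
    g = gray[:h, :w].astype(np.float64)
    d = dct_matrix(block)
    feats = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeffs = d @ g[i:i + block, j:j + block] @ d.T
            feats.append(coeffs.ravel()[1:].std())  # drop the DC term
    return np.asarray(feats)

# A flat block has (near-)zero AC spread; a textured block does not.
flat = np.full((8, 8), 128.0)
textured = np.indices((8, 8)).sum(axis=0) * 10.0
v = block_dct_std_features(np.hstack([flat, textured]))
```

Feeding such per-block statistics to an SVM is one way to expose texture inconsistencies along splice boundaries; the exact feature design in [22] may differ in block size and normalization.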
Pawar et al. [23] provide a prototype model in the form of a tool in their work that will retrieve all image files from given digital evidence and detect manipulation in the photographs. Different tampering detection algorithms have been employed to identify various types of tampering. The suggested prototype will detect whether or not tampering has occurred and will categorize image files based on the type of tampering. The authors in [24] addressed the problem of determining which features to train on and how to localize manipulation in a tampered image. To attain improved accuracy, they used a multi-stream version of the Faster R-CNN network rather than a single stream alone. The second stream is fed by the element-wise sum of the ELA and BAG error maps.

The task of certifying the originality and validity of photographs places numerous constraints on tamper detection algorithms. In most situations, photos obtained from the internet are subjected to various transformations such as noise reduction, scaling, filtering, and even compression. As a result, it is critical to improve the reliability of photographs with tamper detection techniques. Therefore, Diallo et al. [25] have concentrated on applying the CNN concept to a robust framework. Bevinamarad et al. [26] emphasized the importance of using a robust detection system for detecting tampered images; their work used DCT and SVD (Singular Value Decomposition) for extracting the important features, and KNN (K-Nearest Neighbor) for precisely classifying forged images from unforged images. They used the CMFD and CISD (Columbia Image Splicing) datasets to test the model's stability, and achieved F1 scores of 92.89 for the CMFD dataset and 93.75 for the CISD dataset.

The authors in [27] used the ResNet50v2 architecture, which uses residual layers, to implement YOLO-CNN. The CASIA_v1 and CASIA_v2 datasets, which include two separate categories, authentic images and tampered images, were used in their work. The study's findings showed that the CASIA_v2 dataset produced more comprehensive results than the CASIA_v1 dataset. An image splicing forgery detection method based on a modified SSD network was utilized in the work [28] for identifying forged images. The testing results revealed that the ISD-SSD algorithm, which was used in their work, performed better than other popular techniques including the MFCN algorithm and Faster R-CNN.

2.3 Image forgery detection in the JPEG-compressed domain

Joint Photographic Experts Group (JPEG) is a popular lossy compression method for digital images, notably those created by digital photography. The degree of compression is adjustable, providing a choice between storage capacity and image quality. JPEG often reaches 10:1 compression with minimal image quality degradation. JPEG has been the most frequently used image compression standard in the world since its inception in 1992, as well as the most widely used digital picture format, with several billion JPEG images produced as of 2015. Image forgery detection and localization, which seeks suspicious regions modified using splicing, copy-move, or removal operations, is gaining popularity as it plays an important role in various applications of day-to-day life. On public datasets, existing image forgery localization algorithms have shown significant performance. However, when the forged photos are JPEG compressed, as is often the case in social media transmission while saving substantial storage, such approaches suffer a significant performance drop.

Alipour et al. [29] developed a new approach for detecting and localizing non-aligned JPEG forgeries using semantic pixel-wise segmentation of JPEG blocks through a deep CNN. The trained network accurately detects block boundaries related to various JPEG compressions, enabling the detection and localization of irregularities in the boundaries for forgery detection. The proposed algorithm outperforms state-of-the-art approaches in detecting and localizing non-aligned JPEG forgeries. Chen et al. [30] presented a novel method for detecting image splicing forgery using a simplified noise model of JPEG pictures. The model can be used as a camera fingerprint for forgery detection, assuming that pixel variance is a quadratic function of pixel expectation. Based on the model and employing hypothesis testing theory, a training-free Generalized Likelihood Ratio Test (GLRT) is created for high detection performance with a predetermined false alarm rate.

Due to the propagation of false information, Ali et al. [31] created a DL algorithm that is intended to detect picture forgeries induced by double compression, which is a significant cause for concern. It is shown that their technique is quicker and more accurate than current methods, and it is based on the difference between an image's original and recompressed versions. To detect splicing manipulation in a JPEG image, Zeng et al. [32] proposed a new multitask model called Att-DAU-Net. The model is based on an attention mechanism, a densely linked network, Atrous Spatial Pyramid Pooling (ASPP), and U-Net. However, this model also loses performance when various quality factors are applied to the compressed public dataset.

In 2023, Ding et al. [33] proposed the DCU-Net method. The key components of the detection framework based on DCU-Net are an encoder, feature fusion, and a decoder. The model takes as input the original altered image and the altered residual image. The dilation convolution approach is used to extract the changed features with different granularities after the first fusing of the deep features acquired from the dual-channel encoding network. Using the fused feature map
as input, the expected image is decoded layer by layer. When the dataset is compressed with varying quality factors, the performance of the majority of the existing algorithms suffers significantly. In this regard, our key target here is to attain robustness against compression with low quality factors in JPEG images.

2.4 Motivation

Though different existing studies have delivered reasonable outcomes for the classification of authentic and altered images, there are a few limitations when it comes to attaining better model accuracy for effective and precise classification of authentic and tampered images. Hence, the proposed model utilizes DL-based EfficientNet for the classification of authentic and tampered images. The datasets used in the proposed study include the CASIA 1.0 and CASIA 2.0 datasets. The proposed model incorporates QF (quality factor) in the pre-processing stage and different FE (feature extraction) techniques for reducing the number of features present in the dataset, using the HOG, HSV, SURF, and SIFT techniques. Further, feature optimization proceeds by using the proposed SPOA (Seagull–Pelican Optimization Algorithm) for minimizing the number of features, which decreases the computational complexity and aids in improving the performance of the proposed model by selecting the relevant and appropriate features from the accessible data.

Finally, the classification of the proposed model is carried out by utilizing the EfficientNet layer, incorporating the CDT layer and TAS in the MBConv layer of the proposed framework, to obtain an effective model for the classification of authentic and tampered images. CDT is an assimilate rate adopted scheduling approach, which is commonly used for training neural networks. CDT is primarily used in the proposed framework due to its ability to avoid overfitting of the model, which can make the model efficient for classification.

Similarly, the TAS technique utilized in the proposed mechanism possesses the ability to capture both channel attention information and spatial attention information for improving the performance of the model for classification. Further, TAS not only has the ability to capture the long-term dependencies between the networks but also aids in retaining precise location information, which helps to enhance the accuracy of the proposed model for the classification of authentic and tampered images. Eventually, the model is evaluated using different metrics.

3 Proposed methodology

Images can be tampered with easily using different digital techniques. Therefore, the proposed model utilizes DL-based approaches for the detection of tampered and authentic images, since DL techniques deliver improved accuracy and better automated FE skills. Due to these factors, the proposed model employs DL-based EfficientNet for the classification of authentic and tampered images, as the EfficientNet model employs compound scaling techniques, which make the model effective for classification. The datasets used in the proposed model are CASIA 1.0 and CASIA 2.0. Once the datasets are loaded, the images present in the dataset are pre-processed using different pre-processing techniques, which include compression of images and contrast image enhancement. Images are compressed with quality factors of 50, 60, 70, 80, 90, and 100 to attain better results for the model. The contrast image enhancement approach is used to enhance the quality of the images by increasing the differences between the dark and light regions in an image, to enhance the details and improve the overall quality of the image. Once the images are pre-processed, feature extraction proceeds using the HOG, SIFT, SURF, and HSV methods. Feature extraction reduces the number of features present in the dataset by creating new features from the already existing ones. Once the features are extracted, the data is split into train (80%) and test (20%) sets.

After that, feature optimization is carried out using SPOA hyperparameter tuning, as the proposed work utilizes the behavior of seagulls and pelicans to attain better outcomes for feature optimization. After performing feature optimization, classification is performed using EfficientNet, where CDT and TAS are used in the proposed EfficientNet model. CDT used in the EfficientNet layer aids the proposed model in learning features that are discriminant and helps in preventing the model from getting overfitted on the training dataset using assimilate rate adopted scheduling, and TAS utilized in MBConv possesses the capability to capture both channel attention information and spatial attention information, thereby making the model efficient and effective for the classification of authentic and tampered images.

Further, TAS not only has the ability to capture the long-term dependencies between the networks but also aids in retaining precise location information, which helps to enhance the accuracy of the proposed model for the classification of authentic and tampered images. Finally, the proposed model predicts whether the images are tampered or authentic. If the images identified by the model are tampered, localization is performed by localizing the tampered image area. Eventually, the performance of the projected system is measured using various metrics. Figure 3 depicts the illustrative diagram of the proposed flow. The figure demonstrates the proposed model's process of feature extraction and feature optimization using SPOA; classification is accomplished by employing the CDT and TAS model; and finally, the model makes predictions by classifying the images as authentic or tampered. Moreover, if the images are reflected as tampered images,
localization proceeds by determining the precise boundaries or location of the tampered regions within the images.

3.1 Feature extraction

FE (feature extraction) is employed once the images present in the dataset are pre-processed using different pre-processing methods. FE is primarily used for reducing the number of features present in the dataset, by detecting and extracting the most relevant and notable features from the original data. Therefore, to perform feature extraction, the proposed method incorporates different techniques, namely the HOG, SIFT, HSV, and SURF techniques.

• HOG (Histogram of Oriented Gradients): The HOG technique is used in image processing for object detection. This method counts the occurrences of gradient orientations in localized portions of an image and is considered similar to SIFT. In FE, the HOG technique examines the distribution of gradient orientations in an image and represents the images as histograms.
• HSV (Hue, Saturation, Value): HSV is a color space representation that separates the color information into three different components: hue, saturation, and value. Saturation is defined as the purity or intensity of the color, hue denotes the dominant color, and value defines the brightness.
• SURF (Speeded Up Robust Features): SURF aids in detecting and describing local features depending on the distribution of Haar wavelet responses.
• SIFT (Scale Invariant Feature Transform): SIFT helps with detecting and describing the local features present in an image that are invariant to different aspects like rotation, scale, and transformation.

Due to the various significance of these FE techniques, the proposed model utilizes these approaches for effective feature extraction.

3.2 Feature optimization

Feature optimization is the process of minimizing the number of features, which decreases the computational complexity and aids in improving the performance of the proposed model by selecting the relevant and appropriate features from the accessible data. Therefore, to carry out feature optimization, the proposed framework utilizes SPOA (Seagull–Pelican Optimization Algorithm). Though there are various optimization algorithms, the proposed model utilizes the seagull and pelican algorithms, since they deliver better results than other existing algorithms in terms of solving optimization issues. The primary objective of utilizing POA is that it simulates the natural behavior of the pelican during the process of hunting. In POA, the hunting strategy is divided into exploration and exploitation stages. In the exploration phase, the pelicans initially detect the location of prey and gravitate toward the identified region. This strategy of the pelican results in search space scanning. The significant point in POA is that the location of the prey is arbitrarily produced
in the search space, which increases the power of exploration and helps in obtaining an exact search of the problem-solving space. In the exploitation phase, when the pelican reaches the surface of the water, it spreads its wings on the surface to drive the fish upward, and the prey is then collected in its throat pouch. This strategy causes more fish in the attacked area to be caught by the pelicans. Further, exhibiting this behavior causes POA to congregate at better points in the hunting zone. Therefore, this process maximizes the power of local search and gives pelican optimization better exploitation capability.

Similarly, SOA utilizes the seagull's rummaging behavior for solving problems, where the movement and search behavior of the seagulls includes exploration, exploitation, and intensification. In exploration, seagulls explore the search space by moving around arbitrarily. In the exploitation phase, promising regions are then exploited by concentrating the search in those areas, and eventually the search is intensified to refine the solutions in the best-found area.

Though these algorithms possess various benefits, such as finding optimal or near-optimal solutions by repeatedly improving the population, they have a few drawbacks, which include lower convergence accuracy and easily falling into local optima, especially when solving complex problems. Moreover, although conventional seagull optimization manages the optimum search region in every run, it may not converge adequately in some specific runs; due to this, the final generated outcome may suffer from poor quality. Hence, to overcome these issues, the proposed model amalgamates both the Pelican and Seagull Optimization Algorithms to obtain better outcomes for effective feature optimization. Figure 4 depicts the Seagull–Pelican Optimization Algorithm (SPOA).

The combined approach, called the Seagull–Pelican Optimization Algorithm (SPOA), used in this work helps to find the best combination of hyperparameters for effective feature optimization. The SPOA tests different hyperparameters, such as learning rates and the number of filters in the model, and calculates a score/metric for each combination of hyperparameters. Based on these scores, the SPOA identifies the best set of hyperparameters to use for our model. By finding the right combination of hyperparameters, we can improve the overall performance of the model. The SPOA enables a systematic optimization process, methodically exploring different hyperparameter configurations to identify the best fit that delivers optimal results.

3.3 Classification—EfficientNet input layer

EfficientNet is a CNN built upon a concept termed compound scaling. A longstanding trade-off between a model's size, accuracy, and computational efficiency is addressed in the EfficientNet model. As the EfficientNet model follows a compound scaling approach, it uniformly scales the width, depth, and resolution of the network. Moreover, it comprises a base network, which is typically a variation of a CNN called MobileNet, and a set of scaling coefficients that determine the network size. MBConv, which is considered similar to the MobileNetV2 architecture, is applied in the EfficientNet architecture. In the traditional EfficientNet architecture, MBConv is used as a combination of inverted residual blocks and depth-wise separable convolutions. Moreover, the SE (Squeeze and Excitation) optimization technique is also used for performance enhancement. The MBConv layer is considered the central block of the EfficientNet architecture.

The MBConv layer initiates with a depth-wise convolution, subsequently a point-wise convolution (1x1 convolution) which enlarges the number of channels, and eventually another 1x1 convolution which reduces the channels back to the original number. This bottleneck design permits the model to learn more effectively while upholding a high degree of representational power.

Furthermore, the traditional EfficientNet employs SE blocks, which enable the model to learn to focus on key features while suppressing less important ones. The SE block also utilizes GAP (Global Average Pooling) for reducing the spatial dimensions of the feature maps to a single channel, followed by two FCL (fully connected layers). These layers permit the model to absorb channel-wise feature dependencies and aid in creating attention weights, which are multiplied with the original feature map to emphasize the information considered significant. Though the traditional EfficientNet architecture possesses reasonable advantages, like the ability to learn quickly, a few limitations keep the conventional model from delivering better classification performance, including the requirement of many computational resources for training and a lack of interpretability. Therefore, to overcome these concerns, the proposed model utilizes CDT in the EfficientNet input layer and a ternary attention structure in the MBConv layer.

In Fig. 5, the process is initiated by utilizing the features obtained after the feature optimization approach; the images are then sent to the EfficientNet input layer, which incorporates CDT using assimilate rate adopted scheduling. CDT aids the proposed model in learning discriminant features and helps in preventing the model from getting overfitted on the training dataset. Moreover, the Adam optimizer has been used in the model for delivering the ARDS (Assimilate Rate Adopted Scheduling) with a better learning scheduler. Once the CDT approach is applied, the features are passed to CL (Convolutional Layers), which provide more flexibility in learning. After employing the Conv layer, the features are passed to the MBConv layer, where TAS is carried out
using precise location information. TAS utilized in MBConv possesses the capability to capture both channel attention information and spatial attention information, thereby making the model efficient and effective for the classification of authentic and tampered images. Further, TAS not only captures the long-term dependencies between the networks but also aids in retaining the precise location information, which helps to enhance the accuracy of the proposed model for the classification of authentic and tampered images. Eventually, the obtained features are passed to the output layer, where the images are predicted as either authentic or tampered.

Figure 6 depicts the working mechanism involved in the EfficientNet input layer and the proposed MBConv layer, which are used in the proposed model to improve classification results. The proposed mechanism uses the TAS mechanism in the MBConv layer, which is found between the depth-wise Conv layer and the Conv layer. The figure also depicts the TAS mechanism.

3.3.1 Cosine Disintegration Tempering

Cosine Disintegration Tempering (CDT) is an assimilate rate adopted scheduling approach which is commonly used for training neural networks. CDT slowly minimizes the learning rate throughout training in a cyclic manner, adjusting the learning rate based on a cosine function. By adjusting the learning rate over time, CDT permits the network to fine-tune the weights and converge quickly and precisely to an ideal solution, which results in better performance of the proposed model. CDT is used in the proposed model since it possesses the ability to work
with models that encompass a huge number of parameters. There are several advantages of employing CDT in the proposed framework, including the prevention of overfitting and the detection of better minima; this is achieved because CDT permits the proposed model to explore the parameter space more meticulously. Moreover, CDT also helps stabilize the training dynamics by preventing the learning rate from becoming too small or too large, and this steadiness eventually leads to more dependable and consistent training outcomes.
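The decay rule behind this scheduling is the standard cosine annealing curve with periodic restarts; the following minimal sketch illustrates it (the maximum rate, minimum rate, and cycle length below are illustrative assumptions, not the paper's training settings):

```python
import math

def cosine_decay_lr(step, cycle_len, lr_max=0.1, lr_min=0.001):
    """Cosine-annealed learning rate with periodic restarts.

    Within a cycle the rate falls smoothly from lr_max to lr_min along a
    half cosine; at the start of the next cycle it resets to lr_max.
    """
    t = step % cycle_len  # position inside the current cycle
    cos_term = (1 + math.cos(math.pi * t / cycle_len)) / 2
    return lr_min + (lr_max - lr_min) * cos_term

# Large early rates allow coarse exploration of the parameter space,
# small late rates allow fine weight adjustments; step 100 restarts.
schedule = [cosine_decay_lr(s, cycle_len=100) for s in range(200)]
```

In practice such a schedule is attached to an optimizer such as Adam, matching the assimilate-rate-adopted scheduling role described above.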
In this proposed model, cosine annealing scheduling serves as an effective learning rate scheduler, determining the optimal amount of input data to pass through for achieving better results. The learning rate for the input is dynamically controlled by the cosine annealing schedule, which operates over a specified number of iterations. During this iteration period, the learning rate smoothly decreases from an initial maximum value to a minimum value, following a cosine function pattern. This gradual decrease allows for larger weight updates in the early stages, facilitating faster convergence during training. As the training progresses, the learning rate is slowly annealed, enabling finer weight adjustments toward the end for better convergence and generalization.

Once the iteration period ends, the learning rate typically resets to its maximum value, and the cosine annealing process repeats, ensuring a continuous cycle of exploration and exploitation. Cosine annealing scheduling offers several advantages. It aids the model in converging faster by effectively exploring the parameter space through larger initial updates. In addition, the smooth decrease in learning rate prevents the model from getting trapped in local minima, enhancing its ability to generalize to unseen data.

Overall, this cosine annealing scheduler adjusts the learning rate in a smooth, cosine-like pattern, leveraging the benefits of both larger initial updates and finer later adjustments, ultimately leading to improved model performance and convergence.

3.3.2 MBConv—Ternary Attention Structure

The MBConv module encompasses a CL of kernel size 1 x 1, a depth-wise separable convolution, a dropout layer, and the proposed TAS (Ternary Attention Structure). The proposed TAS enhances the feature representation capability of the proposed model by capturing the channel attention information in the input feature map. The TAS utilizes a ternary attention mechanism for estimating the significance of each channel in the input feature map for the present task and aids in weighing them. The proposed TAS is achieved by pooling the input feature of size C x H x W into a feature map of size 1 x 1 x C. A fully connected NN is then utilized for a non-linearity transformation and generates weights that are activated through the sigmoid activation function. The activated weights are then utilized to weigh each channel in the input FM.

The existing approaches used in MBConv for EfficientNet only pay attention to the significant feature channels and are incapable of estimating the significance of image features at different spatial positions, which limits the accuracy of the model for the classification of authentic and tampered images. The proposed TAS model aids in overcoming these issues by assigning weights not only to the various feature channels but also to different spatial positions on each channel. Further, the proposed TAS possesses the ability to capture both channel attention information and spatial attention information for improving the performance of the classification model. TAS shapes the interaction between H (spatial height) and the channel dimension C. Attention weights generated using the sigmoid function are applied to A_H with the same width to produce WFM (weighted feature maps). The calculation proceeds using Eq. 1:

A_H+ = A_H+ + A_H+ · σ(ConvBN(C-pool(A_H+)))    (1)

where A_H+ denotes a clockwise rotation of 90 degrees along the H axis, ConvBN denotes the combined convolution and BN (Batch Normalization) operation, σ is the sigmoid activation function, and compound pooling is denoted by C-pool. The input is processed using AP (Average Pooling) and MP (Max Pooling) along the channel dimension, and the resulting features are amalgamated via concatenation. Then A_W is the input fed to the attention branch for generating the WFM, and Eq. 2 is used for calculating the value:

A_W+ = A_W+ + A_W+ · σ(ConvBN(C-pool(A_W+)))    (2)

where A_W+ is a 90-degree rotation along the W axis. Finally, the rotation is carried out in the same channel, and the WFM is generated. This process uses Eq. 3:

A_C = A · σ(ConvBN(C-pool(A)))    (3)

Finally, TAS aggregates the three branches as in Eq. 4:

A = (1/3)(A_H+ + A_W+ + A_C)    (4)

The branches use SA (simple averaging) to realize the fusion of both spatial attention and channel attention information. Therefore, the incorporation of CDT and TAS in the proposed EfficientNet model helps deliver better performance for the classification of authentic and tampered images.

4 Results and discussion

Results and discussion of the proposed work are elucidated in the subsequent section by depicting a description of the dataset, performance metrics, performance analysis, experimental outcome, and comparative analysis.
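The analyses that follow rely on the standard confusion-matrix metrics formalized in Sect. 4.2; as a reference, a minimal sketch of their computation is given here (the counts below are hypothetical, not the paper's results):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # correct positives among predicted positives
    recall = tp / (tp + fn)     # correct positives among actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
m = classification_metrics(tp=90, tn=85, fp=10, fn=15)
```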
4.1 Dataset description

Two different datasets are used in the proposed model: the CASIA 1.0 and CASIA 2.0 datasets.

• CASIA V1: The CASIA V1 dataset comprises 1721 color images of size 384 x 256 pixels in JPEG format. The images present in the dataset are divided into two subsets, a tampered set and an authentic set. Nine hundred twenty-one images are stored in the tampered set and eight hundred images in the authentic set.
• CASIA V2: CASIA V2 comprises 12,333 images with the same two subsets, a tampered set and an authentic set; 7200 images are contained in the authentic set and 5123 tampered images in the tampered set. CASIA 2.0 is considered more comprehensive and challenging than the CASIA 1.0 dataset. Aside from splicing, the CASIA 2.0 dataset includes blurring when modifying the manipulated image set.

4.2 Performance metrics

Performance metrics are used for analyzing the efficacy of the proposed work using different evaluation metrics, which include precision, accuracy, F1 score, and recall rate.

• Accuracy: It provides a measure of the overall correctness of the model. It is computed as follows:

Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100%    (5)

where TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative.

• Precision: Precision indicates, when the model predicts the positive class, how often that prediction is correct. Model precision is computed as:

Precision = TP / (TP + FP)    (6)

• F1 score: The F-measure is a versatile performance metric based on both precision and recall. When both precision and recall are high, the F-measure is also high, and it reaches a perfect 1 only when both precision and recall equal 1. The F1 score is formulated as follows:

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)    (7)

• Recall: Recall is the ratio of correctly predicted positive samples to all actual positive samples. It is calculated by Eq. 8:

Rc = TP / (FN + TP)    (8)

4.3 Performance analysis

Performance analysis is utilized for examining the efficacy of the proposed method using numerous metrics such as the confusion matrix, accuracy, F1 score, recall, and precision for both tampered and authentic images. The subsequent section therefore focuses on examining the performance of the projected system in terms of the classification of authentic and tampered images.

Figure 7 displays the confusion matrix of the projected framework. The confusion matrix is primarily used for visualizing and summarizing the performance of the classification algorithm. The TP (an outcome correctly predicted as a tampered image), TN (an outcome correctly predicted as an authentic image), FP (an outcome incorrectly predicted as a tampered image), and FN (an outcome incorrectly predicted as an authentic image) obtained by the proposed model for the CASIA 1.0 dataset in Fig. 7a are 554 for TP, 2 for FP, 2 for FN, and 153 for TN. Likewise, the confusion matrix of the proposed model for the CASIA 2.0 dataset is depicted in Fig. 7b, in which TP, FP, FN, and TN are 1227, 10, 15, and 1458, respectively. From the confusion matrix, it is identified that the proposed model produces correct classifications rather than misclassifications.

Like the confusion matrix, other metrics are also considered for assessing the efficacy of the proposed framework. Table 1 illustrates the precision, recall, F1 score, and accuracy of the proposed model for classifying authentic and tampered images for both the CASIA 1.0 and CASIA 2.0 datasets.

For CASIA 1.0, the authentic images' precision, recall, F1 score, and accuracy are 0.9964, 0.9964, 0.9964, and 0.9944, respectively. Similarly, multiple measures are employed to evaluate the proposed model on tampered images: its precision, recall, F1 score, and accuracy for tampered images are 0.9871, 0.9871, 0.9871, and 0.9944. For CASIA 2.0, the proposed model's precision, recall, F1 score, and accuracy for authentic images are 0.9945, 0.9918, 0.9932, and 0.9941, and for tampered images 0.9935, 0.9936, 0.9935, and 0.9941, respectively.

The accuracy achieved by the proposed framework after compression with various quality factors is shown in Table 2. Both datasets are compressed with quality factors ranging from 50 to 100. When QF is equal to 50, the proposed approach has an accuracy of 0.9887 for CASIA 1.0. Similarly, when QF is between 60 and 70, the model's accuracy
Table 1 Performance analysis of the proposed model for the CASIA 1.0 and CASIA 2.0 datasets

Dataset     Class      Precision  Recall  F1 score  Accuracy
CASIA 1.0   Authentic  0.9964     0.9964  0.9964    0.9944
CASIA 1.0   Tampered   0.9871     0.9871  0.9871    0.9944
CASIA 2.0   Authentic  0.9945     0.9918  0.9932    0.9941
CASIA 2.0   Tampered   0.9935     0.9936  0.9935    0.9941

Table 2 Performance analysis of the proposed model for different quality factors (QF) in the CASIA 1.0 and CASIA 2.0 datasets

Dataset     QF   Accuracy (%)
CASIA 1.0   50   98.87
CASIA 1.0   60   99.08
CASIA 1.0   70   99.08
CASIA 1.0   80   99.37
CASIA 1.0   90   99.37
CASIA 1.0   100  99.43
CASIA 2.0   50   98.71
CASIA 2.0   60   98.75
CASIA 2.0   70   99.08
CASIA 2.0   80   99.19
CASIA 2.0   90   99.3
CASIA 2.0   100  99.41

is 0.9908. Similarly, the proposed approach achieves 0.9937 accuracy at QF values of 80 and 90. Furthermore, the accuracy of the proposed model for QF 100 is 99.43%. As a result, the proposed model maintains accuracy regardless of how the photos are compressed.

Similar to CASIA 1.0, the performance on the CASIA 2.0 dataset with varying quality factors is also depicted in Table 2, in which the accuracy value varies with the range of QF. When QF is 50, the proposed technique produces an accuracy of 0.9871; when QF is 60, the model produces an accuracy of 0.9875; and when QF is 70, the model produces an accuracy of 0.9908. Similarly, the proposed work achieves an accuracy of 0.9919 when QF is 80, 0.993 when QF is 90, and 0.9941 when QF is 100. Even after substantial compression, the performance of the proposed technique does not suffer significantly.

According to the performance analysis, many dimensions are addressed for measuring the efficacy of the proposed model utilizing metrics such as accuracy, F1 score, recall, and precision. Furthermore, quality factors are taken into account while evaluating the efficacy of the proposed framework, with the QF value ranging from 50 to 100. Though the proposed model produced results for the categorization of authentic and tampered photos, the proposed model's competency is further tested by comparing the projected model to the existing models.

4.4 Experimental outcome

Experimental results obtained using the proposed model are depicted in the subsequent section, where the authentic and tampered images classified using the proposed framework for the CASIA 2.0 and CASIA 1.0 datasets are depicted in Figs. 8 and 9.

4.5 Comparative analysis

Different prevailing methods are compared with the proposed work to examine its performance. Therefore, this section compares the existing
techniques with the proposed technique for the classification of authentic and tampered images. Table 3 depicts the different prevailing methods along with the accuracy obtained by implementing the models.

The accuracies obtained by the different models, Hosny et al. [34], Muniappan et al. [36], Wu et al. [37], Kanwal et al. [41], and Latif et al. [42], are 99.1%, 79%, 91.6%, 98.25%, and 94.55%, respectively, for the CASIA 1.0 dataset. However, the proposed method obtained an accuracy rate of 99.44%. The graphical illustration of the model is demonstrated in Fig. 10.

Fig. 8 Images detected as (a) authentic and (b) tampered by the proposed model for the CASIA 2.0 dataset

Fig. 9 Images detected as (a) authentic and (b) tampered by the proposed model for the CASIA 1.0 dataset

Likewise, for the CASIA 2.0 dataset, the accuracies of Hosny et al. [34], Hu et al. [35], Muniappan et al. [36], Nath et al. [38], Ding et al. [39], Niyishaka et al. [40], Kanwal et al. [41], and Latif et al. [42] are 99.3%, 94.78%, 89%, 96.45%,
97.93%, 94.59%, 97.59%, and 96.3%, respectively. The proposed method outperformed all the state-of-the-art algorithms with an accuracy of 99.41% on the CASIA 2.0 dataset. The graphical illustration of the model is demonstrated in Fig. 11.

Table 3 Comparative analysis on the CASIA 1.0 and CASIA 2.0 datasets

Model                         CASIA 1.0 Accuracy (%)  CASIA 2.0 Accuracy (%)
Hosny et al. [34] (2023)      99.1                    99.3
Hu et al. [35] (2023)         –                       94.78
Muniappan et al. [36] (2023)  79                      89
Wu et al. [37] (2022)         91.6                    –
Nath et al. [38] (2021)       –                       96.45
Ding et al. [39] (2021)       –                       97.93
Niyishaka et al. [40] (2021)  –                       94.59
Kanwal et al. [41] (2020)     98.25                   97.59
Latif et al. [42] (2019)      94.55                   96.3
Proposed Method               99.44                   99.41

Because the proposed system uses CDT and TAS to achieve higher classification performance, the experimental results show that the proposed model performed better for the classification of authentic and tampered images. The obtained accuracy was higher for both the CASIA 1.0 and CASIA 2.0 datasets. The proposed CDT implementation assisted in avoiding model overfitting, and TAS helped in enhancing the feature representation ability of the projected framework by capturing channel attention information in the input feature map and retaining precise location information, which improves the accuracy of the projected model for the classification of authentic and tampered images.

5 Conclusion

In this work, the proposed model utilized DL-based approaches for effective feature optimization and classification of authentic and tampered images by employing the proposed EfficientNet model. In the proposed work's feature optimization, SPOA was used to diminish the number of features, which drops the computational complexity and aids in refining the performance of the proposed model by selecting the relevant and suitable features. Further, in the proposed EfficientNet model, CDT and TAS were assimilated for classification, where CDT aided the proposed model in learning discriminant features and helped in preventing the model from getting overfitted, and TAS exploited in MBConv possessed the capability to capture both channel attention information and spatial attention information, thereby making the model efficient and effective for the classification of authentic and tampered images. The model works well with JPEG-compressed photos, and it also maintains accuracy with TIFF and BMP images, regardless of the level of compression.

In the future, different datasets can be utilized for the classification of authentic and tampered images using the projected framework, and different DL approaches can be used to obtain even more effective outcomes for classification.
Acknowledgements This work is technically supported by Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Govt. of India, Grant No. SPG/2022/002222, dated: 03/10/2023 received at IIEST Shibpur, and funding for research fellowship by AICTE ADF Scheme at Jadavpur University, Ref. No. ADF/FET/PRE-Ph.D/09/2022-23 dated 02.11.2022.

Author Contributions The work is done by A.B., which is jointly supervised by S.N. and R.N.

Data Availability Data will be made available on reasonable request.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Kaur, N., Jindal, N., Singh, K.: A passive approach for the detection of splicing forgery in digital images. Multimed. Tools Appl. 79, 32037–32063 (2020)
2. Islam, M.M., Karmakar, G., Kamruzzaman, J., Murshed, M.: A robust forgery detection method for copy-move and splicing attacks in images. Electronics 9(9), 1500 (2020)
3. Bourouis, S., Alroobaea, R., Alharbi, A.M., Andejany, M., Rubaiee, S.: Recent advances in digital multimedia tampering detection for forensics analysis. Symmetry 12(11), 1811 (2020)
4. Wang, X.-Y., Wang, C., Wang, L., Jiao, L.-X., Yang, H.-Y., Niu, P.-P.: A fast and high accurate image copy-move forgery detection approach. Multidimens. Syst. Signal Process. 31, 857–883 (2020)
5. Sujin, J., Sophia, S.: Copy-move geometric tampering estimation through enhanced sift detector method. Comput. Syst. Sci. Eng. 44(1) (2023)
6. Singhania, S., Arju, N., Singh, R.: Image tampering detection using convolutional neural network. Int. J. Synth. Emot. (IJSE) 10(1), 54–63 (2019)
7. Guillaro, F., Cozzolino, D., Sud, A., Dufour, N., Verdoliva, L.: TruFor: leveraging all-round clues for trustworthy image forgery detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20606–20615 (2023)
8. Thakur, R., Rohilla, R.: Recent advances in digital image manipulation detection techniques: a brief review. Foren. Sci. Int. 312, 110311 (2020)
9. Walia, S., Kumar, K.: Digital image forgery detection: a systematic scrutiny. Aust. J. Foren. Sci. 51(5), 488–526 (2019)
10. Wang, M., Fu, X., Liu, J., Zha, Z.-J.: JPEG compression-aware image forgery localization. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 5871–5879 (2022)
11. Trojovský, P., Dehghani, M.: Pelican optimization algorithm: a novel nature-inspired algorithm for engineering applications. Sensors 22(3), 855 (2022)
12. Che, Y., He, D.: An enhanced seagull optimization algorithm for solving engineering optimization problems. Appl. Intell. 52(11), 13043–13081 (2022)
13. Rao, Y., Ni, J.: A deep learning approach to detection of splicing and copy-move forgeries in images. In: 2016 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–6. IEEE (2016)
14. Rajini, N.H.: Image forgery identification using convolution neural network. Int. J. Recent Technol. Eng. 8(1), 311–320 (2019)
15. Fridrich, J., Soukal, D., Lukas, J., et al.: Detection of copy-move forgery in digital images. In: Proceedings of Digital Forensic Research Workshop, vol. 3, pp. 652–63. Cleveland, OH (2003)
16. Pevný, T., Fridrich, J.: Estimation of primary quantization matrix for steganalysis of double-compressed JPEG images. In: Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, vol. 6819, pp. 392–404. SPIE (2008)
17. Johnson, M.K., Farid, H.: Exposing digital forgeries through chromatic aberration. In: Proceedings of the 8th Workshop on Multimedia and Security, pp. 48–55 (2006)
18. Johnson, M.K., Farid, H.: Exposing digital forgeries by detecting inconsistencies in lighting. In: Proceedings of the 7th Workshop on Multimedia and Security, pp. 1–10 (2005)
19. Johnson, M.K., Farid, H.: Metric measurements on a plane from a single image (2006)
20. Zanardelli, M., Guerrini, F., Leonardi, R., Adami, N.: Image forgery detection: a survey of recent deep-learning approaches. Multimed. Tools Appl. 82(12), 17521–17566 (2023)
21. Yao, H., Xu, M., Qiao, T., Wu, Y., Zheng, N.: Image forgery detection and localization via a reliability fusion map. Sensors 20(22), 6668 (2020)
22. Manu, V., Mehtre, B.: Tamper detection of social media images using quality artifacts and texture features. Foren. Sci. Int. 295, 100–112 (2019)
23. Pawar, D., Gajpal, M.: Image forensic tool (IFT): image retrieval, tampering detection, and classification. Int. J. Digit. Crime Forensics (IJDCF) 13(6), 1–15 (2021)
24. Yancey, R.E.: Deep localization of mixed image tampering techniques (2019). arXiv preprint arXiv:1904.08484
25. Diallo, B., Urruty, T., Bourdon, P., Fernandez-Maloigne, C.: Improving robustness of image tampering detection for compression. In: MultiMedia Modeling: 25th International Conference, MMM 2019, Thessaloniki, Greece, January 8–11, 2019, Proceedings, Part I 25, pp. 387–398. Springer (2019)
26. Bevinamarad, P., Unki, P.H.: Robust image tampering detection technique using k-nearest neighbors (KNN) classifier. In: Innovations in Computational Intelligence and Computer Vision: Proceedings of ICICV 2021, pp. 211–220. Springer (2022)
27. Qazi, E.U.H., Zia, T., Almorjan, A.: Deep learning-based digital image forgery detection system. Appl. Sci. 12(6), 2851 (2022)
28. Xue, Y., Zhu, C., Tan, X.: ISD-SSD: image splicing detection by using modified single shot multibox detector. In: International Conference on Artificial Intelligence and Intelligent Information Processing (AIIIP 2022), vol. 12456, pp. 569–575. SPIE (2022)
29. Alipour, N., Behrad, A.: Semantic segmentation of JPEG blocks using a deep CNN for non-aligned JPEG forgery detection and localization. Multimed. Tools Appl. 79(11–12), 8249–8265 (2020)
30. Chen, Y., Retraint, F., Qiao, T.: Image splicing forgery detection using simplified generalized noise model. Signal Process. Image Commun. 107, 116785 (2022)
31. Ali, S.S., Ganapathi, I.I., Vu, N.-S., Ali, S.D., Saxena, N., Werghi,
32. Zeng, P., Tong, L., Liang, Y., Zhou, N., Wu, J.: Multitask image splicing tampering detection based on attention mechanism. Mathematics 10(20), 3852 (2022)
33. Ding, H., Chen, L., Tao, Q., Fu, Z., Dong, L., Cui, X.: DCU-Net: a dual-channel u-shaped network for image splicing forgery detection. Neural Comput. Appl. 35(7), 5015–5031 (2023)
34. Hosny, K.M., Mortda, A.M., Lashin, N.A., Fouda, M.M.: A new method to detect splicing image forgery using convolutional neural network. Appl. Sci. 13(3), 1272 (2023)
35. Hu, J., Xue, R., Teng, G., Niu, S., Jin, D.: Image splicing manipulation location by multi-scale dual-channel supervision. Multimed. Tools Appl. 1–24 (2023)
36. Muniappan, T., Abd Warif, N.B., Ismail, A., Abir, N.A.M.: An evaluation of convolutional neural network (CNN) model for copy-move and splicing forgery detection. Int. J. Intell. Syst. Appl. Eng. 11(2), 730–740 (2023)
37. Wu, Y., Wo, Y., Han, G.: Joint manipulation trace attention network and adaptive fusion mechanism for image splicing forgery localization. Multimed. Tools Appl. 81(27), 38757–38780 (2022)
38. Nath, S., Naskar, R.: Automated image splicing detection using deep CNN-learned features and ANN-based classifier. Signal Image Video Process. 15, 1601–1608 (2021)
39. Ding, H., Chen, L., Tao, Q., Fu, Z., Dong, L., Cui, X.: DCU-Net: a dual-channel u-shaped network for image splicing forgery detection. Neural Comput. Appl. 35(7), 5015–5031
40. Niyishaka, P., Bhagvati, C.: Image splicing detection technique based on illumination-reflectance model and LBP. Multimed. Tools Appl. 80, 2161–2175 (2021)
41. Kanwal, N., Girdhar, A., Kaur, L., Bhullar, J.S.: Digital image splicing detection technique using optimal threshold based local ternary pattern. Multimed. Tools Appl. 79(19–20), 12829–12846 (2020)
42. El-Latif, E.I.A., Taha, A., Zayed, H.H.: A passive approach for detecting image splicing using deep learning and haar wavelet transform. Int. J. Comput. Netw. Inform. Secur. 11(5), 28–35 (2019)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of
N.: Image forgery detection using deep learning by recompressing such publishing agreement and applicable law.
images. Electronics 11(3), 403 (2022)