
Hindawi

Mobile Information Systems


Volume 2022, Article ID 7140594, 14 pages
https://doi.org/10.1155/2022/7140594

Research Article
Adaptive Feature Analysis in Target Detection and Image Forensics
Based on the Dual-Flow Layer CNN Model

Nannan Liang,1,2 Haifeng Xu,1 WanLi Zhang,1 and Lin Cui1

1 School of Informatics and Engineering, Suzhou University, Suzhou 234000, China
2 Key Laboratory of Mine Water Resource Utilization of Anhui Higher Education Institutes, Suzhou 234000, China

Correspondence should be addressed to Nannan Liang; [email protected]

Received 31 May 2022; Revised 26 July 2022; Accepted 9 August 2022; Published 28 August 2022

Academic Editor: Shadi Aljawarneh

Copyright © 2022 Nannan Liang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
With the rapid development of artificial intelligence technology, image editing technology has evolved from relying on software
such as Photoshop and GIMP for manual modification to using artificial intelligence technology to achieve intelligent and
automated tampering of images. Editing, falsifying, and disseminating digital images have become simple and easy, leading to a
crisis of confidence in digital images and reducing their reliability as judicial evidence. Therefore, how to identify falsified images,
improve their trustworthiness, and avoid judicial injustice has become a problem that must be overcome in the information age. In
this paper, we propose a target detection and adaptive feature analysis method for image forensics based on a dual-flow layer CNN model,
which can effectively perform image forensics. The results show that our algorithm has a clear theoretical basis, low computational
complexity, and high detection accuracy.

1. Introduction

People's daily life is full of image information data. In this era of image information flooding, it becomes more and more difficult to extract effective and valuable image information quickly when we are faced with so much of it. Computer vision plays an important role in medical, security, transportation, aerospace, automation, control, and other fields. Humans obtain 91% of the information they use to understand the world from vision, so computer vision is the basis of machine and device cognition of the world. Although images are a major vehicle for information dissemination, humans generally focus on only part of the information in an image. In daily life, only certain people or objects are the focus of attention, for example, the license plate number of a moving vehicle, whether a driver is wearing a seat belt, a suspicious pedestrian in a crowded place, or whether a worker in a factory is wearing a helmet. Finding the main targets of people's attention in images quickly and effectively and labeling them with the correct classification has therefore become a popular research direction, and target detection research is important for robotics, video surveillance, automated inspection, road traffic, aerospace, and other fields.

Natural scenes are highly complex, and target detection is affected by scene changes, natural lighting, weather changes, and object shapes, sizes, scales, and stacking methods, all of which make detection more difficult. Therefore, how to quickly and effectively separate targets and attenuate the effects of target scale, shape, and size variation and of background complexity has become a key challenge in target detection. At the same time, the equipment for obtaining photos has gradually diversified. Nowadays, professional cameras, ordinary cameras, and smartphones with high-definition photo functions are common in daily life; among these, smartphones are far more popular than other devices because of their low price, portable size, and uncompromising photo quality.

However, everything has both positive and negative aspects. While photo images are easy to access and edit, they also bring serious negative effects in the information age. Some people, for various purposes, maliciously tamper with and disseminate carefully faked photo images, reversing black and white, confusing people's minds, causing an uproar in society, and subverting the basic common sense that seeing is believing.

Digital photo image tampering forensics is a kind of digital image forensic technology that relies on computer techniques to judge whether the original content of a photo taken by a digital camera is still maintained during its transmission and dissemination. If we want to forensically examine the authenticity of photo images, we should first understand the means of photo image tampering. The more common tampering methods include copy-paste within the same image, splicing of different images, image retouching, and image enhancement.

1.1. Copy-and-Paste Tampering within the Same Image. A part of the image is pasted to other parts of the same image, as in Figure 1. The original photo image is on the left, and the tampered image on the right is created by copying the lawn in the figure and covering the person.

Heterogeneous image stitching tampering: a part of one image is stitched into another image, involving two or more images. Stitching tampering between different images has the following characteristics: (1) the tampering trace is not visually noticeable; (2) some statistical characteristics of the image are changed by the tampering behavior. As shown in Figure 2, the yellow flowers in the original image 2(a) are stitched into the original image 2(b) to obtain the tampered image 2(c).

1.2. Image Retouching. This is an image restoration operation commonly used in artistic photos to make the people in the photos more beautiful; it is also commonly used after copy-paste or splicing tampering to eliminate the edge traces of tampering. In the original image of Figure 3(a), the human face has fairly obvious spots, while the face in the tampered Figure 3(b) becomes smooth and more beautiful after retouching.

1.3. Image Enhancement. This is an operation that blurs or highlights information somewhere in an image. This type of tampering is usually achieved by changing the hue or contrast of a certain part of the image. Figure 4(a) blurs a large amount of detailed information by adjusting the contrast and hue, so that the original image is tampered with, creating the tampered Figure 4(b).

Incidents of photo tampering such as those mentioned above keep emerging, seriously affecting the public's correct judgment of things. The negative impact of photo tampering and the crisis of confidence it causes are worrying. If doctored photos are used in news reports, they may distort the facts and mislead the public, which may intensify social conflicts; if doctored photos are used as evidence in court, they may lead to false cases, obstruct justice, and allow criminals who should be punished to escape the net of justice; if doctored photos are used in insurance claims, they may cause unnecessary economic disputes; and in international relations, the use of doctored photo images may lead to political turmoil, diplomatic discord, and even military conflict, among other extremely serious consequences.

2. Related Work

The traditional target detection methods are Viola-Jones, HOG + SVM, and DPM. Among them, Viola-Jones uses integral image features and the AdaBoost method. The HOG + SVM method detects pedestrians as targets: it first extracts HOG features from the candidate regions of the target and then uses an SVM classifier for the classification decision. DPM is a variant of HOG feature detection that adds additional strategies. The DPM method is the most effective and best performing among all traditional target detection methods. Its advantages are an intuitive and simple formulation, fast block-wise computation, and adaptation to object deformation. A large number of studies have verified that its detection accuracy, generalization ability, and detection speed are better than those of the earlier traditional methods.

The two-stage target detection model refers to first extracting features using a convolutional neural network (CNN), then recommending candidate regions with a region proposal mechanism, and finally marking the target box locations and classifying the marked target boxes. The most typical representative is the RCNN series of networks. The one-stage target detection model is a regression model that directly regresses the position of the target frame without generating candidate frames in the middle of the network, converting target frame localization directly into a regression problem. The most representative one is the YOLO series of networks. The large number of prior frames increases the computation and memory usage. For targets with extremely large aspect ratios in the scene, the method of preset a priori frames is not only time-consuming but also prone to false detections. Different datasets require different target detection models, so different a priori frames need to be set, resulting in reduced model generalization capability.

Photo image tampering forensics has emerged in the last decade. Despite the short development time, photo image tampering forensic technology has made great progress with the continuous development and improvement of image processing, pattern recognition, artificial intelligence, and other related theories, and with the continuing discoveries of relevant experts and scholars. Based on the current theoretical results, the photo image tampering forensic process is briefly summarized in Figure 5.

Since the tampered part of a tampered photo image differs from the untampered real part in certain types of features, such features can be extracted from each part of the photo image to be tested during forensics, and the extracted features can then be classified to arrive at a verdict on the authenticity of the photo image under test. According to the different features extracted, this paper divides photo image tampering forensics into two categories: tampering forensics based on image content features and tampering forensics based on imaging features, which are briefly introduced below.
Figure 1: Copy-paste forgery within the same image. (a) Original image. (b) Tampered image.

Figure 2: Splice forgery between different images. (a) Original Figure 1. (b) Tampered figure. (c) Original figure.

Figure 3: Image blur forgery. (a) Original image. (b) Tampered image.

Figure 4: Image enhancement tampering. (a) Original image. (b) Tampered image.

Figure 5: The process of photo image tampering forensics (photo image to be tested → feature extraction → feature classification → real or tampered).

2.1. Tampering Forensic Techniques Based on Content Features of Photo Images. The consistency of the content features (such as natural statistical characteristics, key points, lighting direction, and texture) of tampered photo images will be destroyed, and the researcher can make a decision on the authenticity of the image by detecting the changes in these content features.

2.1.1. Forensic Techniques Based on Natural Statistical Features. The natural statistical properties of images in the spatial and transform domains, such as mean, variance, histogram, and higher-order statistics, are among the basic features of images and an important means of studying their essential properties.
The literature [1] proposes a copy-paste forensic algorithm based on DCT coefficients, which adopts a sliding-window chunking strategy for the image to be tested, calculates the DCT coefficients of each image block, quantizes the obtained DCT coefficients to construct feature vectors, and then performs dictionary (lexicographic) sorting on all feature vectors. If there are similar or identical image blocks in the image to be tested, the positions of their corresponding feature vectors will be close together, and the similar blocks in the image can be identified by calculating the displacement vectors, achieving the purpose of tampering detection. On this basis, the dimensionality of the DCT quantization coefficients is reduced in the literature [2], and [3] improved the chunking strategy by using a circular chunking method, followed by the construction of DCT coefficient feature vectors. The literature [4] proposed constructing feature vectors by calculating the difference matrix of the DCT coefficient matrix of an image and then detecting them with an SVM; the improved method subsequently achieved average recognition rates of 97.92% and 91.2% on the stitched image libraries CASIA v1.0 and CASIA v2.0, respectively; however, such methods did not achieve tampering localization. The literature [5] argues that tampered images are compressed and saved again, causing changes in the DCT coefficients, whereby the histogram difference of the DCT coefficients and the double-quantization mapping relationship are used to detect stitched tampered images and to localize the tampered regions. To further improve the forensic effect, [6] suggested extracting LBP features in the DCT domain for detection. Considering that tampered images may be contaminated by Gaussian blur filtering and Gaussian white noise, the DCT algorithm is improved in the literature [7].

In [8], wavelet transform-based image forensic algorithms are proposed that extract features for matching detection from the subband information of the wavelet transform of the image to be detected. For example, the wavelet decomposition has low- and high-frequency subbands, and the copy-paste block is detected by comparing the correlation of the Zernike moments at corresponding positions of the two subbands, block by block. Detection of the tampered region by comparing similarity on the high-frequency subband after the wavelet transform is proposed in the literature [9], and [10] proposed extracting LBP features in the low-frequency subband to identify tampering.
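As a concrete illustration of the block-matching idea behind the DCT-based copy-paste detectors summarized above, the following sketch slides a window over a grayscale image, quantizes each block's DCT coefficients into a feature vector, sorts the vectors lexicographically, and flags neighboring vectors that agree. It is only a minimal sketch, not the exact algorithm of [1]; the block size, quantization step, number of retained coefficients, and minimum shift are assumptions chosen here for illustration.

```python
import numpy as np
from scipy.fft import dctn  # 2-D DCT; block size, quant step, etc. are illustrative choices


def copy_move_candidates(gray, block=8, step=1, quant=16, keep=9, min_shift=16):
    """Return pairs of block positions whose quantized DCT features match (copy-move candidates)."""
    h, w = gray.shape
    feats = []
    for y in range(0, h - block + 1, step):
        for x in range(0, w - block + 1, step):
            coefs = dctn(gray[y:y + block, x:x + block].astype(float), norm="ortho")
            # keep the first few coefficients (coarse low-frequency proxy), coarsely quantized
            vec = np.round(coefs.flatten()[:keep] / quant).astype(int)
            feats.append((tuple(vec), y, x))
    feats.sort()  # lexicographic (dictionary) sorting groups similar blocks together
    pairs = []
    for (f1, y1, x1), (f2, y2, x2) in zip(feats, feats[1:]):
        if f1 == f2 and max(abs(y1 - y2), abs(x1 - x2)) >= min_shift:
            pairs.append(((y1, x1), (y2, x2)))  # consistent displacement vectors hint at a cloned region
    return pairs
```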
2.1.2. Key Point-Based Forensic Techniques. For same-image tampering, there are two or more identical or similar regions in an image, and key points are extracted over the whole image. Since the key point characteristics of identical or similar regions are close, the tampered region can be located by correlation matching of all key points. Based on this principle, a detection method based on Harris point detection is proposed in the literature [11], which has better robustness to post-tampering compression. The literature [12] uses Harris points combined with the mean value of the circular neighborhood as feature points, which can handle copy-and-paste operations in visually flat regions. The literature [13] extracts image feature points with the Harris operator and uses a new forensic feature matching method to improve detection accuracy and efficiency.

The literature [14] proposed extracting the SIFT feature points of the image to be tested, matching the feature points using the G2NN matching criterion, and then clustering the matched key points to determine the copied and pasted regions, but the detection effect depends heavily on the clustering results. The literature [15] suggests J-linkage clustering, but the algorithm is not accurate in locating pasted blocks after rotation and scaling operations, and the detection efficiency is not ideal. To further improve robustness, it has also been proposed to extract SIFT features after a wavelet transform to reduce noise interference. SURF key points have likewise been used, but the localization of the pasted blocks is not ideal; to overcome this drawback, a combination of the SURF and SIFT algorithms is used to extract key points, achieving precise localization while improving the efficiency of the algorithm. The literature [16] combines both SURF and HOG features, and the experimental results are significant.
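The key-point pipelines above share the same core step: detect local features, then keep matches whose descriptors are much closer to each other than to everything else. The sketch below uses OpenCV SIFT and a simple generalized-2NN style ratio test over the sorted neighbor distances; the ratio threshold and minimum spatial separation are illustrative assumptions, and the clustering of surviving matches used in [14, 15] is omitted.

```python
import cv2
import numpy as np


def copy_move_keypoint_matches(gray, ratio=0.6, min_dist=20):
    """Match an image's SIFT keypoints against themselves to find duplicated regions."""
    sift = cv2.SIFT_create()
    kps, desc = sift.detectAndCompute(gray, None)
    if desc is None or len(kps) < 3:
        return []
    matches = []
    for i, d in enumerate(desc):
        # distances from keypoint i to every other keypoint's descriptor
        dists = np.linalg.norm(desc - d, axis=1)
        order = np.argsort(dists)[1:]  # skip the self-match at distance 0
        # generalized 2NN: accept neighbors while d_k / d_{k+1} stays below the ratio
        for k in range(len(order) - 1):
            if dists[order[k]] / (dists[order[k + 1]] + 1e-8) >= ratio:
                break
            j = order[k]
            p1, p2 = np.array(kps[i].pt), np.array(kps[j].pt)
            if np.linalg.norm(p1 - p2) > min_dist:  # ignore trivially close pairs
                matches.append((tuple(p1), tuple(p2)))
    return matches
```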
2.1.3. Forensic Techniques Based on Light Consistency Features. In photographs, there is generally a relatively fixed illumination environment (e.g., the sun or interior lighting), which makes the illumination intensity and direction consistent within the photograph. For blocks stitched in from other photographs, the illumination will not be consistent with that of the real region in the tampered image. Farid's team proposed an image recognition model under 3D light sources, which uses a spherical harmonic model to estimate the direction of the light sources illuminating objects in the photographs and then performs detection based on the consistency of those directions. By detecting the consistency between the direction implied by shadow regions and the direction of the light, the approach is robust to multiple tampered targets. Since the above method is subject to relatively strict assumptions that limit its practical applicability, follow-up work has been experimentally shown to reduce the light-direction estimation error and to be more widely applicable [17].

2.1.4. Forensic Techniques Based on Texture Features. Texture is an important feature for describing and distinguishing different objects. Given that it is difficult to keep the texture features of a tampered block consistent with the original image, tampering inevitably destroys the periodicity, directionality, and randomness of the original image texture, which provides another possible route for stitching-tampering forensics. By dictionary-sorting the Tamura texture features of each image block and then calculating feature similarity based on the Euclidean distance, forged image regions can be detected and located. The literature [18] proposes an LBP-based texture feature description method with some robustness.

2.2. Tampering Forensic Techniques Based on Imaging Features. The general imaging model of a digital camera is shown in Figure 6. First, an optical filter removes light of colors other than red, blue, and green, after which the color information at each position is recorded by the color filter array; the light signal is then converted into an electrical signal by the sensor, and the CFA (Color Filter Array) interpolation algorithm is applied at each position. The signal is then processed by a series of digital image processing techniques such as white balance and gamma correction, and finally compressed according to certain rules to obtain the final digital photo image.

Analysis of the camera imaging model shows that, in the process of digital photo image generation, a series of hardware processing steps and software operations inevitably introduce imaging features such as CFA interpolation noise, pattern noise, and compression noise. Because different brands and models of camera use different hardware and software processing methods, the photo images they take carry the imaging features of that particular camera, so such features can be used in forensics to detect tampered and forged photo images.

2.2.1. Tampering Forensic Techniques Based on CFA Interpolation Noise. Current photo images usually use a single sensor in the generation process, and each pixel point can only record one color component; the other two color components are obtained by interpolating the surrounding pixels, which leads to correlations between neighboring pixels. Since different cameras may not use the same interpolation method, the correlation between pixels will differ and can be used to detect tampering. The literature [19] locates splicing tampering by re-applying CFA interpolation to the image to reconstruct its pixel neighborhood consistency, detects whether there is a splicing tampering operation by analyzing the distribution of the color-difference image at high frequencies, extracts CFA features using Gaussian filtering based on a posterior probability estimate of the CFA interpolation noise, and achieves the forensic purpose by classifying these features. The literature [20] considers the spectral correlation introduced by CFA interpolation and identifies image authenticity based on this property.

2.2.2. Tampering Forensics Based on the Camera Response Function. The process of generating photos of natural scenes through a series of hardware and software operations inside the camera can be described by the Camera Response Function (CRF). Each camera is an independent individual and its response function is not the same, so the authenticity of an image can be identified by comparing the consistency of the CRFs in each region of the image. The literature [21] estimated the CRF of each region from the geometric invariance of the pixels in each region of the image and then used crossover for statistical classification, achieving a detection rate of 87% for stitched images. Based on this, the differential invariants of the images were calculated to estimate the CRF. The literature [22] used a maximum posterior probability model to estimate the normality of the CRF to discriminate the authenticity of photographs.
Figure 6: Digital camera imaging model (scene → optical filters → sensor → color filter array (CFA) → CFA interpolation → white balance, gamma correction, and other post-processing → data compression → digital photo image).

2.2.3. Tampering Forensic Techniques Based on Compression Characteristics. Photo images are usually saved in a certain compression format during the generation process, and images are usually compressed one or more additional times after tampering, so the differences between individual image blocks after compression can be detected to identify tampering. In the literature [23], an iterative method is proposed to estimate the original quantization table of an image in order to determine the approximate tampered region; the estimated original quantization table is then used to perform another JPEG compression on the tampered region so as to precisely locate the tampering according to the difference in pixel values before and after compression. In the literature [24], the similarity of synthetic images before and after compression is obtained by estimating the quantization factor in order to identify the location of tampering. Compressing the image again produces a double-quantization effect on the DCT coefficients of the real region, whereby tampering can be identified based on the change in the DCT coefficients of different regions before and after compression. The literature [25] argues that images produce uniform quantization noise after JPEG compression and that tampered blocks corrupt this property, and it proposes a quantization noise estimation model to detect the differences between image blocks. Considering that JPEG compression produces a grid effect, the presence of misaligned grids in the image can also be detected to identify the tampered locations.

2.2.4. Pattern Noise-Based Tampering Forensic Technique. Pattern noise is caused by imperfections of the camera sensor and inconsistencies in the materials used, resulting in imperfect conversion of light signals into electrical signals, and it is stable in every picture taken by the camera. Since the sensor of each camera is unique, its pattern noise is also unique; in addition, each pixel point on the sensor is different, resulting in inconsistent behavior of the pattern noise at each pixel point. Based on these two characteristics, the pattern noise can be regarded as a camera fingerprint and applied to photo image tampering forensics, and it generalizes to a variety of tampering operations such as copy-paste within the same image and stitching of different images.

In addition to the above features, Markov features, Fourier-Mellin transform features, image quality features, and color features are also often used for photo image tampering detection.

3. Methods

Figure 7 presents a generalized framework for digital image source forensics under the CNN model theory. In the image preprocessing stage, the image to be detected is first cut into image blocks (Pk in Figure 7(a) indicates the kth image block); then the image fingerprint characterizing the source of the shot is extracted using a CNN, and the detection result Yk of each image block is output (Yk in Figure 7(c) indicates the label predicted by the feature extractor for the kth image block); finally, a majority voting algorithm is used to fuse the detection results of the image blocks and output the image-level prediction result, i.e., device-model multiclassification identification.

Figure 7: Digital image source framework based on a CNN; (a) image preprocessing; (b) image feature extraction; (c) classification result voting.
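A minimal sketch of the block-level fusion step described above: each block prediction Yk votes for a device model, and the image-level label is the majority class. The CNN feature extractor itself is abstracted away behind a predict_block callable, which is a placeholder assumed here for illustration.

```python
from collections import Counter
from typing import Callable, List

import numpy as np


def split_into_blocks(image: np.ndarray, size: int = 64) -> List[np.ndarray]:
    """Cut the image into non-overlapping size x size blocks (edge remainders are dropped)."""
    h, w = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]


def image_level_label(image: np.ndarray,
                      predict_block: Callable[[np.ndarray], int],
                      size: int = 64) -> int:
    """Fuse per-block predictions Y_k into one image-level label by majority vote."""
    votes = [predict_block(block) for block in split_into_blocks(image, size)]
    return Counter(votes).most_common(1)[0][0]
```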
It was found that the FPN feature fusion algorithm improves the detection of small targets but does not improve the detection of large targets, and that there is information redundancy after feature fusion. Since then, researchers have proposed several variants of FPN, such as PANet and Libra-RCNN, which are built on the assumption that the weights of two layers are the same when their features are fused, ignoring the fact that features from different layers contribute different amounts. Therefore, this section proposes a new feature pyramid, named the dual-stream feature CNN (DS-CNN), using autonomously learned weights and a skip-connection method.

As shown in Figure 8, FPN is a simple top-down, one-way information flow, while PANet adds a bottom-up information flow to FPN to enhance higher-level semantic information for semantic segmentation; it is better than FPN but more computationally intensive. Libra-RCNN collects feature information from each layer and then refines the output back to the feature layers, following the idea of fusing before separating. NAS-FPN adopts the idea of AutoML and uses search for feature fusion; ASFF learns the weight contribution of each layer, but it is a fully connected method with high computation that requires higher-performance computing equipment, which is not convenient for practical applications.

Figure 8: Feature network design diagram; (a) FPN; (b) PANet; (c) Libra-RCNN; (d) NAS-FPN; (e) ASFF.

As shown in Figure 9, BiFPN is the feature fusion algorithm proposed in the EfficientDet network, in which a skip-connection approach and a weighted fusion approach are used for feature fusion, taking into account both efficiency and accuracy. Its calculation equations are shown in (1) and (2):

P_6^{td} = \mathrm{conv}\left( \frac{w_1 \cdot P_6^{in} + w_2 \cdot \mathrm{Resize}(P_7^{in})}{w_1 + w_2 + \varepsilon} \right), (1)

P_6^{out} = \mathrm{conv}\left( \frac{w_1' \cdot P_6^{in} + w_2' \cdot P_6^{td} + w_3' \cdot \mathrm{Resize}(P_5^{out})}{w_1' + w_2' + w_3' + \varepsilon} \right). (2)

Figure 9: BiFPN basic structure.

Here Resize denotes upsampling (nearest-neighbor interpolation is used), and ε is set to 0.0001 to prevent division by zero.

Each feature layer in BiFPN has its own weight, and the weights are in theory normalized so that they sum to 1 when features are fused into the same layer. How should the weights be normalized?

(1) Unbounded strategy:

O = \sum_i w_i \cdot I_i. (3)

(2) Softmax-based:

O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i. (4)

(3) Fast normalization:

O = \sum_i \frac{w_i}{\varepsilon + \sum_j w_j} \cdot I_i. (5)

It is found that BiFPN with fast normalization eliminates the exponential operation of the softmax method, reducing the computational overhead and speeding up the computation; although its accuracy is slightly lower than that of the softmax method, its overall performance is the best among the three strategies.
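A minimal PyTorch-style sketch of the weighted fusion in (1), (2), and (5): per-input scalar weights are kept non-negative and normalized by their sum plus ε rather than by softmax. The layer names, channel count, and spatial sizes are illustrative assumptions, not the exact configuration of the network in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FastNormalizedFusion(nn.Module):
    """Fuse same-shaped feature maps with learnable weights, eq. (5): O = sum_i w_i I_i / (eps + sum_j w_j)."""

    def __init__(self, num_inputs: int, channels: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, inputs):
        w = F.relu(self.weights)          # keep weights non-negative
        w = w / (self.eps + w.sum())      # fast normalization instead of softmax
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(fused)


# Illustrative use for eq. (1): fuse P6_in with upsampled P7 into the intermediate node P6_td.
fuse_p6_td = FastNormalizedFusion(num_inputs=2, channels=256)
p6_in, p7_in = torch.randn(1, 256, 32, 32), torch.randn(1, 256, 16, 16)
p7_up = F.interpolate(p7_in, scale_factor=2, mode="nearest")  # Resize = nearest-neighbor upsampling
p6_td = fuse_p6_td([p6_in, p7_up])
```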
We propose a new dual-stream feature CNN (DS-CNN) for the FCOS structure with fused area-ness. It is mainly improved on the basis of the BiFPN algorithm, following its skip connections and weighted fusion, while improving the FCOS model structure based on the fused area-ness described in Section 3. The general structure is shown in Figure 10.

As shown in Figure 10, the network obtains the C3, C4, and C5 layers from ResNeXt101, which are then convolved with 1 × 1 kernels for dimensionality reduction to obtain 256-dimensional P3, P4, and P5 feature maps, which is convenient for feature fusion. P6 and P7 are the feature maps obtained by downsampling P5 and P6, respectively.

Figure 10: Dual-stream feature CNN.

The dual-stream feature CNN (DS-CNN) can enhance the semantic information of each prediction layer. However, analysis of the information flow at each node shows that the information inflow and outflow at P5 are unbalanced. From Figure 10, it can be seen that layer P5 has only one information input, from layer C5, but three information outflows (the arrows indicate the information inflow and outflow of P5). Whether sufficient information can be obtained has an important impact on the subsequent feature fusion at nodes P6 and P7. Therefore, in this thesis, the information at node P5 is enhanced on top of the dual-stream pyramid, and the information of layers C3 and C4 is also fused directly into layer P5. However, layers C3, C4, and C5 contribute different amounts to the P5 feature layer, so different weights must first be learned for layers C3, C4, and C5 before feature fusion. In this paper, we name this feature fusion method the adaptive dual-stream feature CNN (A-DS-CNN); its specific structure is shown in Figure 11.

Figure 11: Adaptive dual-stream feature CNN.

As in Figure 11, layers C3 and C4 increase the information inflow into layer P5 by means of adaptive feature fusion. The adaptive weights are calculated as in (6) and (7):

y_{ij} = \mathrm{conv}\left( \alpha_{ij}^3 \cdot x_{ij}^3 + \beta_{ij}^4 \cdot x_{ij}^4 + \gamma_{ij}^5 \cdot x_{ij}^5 \right), (6)

\alpha_{ij}^3 = \frac{e^{\lambda_{\alpha,ij}^3}}{e^{\lambda_{\alpha,ij}^3} + e^{\lambda_{\beta,ij}^4} + e^{\lambda_{\gamma,ij}^5}}, \quad \beta_{ij}^4 = \frac{e^{\lambda_{\beta,ij}^4}}{e^{\lambda_{\alpha,ij}^3} + e^{\lambda_{\beta,ij}^4} + e^{\lambda_{\gamma,ij}^5}}, \quad \gamma_{ij}^5 = \frac{e^{\lambda_{\gamma,ij}^5}}{e^{\lambda_{\alpha,ij}^3} + e^{\lambda_{\beta,ij}^4} + e^{\lambda_{\gamma,ij}^5}}, (7)

where α_{ij}^3, β_{ij}^4, and γ_{ij}^5 denote the weights of C3, C4, and C5, respectively, (i, j) denotes the location coordinates in the feature map, x_{ij}^3, x_{ij}^4, and x_{ij}^5 denote the values at location (i, j) in layers C3, C4, and C5, and y_{ij} denotes the final output value at location (i, j) in P5.
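A minimal PyTorch-style sketch of the adaptive fusion in (6) and (7): per-pixel weight maps for C3, C4, and C5 are produced by 1 × 1 convolutions and normalized with a softmax across the three sources before the weighted sum is convolved to form P5. The weight-generating layers and shapes are assumptions for illustration; the paper's exact parameterization of the λ terms is not reproduced here.

```python
import torch
import torch.nn as nn


class AdaptiveP5Fusion(nn.Module):
    """Softmax-weighted, per-location fusion of C3, C4, C5 into P5, in the spirit of eqs. (6)-(7)."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # one scalar weight map (lambda) per source, per spatial location
        self.weight_convs = nn.ModuleList(nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3))
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, c3: torch.Tensor, c4: torch.Tensor, c5: torch.Tensor) -> torch.Tensor:
        feats = [c3, c4, c5]  # assumed already resized to P5's resolution and 256 channels
        lambdas = torch.cat([conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1)  # (N, 3, H, W)
        weights = torch.softmax(lambdas, dim=1)  # alpha, beta, gamma sum to 1 at every (i, j)
        fused = sum(weights[:, k:k + 1] * feats[k] for k in range(3))
        return self.out_conv(fused)


# Illustrative use: three 256-channel maps at the same resolution.
p5 = AdaptiveP5Fusion()(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
```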
The network structure of the final target detection algorithm in this thesis is shown in Figure 12.

Figure 12: Schematic diagram of the overall network structure.

To improve the basic convolutional network, the backbone layer adopts the ResNeXt101 structure and uses 64 paths, each of width 4, to reduce the computational effort. For better feature extraction, the channels of the C3, C4, and C5 layers are first converted to 256 dimensions and then input to the A-DFN for feature fusion. Adaptive feature fusion of layers C3, C4, and C5 into P5 is performed with softmax-normalized weights in order to provide more information for the subsequent feature fusion without sacrificing accuracy, while the subsequent weight normalization used for weighted feature fusion across layers P3, P4, P5, P6, and P7 is the fast normalization, which aims to make the network run faster at the cost of a slight loss in accuracy while ensuring performance. The head layer is divided into a shared part and branch parts; since area-ness is more closely related to location, it is placed in the same branch as regression to help the model perform better target-box position regression.

4. Experiments

The COCO dataset, known as Microsoft Common Objects in Context, is a dataset collected by the Microsoft team for target recognition, target segmentation, and target detection competitions. A schematic diagram of the COCO dataset is shown in Figure 13. The COCO dataset has a 2014 version and a 2017 version. The version used in this thesis is the 2017 version, which contains 80 target categories, 118287 training images (19.3 GB in total) and 5000 validation images (1814.7 MB in total), so the 2017 COCO dataset used here contains 123287 images.

From Figure 14, it can be seen that the COCO dataset has more categories than the PASCAL VOC dataset, and the number of instances corresponding to each category is also higher. Therefore, target detection on the COCO dataset is more difficult, and it can better represent the performance of a target detection model.

As shown in Figure 15, the area of most targets in the COCO dataset is only about 6% of the image size; 41.43% of all targets appearing in the COCO training dataset are small targets, 34.4% are medium targets, and 24.2% are large targets. Analysis of the COCO data thus shows that small and medium targets account for a larger proportion, so this dataset emphasizes the detection of small and medium targets.

We mainly perform detection on the COCO dataset and divide it accordingly. During training, the image data are first preprocessed, resizing each image to match the input size of the target detection model, and then input to the target detection model, which is trained for an appropriate number of iterations to obtain the final detection results. Finally, the test-dev results are submitted to the COCO Detection Challenge to obtain AP and AR values.

In order to better compare anchor-based and anchor-free methods, this thesis uses a unified benchmark, i.e., the COCO dataset, and a unified evaluation criterion: mAP values (mAP values are equivalent to AP values on COCO data), comparing the mAP values for large, medium, and small targets and at different IOU thresholds.
Figure 13: Schematic diagram of the COCO target detection dataset.

Figure 14: Comparison of the number of instances between the COCO dataset and the PASCAL VOC dataset.

Figure 15: Example image size and percentage of the COCO, PASCAL VOC, ImageNet, and SUN datasets.

Table 1: Comparison of FCOS, CenterNet511, and CornerNet511 prediction speed.

Method         Backbone       Testing time/image
CenterNet511   Hourglass52    270 ms
CornerNet511   Hourglass104   300 ms
CenterNet511   Hourglass104   340 ms
FCOS           ResNeXt-101    112 ms

(Note: 511 means the input image size is 511 × 511.)

Table 2: FCOS prediction accuracy values using different backbones.

Method        Backbone            AP     AP50   AP75   APs    APm    APl
FCOS w/ FPN   ResNet-101          41.6   60.6   45.1   24.3   44.9   51.5
FCOS w/ FPN   ResNeXt-32x8d-101   42.6   62.3   46.2   26.1   45.5   52.5
FCOS w/ FPN   ResNeXt-64x4d-101   44.8   64.1   48.5   27.5   47.4   55.7

(Note: 32x8d means 32 paths, each of width 8.)
According to Table 1, CenterNet511 and CornerNet511 take longer to test one image under the same conditions, and CenterNet511 predicts more slowly because it predicts one more center point than CornerNet511, which brings more computation. Both CenterNet511 and CornerNet511 use the hourglass network as the backbone, which is computationally intensive and slow. In contrast, the ResNeXt-101 backbone used by FCOS predicts quickly and performs better in terms of mean average precision, so this thesis uses the FCOS algorithm as the base algorithm for improvement.

According to Table 2, the mAP value can reach 44.8 when FCOS adopts ResNeXt-64x4d-101-FPN (i.e., 64 paths, each of width 4) as the backbone, which is 3 and 2 points higher than with ResNet-101 and ResNeXt-32x8d-101, respectively. Therefore, ResNeXt-64x4d-101 is used as the backbone of the target detection model in this thesis.

According to Table 3, when the feature fusion method is FPN, the mean accuracy of target detection using center-ness is 0.2 points lower than with the area-ness designed in this paper; with the DS-CNN fusion method, area-ness is 0.4 points higher than center-ness in mAP. This means that the area-ness designed in this paper is better than the center-ness of the original FCOS.
Table 3: mAP values of area-ness and center-ness on the FCOS algorithm.

Method   Centrality method   Feature fusion method   AP     AP50   AP75   APs    APm    APl
FCOS     Center-ness         FPN                     44.8   64.0   48.5   27.5   47.6   55.7
FCOS     Area-ness           FPN                     44.8   64.4   48.6   27.4   47.5   55.8
FCOS     Center-ness         DS-CNN                  46.2   65.7   49.9   26.8   49.7   58.4
FCOS     Area-ness           DS-CNN                  46.6   66.1   50.3   27.3   49.5   59.2

Figure 16: Visualization of some of the object detection results of the algorithm proposed in this thesis.

As shown in Figure 16, the algorithm of this thesis is able to accurately detect even a small orange placed in the fruit tray or an empty water bottle placed by the bed, indicating that the prediction layers acquire a sufficient number of features. When the target frames of people and Frisbees overlap, the algorithm of this thesis is still able to predict each of them.

Table 4: Comparison of detection results.

Traditional pixel-by-pixel sliding-window fixed-threshold algorithm (average time to detect an image: 727.5 s):
  τ = 0.006: TPR 88.5%, FAR 1.903%
  τ = 0.01: TPR 95.7%, FAR 4.942%
  τ = 0.014: TPR 97.7%, FAR 9.029%
  τ = 0.03: TPR 98.9%, FAR 15.37%
The proposed algorithm: TPR 98.9%, FAR 1.896%, average time to detect an image 26.75 s.

The authenticity detection result for each test image falls into one of two categories: tampered or real. To evaluate the performance,

\mathrm{TPR} = \frac{TN}{FP + TN}, \qquad \mathrm{FAR} = \frac{FN}{FN + TN}. (8)
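A minimal sketch of how the rates in (8) could be computed from per-image decisions, following the definitions as written above (TN-based TPR and FN-based FAR) rather than the more common TP-based convention; how each image is judged real or tampered is left to the caller.

```python
from typing import Iterable, Tuple


def tpr_far(decisions: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute TPR and FAR as defined in eq. (8).

    Each item is (ground_truth_tampered, predicted_tampered). Counts of TN, FP,
    and FN are accumulated over the decisions and combined as
    TPR = TN / (FP + TN) and FAR = FN / (FN + TN), matching the text.
    """
    tn = fp = fn = 0
    for truth_tampered, pred_tampered in decisions:
        if not truth_tampered and not pred_tampered:
            tn += 1  # real image correctly judged real
        elif not truth_tampered and pred_tampered:
            fp += 1  # real image wrongly flagged as tampered
        elif truth_tampered and not pred_tampered:
            fn += 1  # tampered image missed
    tpr = tn / (fp + tn) if (fp + tn) else 0.0
    far = fn / (fn + tn) if (fn + tn) else 0.0
    return tpr, far
```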
Authenticity detection results: tampering detection experiments are performed on 500 real images and 500 tampered images from the image library referenced in Table 4, using the traditional fixed-threshold sliding-window method based on correlation coefficients and the proposed SPCE-based adaptive-threshold nonoverlapping block matching + ZNCC algorithm, respectively.

The fixed-threshold methods produce different detection results at different thresholds, and four relatively desirable thresholds of 0.006, 0.01, 0.014, and 0.03 were selected for comparison through experiments. In order to evaluate the detection results objectively, the pattern noise in both types of algorithm is obtained by wavelet noise reduction and then processed with ZM + WF. In the calculation of TPR and FAR, if the number of pixels in an image's tampering localization result is less than 20, the image is judged to be real; otherwise, it is considered tampered. The detection results of the two algorithms are shown in Table 4.

The proposed adaptive thresholding algorithm achieves a TPR of 98.9% and a FAR of 1.896% over the 1000 images under test, while the fixed-threshold algorithm gives different detection results at different threshold values. At 0.01, 0.014, and 0.03, although its TPR is similar or equal to that of the proposed algorithm, its FAR is much higher; at 0.006, its FAR is similar to that of the proposed algorithm, but its TPR is much lower. Meanwhile, the average detection time of the two algorithms over the 1000 images is also given in Table 4, and the comparison shows that the proposed algorithm effectively reduces false alarms while maintaining a high detection rate and detection efficiency.
Figure 17: The localization results of tampered images with different textures. (a) Original image. (b) Tampered image. (c) Tampered position. (d) Positioning effect.

Table 5: The detection accuracy of seam-carving tampered images.

Method / P-value   Untampered image (%)   Tampering ratio 0.9 (%)   0.8 (%)   0.7 (%)   0.6 (%)
G. R. Sheng        67.86                  64.27                     85.72     83.92     87.50
P = 0.9            67.84                  78.58                     89.28     85.72     87.50
P = 0.8            69.65                  76.77                     87.51     89.28     91.08
P = 0.7            71.43                  78.56                     94.63     85.72     89.28

(Note: G. R. Sheng denotes the detection result based on the extended Markov feature; the remaining rows are the detection results when different thresholds are selected by the adaptive detection feature extraction method. All results in the table are correct-detection rates.)

Table 6: Comparison between the proposed method and G. R. Sheng's method.

Method                    Untampered image (%)   Tampering ratio 0.9 (%)   0.8 (%)   0.7 (%)   0.6 (%)
G. R. Sheng               67.86                  64.27                     85.72     83.92     87.49
Algorithm of this paper   69.25                  77.98                     90.488    86.89     89.31

(Note: the detection result of this algorithm is the average correct rate using the three thresholds 0.7, 0.8, and 0.9.)

Compared with the traditional fixed-threshold judgment method, the adaptive threshold judgment method selects a suitable threshold based on the texture complexity of the image block to be tested, thus realizing "specific analysis for specific problems."

In Figure 17, the four columns show the original image, the tampered image, the tampered position, and the positioning effect, respectively. The second column gives five tampered images whose texture complexity ranges from simple to complex: the texture complexity of the first local block (blue sky) is k ∈ [0.1857, 0.2886]; that of the second local block (wall) is k ∈ [0.3288, 0.4372]; that of the third local block (floor) is k ∈ [0.3511, 0.5296]; that of the fourth local block (green grass) is k ∈ [0.6601, 0.8442]; and that of the fifth local block (dead grass) is k ∈ [0.6927, 0.9463].

Observing the localization results of the proposed adaptive thresholding algorithm on these five tampered test images shows that, whether the texture of the tampered image is simple or complex, the tampered regions are located accurately, which effectively eliminates the influence of texture on forensics (see Tables 5 and 6).
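As a rough illustration of the adaptive idea, the sketch below estimates a texture-complexity score k for an image block and then picks a matching correlation threshold from predefined bins. Both the complexity measure (normalized local gradient energy) and the bin boundaries and threshold values are assumptions for illustration only; the paper's actual definition of k and its threshold mapping are not reproduced here.

```python
import numpy as np


def texture_complexity(block: np.ndarray) -> float:
    """Rough texture-complexity score in [0, 1] from normalized gradient energy (illustrative, not the paper's k)."""
    gy, gx = np.gradient(block.astype(float))
    energy = np.mean(np.hypot(gx, gy))
    return float(min(1.0, energy / 64.0))  # 64.0 is an arbitrary normalization constant


def adaptive_threshold(block: np.ndarray) -> float:
    """Pick a matching threshold for the block: smoother texture -> stricter (lower) threshold."""
    k = texture_complexity(block)
    bins = [(0.3, 0.006), (0.5, 0.01), (0.7, 0.014), (1.01, 0.03)]  # hypothetical (k upper bound, threshold) pairs
    for upper, tau in bins:
        if k < upper:
            return tau
    return bins[-1][1]
```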
5. Conclusion

In the information age, photography is one of the most important means of ensuring public access to information, but the continuous tampering of photos forces people to reexamine their trust in them. As Professor Hany Farid of Dartmouth said, "we live in a place where we no longer believe what we hear." As an important information carrier, digital photos should have broad application space, and their role in upholding and safeguarding justice is indisputable. This paper proposes an adaptive characteristic analysis method based on a dual-flow layer CNN model, which effectively addresses this problem and is of great significance for research in this field.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was supported in part by the Key Laboratory of Mine Water Resource Utilization of Anhui Higher Education Institutes, Suzhou University (Grant no. KMWRU202107), in part by the Outstanding Youth Talents in Anhui Provincial Education Department (Grant no. 2019gxbjZD43), in part by the Open Research Fund of the National Engineering Research Center for Agro-Ecological Big Data Analysis and Application, Anhui University (Grant no. AE202201), in part by the Open Foundation of the Anhui Key Laboratory of Intelligent Building and Building Energy Conservation (Grant no. IBES2020KF03), in part by the Key Research and Development Project of Anhui Province in China (Grant no. 202004b11020023), in part by the Academic Support Project for Top-Notch Talents in Disciplines (Majors) of Colleges and Universities in Anhui Province in China (Grant no. gxbjZD21081), in part by the school-level key discipline of computer science and technology at Suzhou University in China (Grant no. 2019xjzdxk1), and in part by the Collaborative Innovation Center - Cloud Computing Industry (Grant no. 4199106).

References

[1] A. M. Al-Azab, A. A. Zaituon, K. M. Al-Ghamdi, and F. M. A. Al-Galil, "Surveillance of dengue fever vector Aedes aegypti in different areas in Jeddah city Saudi Arabia," Advances in Animal and Veterinary Sciences, vol. 10, no. 2, pp. 348-353, 2022.
[2] L. Cai, Q. Sun, T. Xu, Y. Ma, and Z. Chen, "Multi-AUV collaborative target recognition based on transfer-reinforcement learning," IEEE Access, vol. 8, pp. 39273-39284, 2020.
[3] R. Tong, Y. Zhang, H. Chen, and H. Liu, "Learn the temporal-spatial feature of sEMG via dual-flow network," International Journal of Humanoid Robotics, vol. 16, no. 4, Article ID 1941004, 2019.
[4] A. R. Alqahtani, A. Badry, S. A. Amer, F. M. A. Al Galil, M. A. Ahmed, and Z. S. Amr, "Intraspecific molecular variation among Androctonus crassicauda (Olivier, 1807) populations collected from different regions in Saudi Arabia," Journal of King Saud University Science, vol. 34, no. 4, Article ID 101998, 2022.
[5] D. Wu, Y. Lei, M. He, C. Zhang, and L. Ji, "Deep reinforcement learning-based path control and optimization for unmanned ships," Wireless Communications and Mobile Computing, vol. 2022, pp. 1-8, Article ID 7135043, 2022.
[6] R. Ali, M. H. Siddiqi, and S. Lee, "Rough set-based approaches for discretization: a compact review," Artificial Intelligence Review, vol. 44, no. 2, pp. 235-263, 2015.
[7] M. Afrasiabi, H. Khotanlou, and T. Gevers, "Spatial-temporal dual-actor CNN for human interaction prediction in video," Multimedia Tools and Applications, vol. 79, no. 27-28, pp. 20019-20038, 2020.
[8] G. Cai, Y. Fang, J. Wen, S. Mumtaz, Y. Song, and V. Frascolla, "Multi-carrier M-ary DCSK system with code index modulation: an efficient solution for chaotic communications," IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 6, pp. 1375-1386, 2019.
[9] I. Castillo Camacho and K. Wang, "A comprehensive review of deep-learning-based methods for image forensics," Journal of Imaging, vol. 7, no. 4, p. 69, 2021.
[10] K. Chandra, A. S. Marcano, S. Mumtaz, R. V. Prasad, and H. L. Christiansen, "Unveiling capacity gains in ultradense networks: using mm-wave NOMA," IEEE Vehicular Technology Magazine, vol. 13, no. 2, pp. 75-83, 2018.
[11] X. Liao, K. Li, X. Zhu, and K. J. R. Liu, "Robust detection of image operator chain with two-stream convolutional neural network," IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 5, pp. 955-968, 2020.
[12] F. B. Saghezchi, A. Radwan, J. Rodriguez, and T. Dagiuklas, "Coalition formation game toward green mobile terminals in heterogeneous wireless networks," IEEE Wireless Communications, vol. 20, no. 5, pp. 85-91, 2013.
[13] B. Chen, W. Tan, G. Coatrieux, Y. Zheng, and Y. Q. Shi, "A serial image copy-move forgery localization scheme with source/target distinguishment," IEEE Transactions on Multimedia, vol. 23, pp. 3506-3517, 2021.
[14] S. Palanisamy, B. Thangaraju, O. I. Khalaf, Y. Alotaibi, S. Alghamdi, and F. Alassery, "A novel approach of design and analysis of a hexagonal fractal antenna array (HFAA) for next-generation wireless communication," Energies, vol. 14, no. 19, p. 6204, 2021.
[15] B. Bayar and M. C. Stamm, "Constrained convolutional neural networks: a new approach towards general purpose image manipulation detection," IEEE Transactions on Information Forensics and Security, vol. 13, no. 11, pp. 2691-2706, 2018.
[16] O. Mayer and M. C. Stamm, "Forensic similarity for digital images," IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1331-1346, 2020.
[17] A. Abd, A. Fahd Mohammed, and S. P. Zambare, "New species of flesh fly (Diptera: Sarcophagidae) Sarcophaga (Liosarcophaga) geetai in India," Journal of Entomology and Zoology Studies, vol. 4, no. 3, pp. 314-318, 2016.
[18] S. Nagi Alsubari, S. N. Deshmukh, A. Abdullah Alqarni et al., "Data analytics for the identification of fake reviews using supervised learning," Computers, Materials & Continua, vol. 70, no. 2, pp. 3189-3204, 2022.
[19] D. Bhardwaj and V. Pankajakshan, "A JPEG blocking artifact detector for image forensics," Signal Processing: Image Communication, vol. 68, pp. 155-161, 2018.
[20] W. Zhang, Q. Li, Q. M. J. Wu, Y. Yang, and M. Li, "A novel ship target detection algorithm based on error self-adjustment extreme learning machine and cascade classifier," Cognitive Computation, vol. 11, no. 1, pp. 110-124, 2019.
[21] Q. Liu, C. Liu, and Y. Wang, "Integrating external dictionary knowledge in conference scenarios: the field of personalized machine translation method," Journal of Chinese Informatics, vol. 33, no. 10, pp. 31-37, 2019.
[22] S. Walia and K. Kumar, "Digital image forgery detection: a systematic scrutiny," Australian Journal of Forensic Sciences, vol. 51, no. 5, pp. 488-526, 2019.
[23] S. A. Bansode, V. R. More, S. P. Zambare, and M. Fahd, "Effect of constant temperature (20°C, 25°C, 30°C, 35°C, 40°C) on the development of the Calliphorid fly of forensic importance, Chrysomya megacephala (Fabricius, 1794)," Journal of Entomology and Zoology Studies, vol. 4, no. 3, pp. 193-197, 2016.
[24] G. Boato, D. T. Dang-Nguyen, and F. G. B. De Natale, "Morphological filter detector for image forensics applications," IEEE Access, vol. 8, pp. 13549-13560, 2020.
[25] F. A. Al-Mekhlafi, R. A. Alajmi, Z. Almusawi et al., "A study of insect succession of forensic importance: Dipteran flies (Diptera) in two different habitats of small rodents in Riyadh City, Saudi Arabia," Journal of King Saud University Science, vol. 32, no. 7, pp. 3111-3118, 2020.
