
Received January 12, 2022, accepted February 9, 2022, date of publication February 14, 2022, date of current version February 25, 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3151406

A New Forensic Video Database for Source Smartphone Identification: Description and Analysis

YOUNES AKBARI 1, SOMAYA AL-MAADEED 1 (Senior Member, IEEE), NOOR AL-MAADEED 1 (Member, IEEE), AL ANOOD NAJEEB 1, AFNAN AL-ALI 1, FOUAD KHELIFI 2 (Member, IEEE), AND ASHREF LAWGALY 2

1 Department of Computer Science and Engineering, Qatar University, Doha, Qatar
2 Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, U.K.

Corresponding author: Somaya Al-Maadeed ([email protected])

This work was supported in part by the National Priorities Research Program (NPRP) through the Qatar National Research Fund (a member of Qatar Foundation) under Grant NPRP12S-0312-190332, and in part by the Open Access Funding provided by the Qatar National Library.

ABSTRACT In recent years, the field of digital imaging has made significant progress, to the point that every smartphone now has a built-in video camera that allows users to record high-quality video freely and without restriction. At the same time, rapidly growing internet technology has contributed significantly to the widespread distribution of digital video via web-based multimedia systems and mobile smartphone applications such as YouTube, Facebook, Twitter, and WhatsApp. However, as the recording and distribution of digital videos have become affordable, security issues have become threatening and have spread worldwide. One of these security issues is identifying the source camera of a video, and several new challenges remain to be addressed in this area. One of them is individual source camera identification (ISCI), which focuses on distinguishing each physical device regardless of its model. The first step towards solving these problems is a video database recorded by modern smartphone devices, which can also be used for the deep learning methods that are growing rapidly in the field of source camera identification. In this paper, a smartphone video database named the Qatar University Forensic Video Database (QUFVD) is introduced. The QUFVD includes 6000 videos from 20 modern smartphones representing five brands; each brand has two models, and each model has two identical smartphone devices. This database is suitable for evaluating different techniques, such as deep learning methods, for video source smartphone identification and verification. To evaluate the QUFVD, a series of experiments to identify source cameras using a deep learning technique is conducted. The results show that improvements are essential for the ISCI scenario on video.

INDEX TERMS Database, smartphone, source camera identification on videos, deep learning methods.

I. INTRODUCTION
The cellphone has developed rapidly over recent decades owing to its economic advantages, functionality, and ease of access [1]. It allows the creation of digital audiovisual content without constraints such as time, objects, places, or network connections [2]. Smartphone devices can provide highly pertinent information for crime prosecution and forensic investigations [1]. Such investigations have potential importance for research fields across sectors such as medicine, law, and surveillance, where the authenticity of images and videos matters. In general, video forensic analysis is much more difficult than image analysis because of lossy video compression: the traces of interest can be erased or significantly damaged by high compression rates, making all or part of the processing record unrecoverable. While numerous forensic methods have been developed for digital images [3]–[9], the forensic analysis of videos has been less explored. It should be noted that methods based on images cannot be applied directly to videos [10]–[12]. This is due to challenges such as compression, stabilization, scaling, and cropping, as well as the differences between frame types that arise when a video is produced.

(The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Caldelli.)

Video identification algorithms are used to identify and distinguish camera types based on the video produced by digital cameras. Over the last few years, forensic specialists have been particularly interested in this topic. In general, there are two main ways to identify images and videos: examining the images or videos themselves to extract a unique fingerprint of the camera, and using the metadata associated with them (the DNA of a video). Lopez et al. [13] demonstrated that the internal elements and metadata of a video can be used for source video identification. Since metadata can be removed from an image or video, identification based on the camera fingerprint is the more reliable approach. Moreover, two concepts are considered for identifying a camera: individual source camera identification (ISCI) and source camera model identification (SCMI). ISCI distinguishes cameras of both the same and different models, while SCMI, a subset of ISCI, distinguishes a particular camera model from other models but cannot distinguish between devices of the same model. SCMI has been researched more than ISCI [14]. Another important aspect of identifying the source camera of a video is the codec used for compression: the codec may affect the accuracy of source camera identification, as some useful information is lost during encoding.

As a result of the challenges and of advances in forensic video analysis research, such as deep learning methods, there is a need for standard databases that allow researchers to compare techniques more easily under the same experimental protocols. Although there are several databases for identifying the source cameras of images [15], [16], there are few databases for videos. Therefore, for new challenges such as ISCI and deep-learning-based source camera identification focused on video, it is essential to have a database on which new methods can be run.

Since most video databases focus on videos recorded with conventional video cameras, and among them there is only one database for smartphones (Daxing) [1], we focus on presenting a smartphone database for videos that supports new tasks. It should be noted that the Daxing database cannot cover the ISCI challenge for all devices (of its 22 models, only 16 can be used for this challenge), and the QUFVD is more suitable for training a deep learning method because of the number of videos it contains (6000) compared with Daxing (1400). This study is an attempt to develop a database tailored to the new challenges of smartphone video. The structure of the database for source camera identification is shown in Figure 1. As the figure shows, evaluating the database requires extracting frames. Generally, the frames of a video consist of intra-coded pictures (I-frames), predictive-coded pictures (P-frames), and bi-predictive-coded pictures (B-frames); among these, I-frames have shown promising results [12], [14]. The database is presented as both videos and the I-frames corresponding to those videos. We also provide training, validation, and testing splits for the two common categories of methods used in the field, namely Photo Response Non-Uniformity (PRNU) and machine learning approaches. PRNU, which is understood to be the unique fingerprint of the camera, is often referred to as residual noise or sensor pattern noise (SPN). PRNU arises when the CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor) sensor processes the input signal (light) and converts it into a digital signal. In deep learning methods, which are a popular category of machine learning, a training step must be performed to extract the camera fingerprint. The main challenges for these methods are separating content from noise and the amount of training data. The first challenge can be addressed by architectures designed for the problem, for example by adding new layers and loss functions.

FIGURE 1. Overview of QUFVD structure and evaluation step for the ISCI scenario.

The paper is organized as follows. Section II reviews the available video databases with a brief description of each. Our motivation is explained in Section III. Our new video database is presented in full in Section IV. Section V describes the database evaluation based on a deep learning method. The last section concludes this work.

II. LITERATURE REVIEW
The video-based databases are summarized in Table 1. One of the main reasons videos are less explored than images is that few standard digital video databases exist on which to develop methods [14]. We review these databases in this section.

TABLE 1. The databases presented based on videos.

CAMCOM2010 [17] was a contest designed to identify the sources of YouTube videos. Despite a satisfactory number of initial participants, only two submitted results, and the database is not publicly available.

The University of Surrey provides access to the SULFA database [18] (http://sulfa.cs.surrey.ac.uk/), which contains original and forged videos; the original videos are suitable for source camera identification. About 150 videos were collected from three sources, and the database was later extended by [19]. The method presented in [20] was tested on it.

The VISION database (https://lesc.dinfo.unifi.it/VISION/), introduced in [21], is the most popular database in the field. In total, 35 portable devices from 11 major brands contributed 34,427 images and 1914 videos, all in both native and social formats (Facebook, YouTube, and WhatsApp are included). Videos were captured in indoor, outdoor, and flat scenarios: videos of flat surfaces such as walls and sky belong to the flat scenario, videos depicting offices or shops to the indoor scenario, and videos depicting gardens to the outdoor scenario. Three recording modes were used for each scenario: still mode, where the user stands still while the video is recorded; moving mode, where the user walks while capturing the video; and panrot mode, which combines a pan with a rotation. The YouTube and WhatsApp platforms were used to exchange the videos belonging to each scenario. In that study, the database was evaluated with the method presented in [4].

The Video-ACID database (misl.ece.drexel.edu/video-acid), presented in [14] for source camera identification, is publicly accessible. Over 12,000 videos were collected from 46 physical cameras representing 36 different camera models. All of these videos were shot manually to represent a range of lighting conditions, content, and motion. This database is suitable for both the SCMI and ISCI scenarios. Its authors evaluated the deep learning method presented in [22].

The authors of [1] presented the Daxing smartphone identification database (https://github.com/xyhcn/Daxing), which includes both images and videos from an extensive set of smartphones of different brands, models, and devices. The data, from 90 smartphones representing 22 models and 5 brands, comprise 43,400 images and 1400 videos; for the iPhone 6S (Plus) alone, 23 devices of the same model are included. The selected scenes typically include sky, grass, rocks, trees, stairs, a vertical printer, a lobby wall, and a white wall in a classroom, among others. The videos were shot vertically in each scene, each scene contains at least three videos, and all videos were recorded for over 10 seconds. The database was evaluated with the method presented in [23].

The SOCRatES database (http://socrates.eurecom.fr/) [24] was captured with smartphones. Around 9700 images and 1000 videos were taken by 103 different smartphones from 15 different brands. The methods of [3] and [25] were assessed on it.

III. MOTIVATION
The rapid development of new smartphones in the field of imaging may be an important driver for the development of databases for forensic analysis, especially for source camera identification. In addition, covering aspects that existing databases have not considered may lead researchers to present a new database.

As described in the previous sections, most databases contain videos recorded with conventional video cameras, and only one is dedicated to smartphones (Daxing) [1]. Although that database can be considered important in this field, as it covers a wide range of devices, some aspects may lead researchers to develop a new database to meet new challenges.

Table 2 details the Daxing database in terms of the number of videos per device. Of the 90 devices in the database, 85 were used to record videos. As the table shows, the number of videos recorded per device is limited, possibly because the Daxing database covers both videos and images. The smallest number of videos recorded by a device is 4 and the largest is 106; only one device has 106 videos, the rest have fewer than 31, and most have 12 to 28. On average, there are around 26 videos per device. As a result, the assessment of PRNU-based methods may not be reliable. Furthermore, source camera identification techniques based on machine learning may face an unbalanced-data problem, since the number of training videos is small and differs across devices. This prompts researchers to adjust and balance the database before use.
For example, for the iPhone 8 Plus, 24 videos were recorded for device #1 and only 4 for device #2.

FIGURE 2. Sample frames from captured videos: (a) Huawei-Y7 (device 1), (b) Huawei-Y7 (device 2), (c) Huawei-Y9 (device 1), (d) Huawei-Y9 (device 2), (e) iPhone-8Plus (device 1), (f) iPhone-8Plus (device 2), (g) iPhone-XsMax (device 1), (h) iPhone-XsMax (device 2), (i) Nokia-5.4 (device 1), (j) Nokia-5.4 (device 2), (k) Nokia-7.1 (device 1), (l) Nokia-7.1 (device 2), (m) Samsung-A50 (device 1), (n) Samsung-A50 (device 2), (o) Samsung-Note9 (device 1), (p) Samsung-Note9 (device 2), (q) Xiaomi-RedmiNote8 (device 1), (r) Xiaomi-RedmiNote8 (device 2), (s) Xiaomi-RedmiNote9Pro (device 1), (t) Xiaomi-RedmiNote9Pro (device 2).

TABLE 2. Number of videos captured per device in the Daxing database.

As our experiments show (Section V), increasing the amount of training data can improve the results on our database. Also, since most machine learning methods require sufficient data for training, a database with many more videos is clearly better suited to machine learning methods than Daxing for the ISCI scenario. As shown in Table 2, for the ISCI scenario only one Daxing device has 106 videos for training, and the rest have fewer than 31. Additionally, to make the structure of the Daxing database suitable for a machine learning approach, the videos need to be divided into training, testing, and validation sets. If we take 26 videos per device as the database average and apply our split, we obtain 15, 7, and 4 videos for training, testing, and validation, respectively, which is clearly too few for a fair comparison of machine learning methods. It should be noted that the Daxing database may be more suitable for machine learning methods in the SCMI scenario, for which it offers more models.

Finally, it should be noted that a new database can be combined with other databases such as Daxing to obtain more data and address new challenges.

IV. QUFVD DESCRIPTION
In this section, we discuss the features and structure of the QUFVD. The following properties are important when describing a database: the number of videos and cameras, resolution, codec, and suitability for SCMI or ISCI. These properties are described in more detail in the following subsections. Table 3 summarizes our database and its features. The QUFVD is publicly available at https://www.dropbox.com/sh/nb543na9qq0wlaz/AAAc5N8ecjawk2KlVF8kfkrya?dl=0.

TABLE 3. The devices of our database with their characteristics.

A. DEVICES
Several popular manufacturers produce smartphones, but only a few brands are very widely used. To cover a variety of brands, we selected 5 popular ones for video recording: iPhone, Samsung, Huawei, Xiaomi, and Nokia. For each brand, we selected two different models, and for each model, two devices; four devices are therefore considered per brand, for a total of 20 devices used to collect this database.

B. SIZE PROPERTIES
With the development of deep learning methods in this area, a large number of videos or frames can improve results, as shown in this article. A suitably sized database therefore serves both traditional methods such as PRNU and deep learning methods. In our database, 300 videos are collected for each device, making a total of 6000 videos. The videos are between 11 and 15 seconds long at a frame rate of 30 frames per second. Since I-frames play an important role in source identification [12], [14], these frames are also extracted; their number depends on the length and content of each video. A total of 76,531 I-frames are extracted with the FFmpeg software (https://www.ffmpeg.org/). Finally, to test deep learning methods, 500 patches of 350 × 350 pixels are extracted from each I-frame; in total, 980,580 patches are extracted for the training set.

C. CONTENT PROPERTIES
In this collection, we rely mainly on a static camera, although both static and moving camera states occur in the recordings. The database contains very diverse videos of different scenes, outdoor and indoor, with moving or still objects: mainly gardens, sky, streets, shops, domestic items, and the sea. Figure 2 shows sample frames from each device.

D. ISCI PROPERTIES
One way to make source camera identification challenging is for the captured videos to come from smartphones of the same camera model. In our database, two devices are considered for each smartphone model; for example, for the Samsung Galaxy A50, videos were captured with two devices. This challenge is studied in the evaluation section. Our database therefore supports both the SCMI and ISCI scenarios for all models, defining 10-class and 20-class problems, respectively.

E. CODEC PROPERTIES
Video files are compressed with codecs, which always trade off quality against size (better quality versus larger file size). Compression reduces file size, which can reduce bandwidth usage and increase streaming speed. For encoding high-definition video, AVC (H.264) is the standard codec used by several online video services, including YouTube and Vimeo; the MPEG-4 and H.264 standards are implemented by the 'libx264' library in FFmpeg. All smartphones used for our database recorded videos with the H.264 video encoding standard, except for the iPhone Xs Max and the Samsung Note9 (H.265).
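As a concrete companion to Sections IV-B and IV-E, the sketch below shows one way to check a video's codec with ffprobe, dump its I-frames with FFmpeg, and cut fixed-size patches. The paper does not give the exact FFmpeg invocation or the patch-placement strategy, so the select filter and the uniform random sampling are assumptions, and all function names are ours.

```python
import subprocess
from pathlib import Path

import numpy as np
from PIL import Image


def video_codec(video: Path) -> str:
    """Return the codec of the first video stream (e.g. 'h264' or 'hevc')."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(video)],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def extract_iframes(video: Path, out_dir: Path) -> None:
    """Dump only the I-frames of a video as PNG images.

    The select filter keeps frames whose picture type is I; -vsync vfr stops
    FFmpeg from duplicating frames to preserve the original frame rate. Note
    that %03d is a sequential I-frame counter, not the original frame index.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", str(video),
         "-vf", "select='eq(pict_type,I)'", "-vsync", "vfr",
         str(out_dir / f"{video.stem}-%03d-I.png")],
        check=True,
    )


def sample_patches(frame_png: Path, n: int = 500, size: int = 350,
                   seed: int = 0) -> np.ndarray:
    """Sample n random size x size patches from one I-frame.

    Section IV-B extracts 500 patches of 350 x 350 per I-frame; uniform
    random placement is assumed here, as the strategy is not specified.
    """
    img = np.asarray(Image.open(frame_png))
    h, w = img.shape[:2]
    rng = np.random.default_rng(seed)
    ys = rng.integers(0, h - size + 1, n)
    xs = rng.integers(0, w - size + 1, n)
    return np.stack([img[y:y + size, x:x + size] for y, x in zip(ys, xs)])
```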


F. I-FRAME PROPERTIES
In coding standards such as the MPEG series and H.264, a Group of Pictures (GOP) consists of I-frames, P-frames, and B-frames: intra-coded, predictive-coded, and bi-predictive-coded pictures, respectively. I-frames are the least compressible and do not require other video frames for decoding. P-frames can be decompressed using data from previous frames and are more compressible than I-frames. B-frames can use both previous and subsequent frames as data references to achieve the highest compression. An I-frame is generally more detailed than the P- and B-frames. The GOP size, generally classified as fixed or unfixed, is the number of B- and P-frames between two consecutive I-frames. Several studies have demonstrated that methods based on I-frames give better results than those based on other frames [26]–[28].

G. VIDEO NAMING
All videos are renamed according to the rule "Brand_Model_device No._Video No.". For example, "iPhone_XS Max_1_(1)" refers to the first video from the first device of the XS Max model of the iPhone brand, and "Samsung_A50_2_(30)" refers to video number 30 from the second device of the Galaxy A50 model of the Samsung brand. For I-frames, the frame number and type are appended to the video name; for example, "Huawei-Y7Prime2019-1(14)-31-I" denotes the 31st frame, of type I-frame, of video 14 of that device.

H. RESOLUTION AND COLOR MODE
The resolution of a video is its width and height in pixels. All videos in the database were recorded with the rear camera of the smartphones. There are two resolutions in the database, namely 720 × 1280 and 1080 × 1920. The frames are stored in two modes, color (true color) and grayscale, which makes it possible to test whether resolution and color mode affect the results.

I. STRUCTURE OF THE DATABASE
The overall structure of the QUFVD is shown in Figure 3. Researchers can modify the structure to suit their methods and facilities. Moreover, the database can be combined with other databases (e.g., the Daxing database) to address new scenarios and challenges and to obtain a wider choice of brands.

FIGURE 3. Structure of the folders in the database.

V. QUFVD EVALUATION
The quality of our database is evaluated in this section through experiments on the ISCI and SCMI scenarios with different settings, based on a deep learning method. We divide the experiments into scenarios that show the influence of various conditions on the results. The outcome provides a baseline for camera identification accuracy on the QUFVD and can be used for comparison with other methods. The database is divided for the experiments as follows: 80% of the videos form the training set and the remaining 20% the test set, with 20% of the training data set aside for validation. This structure can be used for both classical (e.g., PRNU) and machine learning methods; in PRNU methods, for example, reference patterns can be obtained from the videos in the training data and query patterns from those in the test data. Since, as mentioned earlier, I-frames lead to better results, the I-frames of the videos are extracted to evaluate the database. The statistics for training, testing, and validation at both the video and frame levels are shown in Table 4. For each video in each experimental series, we selected all I-frames belonging to the videos of the training, testing, and validation sets; a total of 76,531 I-frames were extracted.

The method presented in [29] is used to evaluate our database. In [30], [31], and [22], this CNN (the MISLnet architecture [29]) was likewise used to identify the source camera, with frames used to train the network. The network uses a constrained convolutional layer, added as the first layer, with three kernels of size 5. This layer is constructed so that it captures relationships between adjacent pixels that are independent of the scene content. The method was tested on the VISION database [21], and the experiments showed that the constrained layer improves results compared with deep learning architectures without it. The structure of the CNN used in the three studies is shown in Figure 4: a constrained convolutional layer added to a simple CNN.

Our database is evaluated on two main scenarios, ISCI and SCMI: a 10-class problem for SCMI and a 20-class problem for ISCI. For each, the effect of the number and size of patches is examined, as well as the effect of the color mode, i.e., grayscale and true color. All videos were encoded with the respective device codec using the H.264 or H.265 video encoding standard, and no video was edited or re-encoded.

To identify a video based on its I-frames, all I-frames in the test set are considered.
The scores obtained by the CNN, based on the highest probability, indicate the class to which each I-frame belongs. At the video level, a majority vote over all the frames belonging to a video then decides its class.

Our experiments were run on a 64-bit operating system (Ubuntu 18) with an E5-2650 v4 CPU @ 2.20 GHz, 128.0 GB of RAM, and four NVIDIA GTX TITAN X GPUs.

A. ISCI VS SCMI
The performance of the network is measured by computing the accuracy at the frame level and the video level in both the ISCI and SCMI scenarios. In the classification stage, each frame/video in the test data is classified into one of the 10 classes (SCMI) or 20 classes (ISCI). The frame-level and video-level results of the SCMI scenario for each smartphone model are shown in Table 5.

To investigate the effect of device dependency, the ISCI scenario is considered. The frame-level and video-level results in terms of accuracy for each device in the ISCI scenario are shown in Table 6.

The overall accuracy, precision, recall, and F1-score at the frame level for both the ISCI and SCMI scenarios are reported in Table 7. Precision, also called the positive predictive value (PPV), measures the closeness of the set of predicted results. Recall is also known as the true positive rate (TPR), and the F1-score is the harmonic mean of precision and recall; it is at its best at a value of 1, meaning perfect precision and recall.

Tables 5 and 6 also list the effect of color mode, i.e., grayscale versus true color. With this premise, Figures 5 and 6 provide a more comprehensive picture of camera identification performance, checking the quality of the CNN through Receiver Operating Characteristic (ROC) curves for the ten and twenty cameras of our database. Two values are calculated at each threshold: the true positive ratio (TPR) and the false positive ratio (FPR). The TPR of a given class, e.g., Huawei Y7, is the number of outputs whose actual and predicted class is Huawei Y7 divided by the number of outputs whose actual class is Huawei Y7. The FPR is the number of outputs whose actual class is not Huawei Y7 but whose predicted class is Huawei Y7, divided by the number of outputs whose actual class is not Huawei Y7.

One of the most important factors in machine learning is how much training data the model needs to perform well. To investigate this, a series of experiments was conducted with increasing amounts of training data for the SCMI scenario in both grayscale and color modes; Table 8 shows the effect of this factor.

In addition, the size of the patches can affect the performance of CNN methods. For this experiment, four different patch sizes were considered for the SCMI scenario based on 10,000 grayscale patches per class (see Table 9).

For a more detailed analysis of the misclassifications, the confusion matrix for the ISCI scenario in grayscale mode is given in Table 10. Also, the processing times for a patch, a frame, and a video with 11 I-frames are shown in Table 11; these were measured on frames of size 1920 × 1080.

B. RESULT DISCUSSION
State-of-the-art source camera identification methods face challenges such as compression, stabilization, and ISCI, and various methods, most recently deep learning methods, have been introduced to overcome them. As mentioned earlier, our database is evaluated using a deep learning method developed to address these problems. Overall, the frame-level and video-level results show that the method is successful for the SCMI problem but does not work well for the ISCI challenge. For both scenarios, reporting the results at the video level brings an improvement. The results are discussed in more detail below.

As Table 5 shows, at the frame level all models except the Y7, 8 Plus, and Redmi Note9 Pro achieve more than 70% accuracy in grayscale mode. The biggest improvement of grayscale over color mode is for the Note 9. The best results are reported for the Note 9 and Xs Max, which share the same codec (H.265). At the video level, an overall improvement is seen for all models, and the best result, 95%, is again obtained for the Note 9. However, although the Xs Max uses the same codec, it does not improve as much as the Note 9, so we cannot conclude that the codec has a direct effect on the results. Moreover, resolution does not appear to affect the results: the Y7 and Y9 have the lowest resolution, but their results are not worse. From these two observations, this work cannot confirm that codec and resolution are effective factors in this area. In grayscale mode, however, the results are consistently better than in color mode.

Based on Table 6 (ISCI scenario), although only 3 devices have an accuracy below 65%, half of the devices achieve an accuracy below 50% at the frame level. Even though Note 9 device 1 scores best among all devices, as in the SCMI scenario, device 2 scores only 66.7%, placing it fifth. Beyond this, no clear pattern emerges from the table, except that grayscale mode still performs better than color mode at both the frame and video levels.

Figures 5 and 6 plot the TPR against the FPR for the SCMI and ISCI scenarios in the two modes (color and grayscale) at different frame-level thresholds. As the figures show, the devices behave differently in terms of TPR and FPR. In the SCMI scenario, the best performance is shown by the Nokia 5.4, with an area under the curve (AUC) of 0.989, against the second-ranked Note 9 with an AUC of 0.987 in grayscale mode (Figure 5(b)). Moreover, as shown in Figure 6(a and b), the RedmiNote9Pro performs significantly better in grayscale mode, and Note 9 device 1 has the best ISCI performance with an AUC of 0.989.

FIGURE 4. Architecture of the ConstrainedNet (based on [29]).

TABLE 4. The statistics for training, testing, and validation at both the video and frame levels.

TABLE 5. The frame-level and video-level results in terms of accuracy (%) for the SCMI scenario for each smartphone model.

TABLE 6. The frame-level and video-level results in terms of accuracy (%) for the ISCI scenario for each device.

As shown in Table 7, all metrics are better in the SCMI scenario than in the ISCI scenario. Based on these results, it is essential to improve this scenario in machine learning approaches.

Table 8 shows that increasing the number of training patches improves the results in both grayscale and color modes for SCMI at the frame level: the improvement from 5000 patches per class to all patches (about 90,000 per class) is 37.3%. In the ISCI scenario, accuracy reaches 49.9% when all patches (about 45,000 per class) are trained, an improvement of about 6%. Also, in grayscale mode, when all patches are trained, the results are 2% higher than in color mode.
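For readers reimplementing the protocol of Section V-A, the following is a minimal NumPy sketch of the frame-to-video majority vote and the per-class rates used above. The function names are ours, and the ROC curves of Figures 5 and 6 additionally sweep a score threshold, which this hard-decision sketch omits.

```python
import numpy as np


def video_label(frame_scores: np.ndarray) -> int:
    """Aggregate per-I-frame CNN scores into one video-level label.

    frame_scores: (n_frames, n_classes) array of scores for one video.
    Each I-frame votes for its highest-scoring class; the video label is
    the majority vote over its frames, as described in Section V.
    """
    votes = frame_scores.argmax(axis=1)
    return int(np.bincount(votes).argmax())


def tpr_fpr(y_true: np.ndarray, y_pred: np.ndarray, cls: int):
    """Per-class true and false positive rates for the ROC analysis."""
    actual_pos = y_true == cls
    tpr = float(np.mean(y_pred[actual_pos] == cls))   # TP / actual positives
    fpr = float(np.mean(y_pred[~actual_pos] == cls))  # FP / actual negatives
    return tpr, fpr
```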

TABLE 7. The frame-level results in terms of accuracy, PPV, TPR, and F1-score (%) for the SCMI and ISCI scenarios.

TABLE 8. The impact of the training data size on performance.

TABLE 9. The impact of patch size on performance.

FIGURE 5. True and false positive rates (ROC) obtained in the SCMI scenario: (a) 10 classes in color mode, (b) 10 classes in grayscale mode.

FIGURE 6. True and false positive rates (ROC) obtained in the ISCI scenario: (a) 20 classes in color mode, (b) 20 classes in grayscale mode.

Table 9 shows that while increasing the patch size can improve performance, the gain is limited at 350 × 350: for sizes over 350 × 350, a drop in performance is observed. It should be noted that this experiment was conducted with 10,000 patches per class. We therefore chose 350 × 350 for all experiments in the evaluation.
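As a reference point for the constrained first layer described in Section V and Figure 4, the sketch below shows one common way to implement the constraint of [29] in PyTorch: the centre tap of each kernel is fixed to -1 and the remaining taps are renormalized to sum to 1 after every optimizer step. The class name and the project-after-step scheme are our assumptions; [29] specifies the constraint itself, not this exact code.

```python
import torch
import torch.nn as nn


class ConstrainedConv2d(nn.Conv2d):
    """First-layer prediction-error filters in the style of [29].

    Each kernel learns to predict the centre pixel from its neighbourhood
    and outputs the prediction residual, which suppresses scene content
    while preserving camera artefacts.
    """

    def project(self) -> None:
        """Re-impose the constraint; call after every optimizer step."""
        with torch.no_grad():
            w = self.weight                      # (out_ch, in_ch, k, k)
            c = w.shape[-1] // 2                 # centre tap index
            w[:, :, c, c] = 0.0                  # exclude centre from the sum
            s = w.sum(dim=(2, 3), keepdim=True)  # sum of the remaining taps
            w.div_(s)                            # normalise taps to sum to 1
                                                 # (sum is nonzero in practice)
            w[:, :, c, c] = -1.0                 # fix the centre tap to -1


# Three 5 x 5 constrained kernels on single-channel input, as in the text:
layer = ConstrainedConv2d(in_channels=1, out_channels=3,
                          kernel_size=5, padding=2, bias=False)
# Training-loop fragment (model, loss, and optimizer assumed):
#   loss.backward(); optimizer.step(); layer.project()
```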


TABLE 10. Confusion matrix for the ISCI scenario in grayscale mode. Classes 1 to 20 are Y7 (device 1), Nokia 5.4 (device 2), Nokia 7.1 (device 1), Nokia 7.1 (device 2), A50 (device 1), A50 (device 2), Note 9 (device 1), Note 9 (device 2), RedmiNote8 (device 1), RedmiNote8 (device 2), RedmiNote9Pro (device 1), Y7 (device 2), RedmiNote9Pro (device 2), Y9 (device 1), Y9 (device 2), 8 Plus (device 1), 8 Plus (device 2), Xs Max (device 1), Xs Max (device 2), and Nokia 5.4 (device 1), respectively.

TABLE 11. The processing time (seconds) for a patch, a frame, and a video with 11 I-frames.

Table 10 shows the confusion matrix obtained for the ISCI scenario in grayscale mode. As mentioned earlier, this scenario is more challenging than SCMI, and the results can be improved in future studies. The confusion matrix reveals the misclassifications between all classes. As the table shows, misclassifications occur mostly between devices of the same brand; for example, classes 14 and 15 (the two Y9 devices) are most often misidentified as each other.

As can be seen in Table 11, the processing time at the patch, frame, and video levels increases as the patch size increases.

VI. CONCLUSION
This paper presented a new smartphone video database (QUFVD) for source camera identification. The database covers five popular smartphone brands, with two models per brand and two devices per model, and comprises 6000 original videos and 76,531 I-frames. The entire database is provided, together with an evaluation analysis, for use by the research community. The database is suitable for new challenges such as ISCI and for use with deep learning methods. The results show that improvement is essential for ISCI. Although it is not a fair comparison, the deep learning method used in our study achieves promising results compared with those reported for Daxing, which are based on the PRNU method.

To improve the video-level results, different decision-making approaches, such as fusion methods based on weighting the classifiers' scores, can be applied in the future. We will also add further tasks to the database: transferring the videos over social media such as WhatsApp and Facebook to study the impact of compression on source camera identification, and adding forged videos to support video tampering detection. To obtain more data and new challenges, our database can be combined with other databases, and augmentation methods can be applied to enlarge the training data. Finally, although the effects of codec and resolution were not clearly visible with our method, they can be studied with other methods.


ACKNOWLEDGMENT
This publication was made possible by NPRP grant # NPRP12S-0312-190332 from the Qatar National Research Fund (a member of Qatar Foundation). Open Access funding was provided by the Qatar National Library. The statements made herein are solely the responsibility of the authors.

REFERENCES
[1] H. Tian, Y. Xiao, G. Cao, Y. Zhang, Z. Xu, and Y. Zhao, "Daxing smartphone identification dataset," IEEE Access, vol. 7, pp. 101046–101053, 2019.
[2] S. Milani, M. Fontani, P. Bestagini, M. Barni, A. Piva, M. Tagliasacchi, and S. Tubaro, "An overview on video forensics," APSIPA Trans. Signal Inf. Process., vol. 1, no. 1, p. e2, 2012, doi: 10.1017/ATSIP.2012.2.
[3] J. Lukáš, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Trans. Inf. Forensics Security, vol. 1, no. 2, pp. 205–214, Jun. 2006.
[4] M. Chen, J. Fridrich, M. Goljan, and J. Lukáš, "Determining image origin and integrity using sensor noise," IEEE Trans. Inf. Forensics Security, vol. 3, no. 1, pp. 74–90, Mar. 2008.
[5] A. Lawgaly and F. Khelifi, "Sensor pattern noise estimation based on improved locally adaptive DCT filtering and weighted averaging for source camera identification and verification," IEEE Trans. Inf. Forensics Security, vol. 12, no. 2, pp. 392–404, Feb. 2017.
[6] A. Lawgaly, F. Khelifi, and A. Bouridane, "Weighted averaging-based sensor pattern noise estimation for source camera identification," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2014, pp. 5357–5361.
[7] X. Kang, Y. Li, Z. Qu, and J. Huang, "Enhancing source camera identification performance with a camera reference phase sensor pattern noise," IEEE Trans. Inf. Forensics Security, vol. 7, no. 2, pp. 393–402, Apr. 2012.
[8] F. Ahmed, F. Khelifi, A. Lawgaly, and A. Bouridane, "Comparative analysis of a deep convolutional neural network for source camera identification," in Proc. IEEE 12th Int. Conf. Global Secur., Saf. Sustainability (ICGS3), Jan. 2019, pp. 1–6.
[9] F. Ahmed, F. Khelifi, A. Lawgaly, and A. Bouridane, "Temporal image forensic analysis for picture dating with deep learning," in Proc. Int. Conf. Comput., Electron. Commun. Eng. (iCCECE), Aug. 2020, pp. 109–114.
[10] M. Iuliani, M. Fontani, D. Shullani, and A. Piva, "Hybrid reference-based video source identification," Sensors, vol. 19, no. 3, p. 649, 2019.
[11] S. Mandelli, P. Bestagini, L. Verdoliva, and S. Tubaro, "Facing device attribution problem for stabilized video sequences," IEEE Trans. Inf. Forensics Security, vol. 15, pp. 14–27, 2019.
[12] E. Altinisik and H. T. Sencar, "Source camera verification for strongly stabilized videos," IEEE Trans. Inf. Forensics Security, vol. 16, pp. 643–657, 2021.
[13] R. R. López, E. A. Luengo, A. L. S. Orozco, and L. J. G. Villalba, "Digital video source identification based on container's structure analysis," IEEE Access, vol. 8, pp. 36363–36375, 2020.
[14] B. C. Hosler, X. Zhao, O. Mayer, C. Chen, J. A. Shackleford, and M. C. Stamm, "The video authentication and camera identification database: A new database for video forensics," IEEE Access, vol. 7, pp. 76937–76948, 2019.
[15] T. Gloe and R. Böhme, "The 'Dresden image database' for benchmarking digital image forensics," in Proc. ACM Symp. Appl. Comput., 2010, pp. 1584–1590.
[16] O. A. Shaya, P. Yang, R. Ni, Y. Zhao, and A. Piva, "A new dataset for source identification of high dynamic range images," Sensors, vol. 18, no. 11, p. 3801, Nov. 2018.
[17] W. van Houten, Z. Geradts, K. Franke, and C. Veenman, "Verification of video source camera competition (CAMCOM 2010)," in Proc. Int. Conf. Pattern Recognit., Berlin, Germany: Springer, 2010, pp. 22–28.
[18] G. Qadir, S. Yahaya, and A. T. S. Ho, "Surrey university library for forensic analysis (SULFA) of video content," in Proc. IET Conf. Image Process. (IPR), Edison, NJ, USA: IET, 2012, pp. 1–6.
[19] L. D'Amiano, D. Cozzolino, G. Poggi, and L. Verdoliva, "Video forgery detection and localization based on 3D patchmatch," in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jun. 2015, pp. 1–6.
[20] K. Rosenfeld and H. T. Sencar, "A study of the robustness of PRNU-based camera identification," Proc. SPIE, vol. 7254, Feb. 2009, Art. no. 72540M.
[21] D. Shullani, M. Fontani, M. Iuliani, O. A. Shaya, and A. Piva, "VISION: A video and image dataset for source identification," EURASIP J. Inf. Secur., vol. 2017, no. 1, pp. 1–16, Dec. 2017.
[22] B. Hosler, O. Mayer, B. Bayar, X. Zhao, C. Chen, J. A. Shackleford, and M. C. Stamm, "A video camera model identification system using deep learning and fusion," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 8271–8275.
[23] M. Goljan, J. Fridrich, and T. Filler, "Large scale test of sensor fingerprint camera identification," Proc. SPIE, vol. 7254, Feb. 2009, Art. no. 72540I.
[24] C. Galdi, F. Hartung, and J.-L. Dugelay, "SOCRatES: A database of realistic data for SOurce camera REcognition on smartphones," in Proc. 8th Int. Conf. Pattern Recognit. Appl. Methods, 2019, pp. 648–655.
[25] C.-T. Li, "Source camera identification using enhanced sensor pattern noise," IEEE Trans. Inf. Forensics Security, vol. 5, no. 2, pp. 280–287, Jun. 2010.
[26] T. Höglund, P. Brolund, and K. Norell, "Identifying camcorders using noise patterns from video clips recorded with image stabilisation," in Proc. 7th Int. Symp. Image Signal Process. Anal. (ISPA), Sep. 2011, pp. 668–671.
[27] M. Chen, J. Fridrich, M. Goljan, and J. Lukáš, "Source digital camcorder identification using sensor photo response non-uniformity," Proc. SPIE, vol. 6505, Mar. 2007, Art. no. 65051G.
[28] M. Goljan, M. Chen, P. Comesaña, and J. Fridrich, "Effect of compression on sensor-fingerprint based camera identification," Electron. Imag., vol. 2016, no. 8, pp. 1–10, Feb. 2016.
[29] B. Bayar and M. C. Stamm, "Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection," IEEE Trans. Inf. Forensics Security, vol. 13, no. 11, pp. 2691–2706, Nov. 2018.
[30] D. Timmerman, S. Bennabhaktula, E. Alegre, and G. Azzopardi, "Video camera identification from sensor pattern noise with a constrained ConvNet," 2020, arXiv:2012.06277.
[31] O. Mayer, B. Hosler, and M. C. Stamm, "Open set video camera model verification," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020, pp. 2962–2966.

YOUNES AKBARI received the B.Sc. degree in computer software engineering from Payame Noor University (Central Branch of Birjand), Iran, in 2006, the M.Sc. degree in information technology management from Payame Noor University (Central Branch of Tehran), Tehran, Iran, in 2011, and the Ph.D. degree in applied mathematics (numerical analysis) from the University of Semnan, Iran, in 2017. He was a Postdoctoral Researcher with the Department of Computer Science and Engineering, Qatar University (QU), where he worked on a document binarization project. He is a reviewer for several international journals in the field of artificial intelligence, such as IEEE TRANSACTIONS ON CYBERNETICS, Pattern Recognition, IEEE ACCESS, and Artificial Intelligence Review. His research interests include pattern recognition, neural networks, remote sensing, and document analysis. He has published several papers in these areas.

SOMAYA AL-MAADEED (Senior Member, IEEE) received the Ph.D. degree in computer science from Nottingham, U.K., in 2004. She is currently the Coordinator of the Computer Vision and AI Research Group. She enjoys excellent collaboration with national and international institutions and industry and is a principal investigator of several funded research projects generating approximately five million. She has published extensively in pattern recognition and has delivered workshops on teaching programming for undergraduate students. She has attended workshops related to higher education strategy, assessment methods, and interactive teaching. In 2015, she was elected the IEEE Chair for the Qatar Section.


NOOR AL-MAADEED (Member, IEEE) received the Ph.D. degree in computer engineering from Brunel University, U.K., in 2014. She is currently an Associate Professor with the Computer Science and Engineering Department, Qatar University. She has participated in many regional and international conferences and published a large number of research articles in prestigious peer-reviewed journals, book chapters, and conference proceedings. She has improved the relationship between academia and industry by leading many research projects, both domestic and international, totaling over eight million QAR, in her fields of specialization, such as image processing, speech and speaker recognition, intelligent pattern recognition, video-surveillance systems, and biometrics. She is a member of the First Batch Qatar Leadership Center, the Current and Future Leaders Program, the Qatar University Senate, and other committees, and a member of various international associations, such as IET, BA, and IAENG. She participates in activities that connect her to the community, such as working with charities and volunteering at sports events. Her awards include the Qatar Education Excellence Platinum Award for new Ph.D. holders from His Highness the Emir of Qatar, in 2014 and 2015, the Premium Award from IET Biometrics, in 2017, and the Barzan Award, in 2019.

AL ANOOD NAJEEB received the bachelor's degree in computer application from the Sree Narayana Institute of Technology, India, in 2019. She is currently pursuing the M.S. degree in computer science with Qatar University, Doha, Qatar, where she also works as a Research Assistant for Dr. Somaya Al-Maadeed. Her research interests include image processing, computer vision, and machine learning.

AFNAN AL-ALI received the Master of Science degree in computer engineering from the University of Basra, Basra, Iraq. She is currently pursuing the Ph.D. degree with Qatar University. Her research interests include machine learning, AI, computer vision, object detection and classification, and machine learning for health care.

FOUAD KHELIFI (Member, IEEE) received the Ingenieur d'Etat degree in electrical engineering from the University of Jijel, Algeria, in 2000, the Magistère degree in electronic engineering from the University of Annaba, Algeria, in 2003, and the Ph.D. degree from the School of Computer Science in 2007, having joined Queen's University Belfast, U.K., as a Research Student in 2004. From 2008 to 2010, he held a research position with the Digital Media and Systems Research Institute, University of Bradford, U.K., before joining the Department of Computer and Information Sciences, Northumbria University at Newcastle, U.K., as a Lecturer (Assistant Professor), where he is currently an Associate Professor (Reader). He has authored and coauthored over 90 publications and has successfully supervised 12 Ph.D. students. His research interests include computer vision and machine learning, image and video watermarking, image/video authentication and perceptual hashing, data hiding, image forensics and biometrics, image and video coding, and medical image analysis.

ASHREF LAWGALY received the B.Sc. degree in computer science from Garyounis University, Benghazi, Libya, in 2001, and the M.Sc. degree (Hons.) in software engineering from Northumbria University, Newcastle, U.K., in 2011. He joined Northumbria University at Newcastle as a Research Student in 2013 and received the Ph.D. degree from the Department of Computer and Information Sciences (CIS) in 2017. From 2018 to 2020, he held a research position with the Department of CIS, Northumbria University, where he is currently a Senior Research Assistant. His research interests include image and video processing, multimedia forensics, information security, and medical image analysis.
