0% found this document useful (0 votes)

43 views6 pages

Cell-Phone Identification From Recompressed Audio Recordings

This document summarizes a research paper presented at the 2018 Twenty Fourth National Conference on Communications. The research aims to identify the source cell phone of an audio recording using passive signatures extracted from the recorded audio, even if the audio has undergone recompression, as often happens when audio is posted on social media. The researchers analyze features from both low and high frequency regions of recordings, using inverted Mel frequency cepstral coefficients to analyze high frequencies. Their method achieves 97.2% accuracy on a publicly available dataset and outperforms existing methods on doubly compressed audio recordings.

Uploaded by

lawgalyadel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

43 views6 pages

Cell-Phone Identification From Recompressed Audio Recordings

Uploaded by

lawgalyadel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

2018 Twenty Fourth National Conference on Communications (NCC)

Cell-Phone Identification from Recompressed

Audio Recordings
Vinay Verma, Preet Khaturia, and Nitin Khanna
Multimedia Analysis and Security (MANAS) Lab, Electrical Engineering
Indian Institute of Technology Gandhinagar (IITGN), Gujarat, India

Abstract—Many audio forensic applications would civil incident. A few decades back, most of these
benefit from the ability to classify audio recordings, based audio recordings needed to be created using dedicated
on characteristics of the originating device, particularly and costly setup. However, now, these recordings are
in social media platforms where an enormous amount of
data is posted every day. This paper utilizes passive signa- mainly obtained using hand-held multipurpose devices,
tures associated with the recording devices, as extracted especially cell-phones due to their ubiquitous presence.
from recorded audio itself, in the absence of any extrinsic During the last couple of years, there has been a
security mechanism such as digital watermarking, to tremendous increase in usage of smartphones due to
identify the source cell-phone of recorded audio. It uses their increasing functionalities resulting from more
device-specific information present in low as well as high-
frequency regions of the recorded audio. On the only capable and less costly hardware components in them.
publicly available dataset in this field, MOBIPHONE, the A report by International Telecommunication Union
proposed system gives a closed set accuracy of 97.2% (ITU), says that in 2015 there were more than seven
which matches the state of art accuracy reported for billion mobile cellular subscriptions in the world [1].
this dataset. On audio recordings which have undergone Moreover, the capability to easily manipulate user
double compression, as typically happens for a recording
posted on social media, the proposed system outperforms generated audio content by easily available and user-
the existing methods (4% improvement in average accu- friendly software and spreading of this content through
racy). social media platform has resulted in serious concerns
for law enforcement agencies around the world.
I. I NTRODUCTION Audio data is acquired by the acquisition device in
There has been a continuous evolution in research particular from the cell-phone, and authentication of
related to various aspects of automated audio process- the cell-phone from the audio which has gone through
ing such as speech, speaker and language recognition, some compression either by audio editing software
speech-based human-machine interaction and biomet- or compression mechanism employed by social media
rics. Simultaneously, there is a tremendous growth platforms while sharing the audio recordings, possibly
of electronic devices in the consumer market which due to storage requirement, becomes a challenging
has made hand-held devices such as digital cameras, problem. If source cell-phone can be identified even
portable scanners, tablets, and smartphones, key com- from such compressed audio recordings, it could be
ponents of our daily lives. Due to the widespread helpful for the forensic examiner to answer the foren-
usage of these handheld devices, the amount of user- sics questions such as verifying the claim for the
generated multimedia content is escalating. Thus, its authenticity of audio or ownership of the acquisition
study is an essential aspect of multimedia forensics device.
with numerous applications such as proclaiming mean- Performance of the proposed system is evaluated
ingful conclusions about a subject in the court of law. on audio recordings in their original format as well
Audio forensics, part of the broader field of mul- as when audio recordings had undergone a second
timedia forensics, pertains to the acquisition, analy- compression (compressed with different bit rate and
sis, and evaluation of audio recordings that can be sampling rate). Following are the main contributions
produced as evidence in the court of law. Evidence of this paper:
to be analyzed using audio forensic methods may • This paper addresses the importance of high-
come from a criminal investigation by law enforcement frequency region of the audio in identifying the
agencies or as part of an official inquiry into an source cell-phone. It utilizes device-specific infor-
accident, fraud, accusation of slander, or some other mation present in low as well as high-frequency
regions of the recorded audio. Most of the exist-
This research was supported by a grant from the Department of
Science and Technology (DST), New Delhi, India, under Award ing systems for source cell-phone identification
Number ECR/2015/000583. Any opinions, findings, and conclusions from the recorded audio have focused on utilizing
or recommendations expressed in this material are those of the features from low-frequency regions.
author(s) and do not necessarily reflect the views of the Department
of Science and Technology. Address all correspondence to Nitin • For extracting features from the high frequency
Khanna at [email protected]. region of the audio, our method uses inverted

978-1-5386-1224-8/18/$31.00 ©2018 IEEE

Authorized licensed use limited to: Northumbria University Library. Downloaded on November 23,2022 at 21:53:08 UTC from IEEE Xplore. Restrictions apply.
2018 Twenty Fourth National Conference on Communications (NCC)

Mel frequency cepstral coefficient (IMFCC) [2]. for classifying cell-phones according to their five dif-
IMFCCs have not been previously explored for ferent manufacturers. Aggrawal et al. [15] used only
source cell-phone classification. the noisy part of whole speech, and MFCC features
• The performance of our algorithm is also tested were extracted from the estimated noisy part of the
when the recorded audio has undergone second speech. Average accuracy of 90% was reported for
compression by popular audio editing software classifying cell phones belonging to different manufac-
such as Adobe Audition. To the best of our turers when speech content of the recorded files varied.
knowledge, such a study in source identification Authors in [7] used Gaussian supervectors (GSV’s) for
from doubly compressed audio has not yet been characterizing the intrinsic signatures of the record-
explored. However, it will be the case when ing device. This work also released the dataset MO-
recorded audio is posted on social media. BIPHONE. Source cell phone identification through
MFCC (Mel frequency cepstral coefficient) feature sparse representation is presented in [16]–[18]. Zou
with generalized linear discriminant sequence kernel et al. [16] used Gaussian supervectors (GSV’s) for
(GLDS-kernel [3], [4]) with order-2 has been used characterizing the intrinsic signatures of the recording
for comparison purpose. Some works for the cell- device. Moreover, they used an exemplar-based dic-
phone identification task using MFCC were proposed tionary and a dictionary learned by K-SVD learning
in [5]–[7]. These state-of-the-art handcrafted features algorithm for the sparse representation of GSV’s. For
for cell-phone identification task are well suited for similarity measure, correlation is estimated between
comparison with the proposed method. The primary two sparse representations. Further, KISS metric based
motive of this study is to reveal that high-frequency similarity between a pair of intrinsic signatures rep-
information of the audio also plays a significant part resented by sparse representations was used in [17].
and our experiments confirm this hypothesis. A dataset consisting of audio recordings from 14 cell-
Rest of the paper is organized as follows. In Sec- phones was used for evaluation purpose. Zou et al. [18]
tion II, previous work on source cell-phone identifi- extended their previous work [16] by proposing a new
cation is described. In Section III proposed method supervised learning method based on discriminative K-
is discussed. Experimental results are presented in SVD (D-KSVD). Li et al. [19] used a deep neural
Section IV, while conclusion and future work are network to learn the intrinsic signatures left by the
dicussed in Section V. recording device in the recorded audio. Further, spec-
tral clustering on the learned features was used to make
II. R ELATED W ORKS a single cluster for each recording device.
Any device specific distortions in the data acquired
III. P ROPOSED S YSTEM
by a device are referred as intrinsic signatures of the
acquisition device [8]. In audio forensics, the concept Every electronic component has some tolerance val-
of unique fingerprint of the recording device, extracted ues associated with it, and thus practical realization of
from the captured audio, has been previously used different instances of even the same electronic circuitry
for microphone identication [9]–[13]. In recent years, is unique. This also holds for hardware associated with
the idea of cell-phone identication from their recorded different parts of a cell-phone such as its microphone,
audio has been explored, keeping in view the increased and thus no two recorders associated with two dif-
number of cell-phone users [1]. Hanilci et al. [5] ferent cell-phones are expected to be exactly same.
first addressed the problem of cell-phone recognition This difference in the realization becomes even more
from the recorded audio. They proposed a cell-phone prominent for cell-phones manufactured by different
recognition system which was equivalent to speaker manufacturers as they may use their specific design
recognition system and used MFCC features, which of microphone [5]. An audio signal recorded by a
are well-known and state of the art for speech and cell-phone can be approximated as a convolution of
speaker recognition. A maximum closed set accuracy the original audio signal with the impulse response of
of 96.42% was reported using support vector machine the cell-phone. This implies that for the same input
(SVM) classifier on a dataset with 14 cell-phones audio signal, recorded signal will be slightly different
of different make and model. Further, they extended for different cell-phones (evident from Figure 2 in [5]
their work by extracting MFCC and linear frequency showing the spectrum of the same input audio recorded
cepstral coefficients (LFCC) features from the silence by different cell-phones).
region of audio recordings [6]. This approach was Thus, cell-phones leave traces of their specific con-
tested on the dataset used in their earlier work [5] and volutional distortion in the recorded audio signal, and
resulted in an improvement in classification accuracies. we propose to use such device-specific distortions as
In another work, Pandey et al. [14], estimated power intrinsic signatures of the cell-phones. One way to
spectral density from the speech free regions of the recognize the recording device is to find its impulse
recordings for source cell-phone classification. With response from the recorded audio and classify it ac-
twenty-six cell-phones of different make and models, cordingly. Estimation of this impulse response may
an average classification accuracy of 88% was obtained not always be possible as it will generally require

access to the device and specific experimental setup filter bank and energy is summed up in each of the
for response measurement. An alternative approach filters of the filter bank. That gives an amount of
taken in this paper is to estimate some device-specific energy contained in each of the filters of the filter
information from the recorded audio only and then bank. The logarithm of each energy value is taken
use this information to model a particular cell-phone. and discrete cosine transform (DCT) of log energies
A test recording can be compared with this model is calculated. These DCT coefficients are termed as
to recognize the recording cell-phone. This device inverse Mel frequency cepstral coefficients (IMFCC’s).
specific information may not be directly the system’s Generally, for most of the applications, first 12-13
impulse response, but the generated model captures coefficients are chosen excluding the DC component.
intra-class similarities and inter-class dissimilarities Inverted Mel filter bank (Figure 2) is generated
and thus enables identification of cell-phone used for using following equation (equation 1) that describes
the recording using only the recorded audio from that mapping from Hz scale of frequency to inverse Mel
cell-phone. scale of frequency:
The aim of the proposed cell-phone recognition
8031.25 − f
system is to capture device-specific discriminatory mb = 2889.22 − 2595 log10 1 + (1)
700
information using a suitable representation. The device
specific signature associated with recording circuitry of  
2889.22 − m b
a cell-phone is not limited to low-frequency regions f = 8031.25 − 700 10 2595 − 1 , (2)
and might contain substantial information in high-
frequency regions as well. Thus, this paper proposes a
combination of MFCC and IMFCC (which emphasizes where m b is a subjective pitch in inverted Mel corre-
high-frequency region of the signal as well) based sponding to the actual frequency f in Hz. This corre-
features for capturing the device-specific distortions sponds to taking 512 points discrete Fourier transform
present in a recorded sample. IMFCC features have (DFT) of each frame, for the audio having a sampling
been previously used for speaker identification [2] rate of 16 kHz. Detailed description and more general
and synthetic speech detection [20]. By using the form of the equation can be found in [2].
concatenation of MFCC and IMFCC features, as the
device-specific foot-prints/signatures, this paper ad- 3000

dresses the close-set cell-phone identification prob-

lem using access to only the audio recorded from 2500

them. After extracting suitable features, classification

Subjective pitch in inverted Mel

2000
is performed using SVM [21] classifier. The Effect
on the performance of the proposed system when the 1500

original recordings have undergone different kinds of

1000
compression is also presented here.
500
A. Feature Extraction
0
To model the device discriminatory content of audio, 0 1000 2000 3000 4000 5000 6000 7000 8000
Frequency (f) in Hz
we have used a combination of features obtained from
inversion of the Mel-filter bank (IMFCC) and MFCC. Fig. 1. Subjective pitch in inverted Mel vs frequency in Hz
MFCCs are very well known features for various
speech-related recognition tasks. Detailed description The filter bank contains equally spaced filters in
about MFCC can be found in [22]. While inverted inverted Mel scale but exponentially spaced in Hz scale
MFCCs (IMFCCs) proposed by Chakroborty et al. [2] of frequency. For example, the filters in filter bank gen-
are relatively new feature sets. IMFCC feature extrac- erated using equation 1 are more closely spaced in the
tion is based on the same pipeline as that of MFCC high-frequency region as compared to that in the low-
but with the major difference that in IMFCC, inverted frequency region of the signal. When the spectrogram
Mel scale of frequency has been used that affects to of an audio signal is multiplied with this filter bank,
change in the filter bank structure. A brief description the high-frequency region of the spectrogram is more
of IMFCC feature extraction is given below. emphasized, as compared to the low-frequency region.
To increase the signal energy at high-frequencies,
the signal is passed through pre-emphasis filter. The B. Feature Combination
signal is generally divided into 20-30ms of frames with Suppose for an audio segment of t seconds,
10-15ms of overlap. A window function, generally a MFCC or IMFCC feature matrix is Xn×d =
Hamming window is used to smooth out the signal to [x1 x2 . . . xn ]T , where n is the number of
avoid some non-desirable artifacts in spectrum while frames in t seconds of audio segment and xi ∈ Rd ,
calculating the energy spectrum of each frame. The ∀i = 1, 2, . . . n is MFCC or IMFCC feature vector
energy spectrum is multiplied by inverse Mel spaced for each frame. In GLDS method [4], first each of

1 male and six female speakers are used for training pur-
0.8 pose and remaining six male, and six female speakers
are used for testing purpose. Thus, training and testing
Amplitude

0.6

0.4
sets are mutually exclusive and balanced in the number
of speakers, gender and source cell-phones. Similar
0.2
kind of train and test split is done in [7]. This kind of
0
0 1000 2000 3000 4000 5000 6000 7000 8000 splitting of the data make sure that performance of the
Frequeny in Hz
classifier is speaker independent and to a reasonable
Fig. 2. IMFCC Filter Bank
extent text independent(not fully text independently
because two sentences are same for each speaker).
Audio Pre- Inverse IMFCC
For our experiments speaker [1:7,14:18] are chosen
Window | DFT |2 Log DCT
emphasis Filter Bank
for training and remaining twelve speakers for testing
for each cell-phone (Table II in [7]). Two kinds of
Fig. 3. An Overview of IMFCC Feature Extraction experiments are performed to see the performance of
the proposed system in different scenarios. In one
experiment, decisions are made on 3-seconds of audio
xi is mapped into yi , which is in higher dimensional
segments (or subpart), while in another experiment
feature sapce (say p), using a polynomial expansion
as done in [7], decisions are made for each of the
function φ(.), where φ : Rd → Rp . So yi = φ(xi ),
recordings from different speakers (each recording of
∀i = 1, 2, . . . n. Details about polynomial expansion
approximately 30 seconds duration). The decision on
function φ(.) can be found in [4]. Polynomial expan-
the complete recording is made by taking the majority
sion of each feature vector results in a new feature
vote on the decisions of 3-second audio segments of
matrix Yn×p = [y1 y2 . . . yn ]T . In the second step,
that recording. To assess the effect of different kinds of
final feature vector z1×p is obtained using z = n1 eT Y,
compression on the proposed system, the experiments
where eT is [1 1 . . . 1]. Hence for a given audio
were conducted on two variants of MOBIPHONE
segment of t seconds (3-seconds in our case), say final
dataset viz. original and mp3 compressed.
MFCC and IMFCC feature vectors are zm and zim ,
that will result in proposed feature vector [zm zim ].
A. Experiments Using Original Audio Recordings
C. Classification For comparison with the existing results reported
A multi-class SVM classifier available in LibSVM on the MOBIPHONE dataset, this experiment uses the
package [21] is used for the classification purpose. original recordings present in the dataset for training
the classifier as well as testing. These original record-
IV. E XPERIMENTAL R ESULTS AND D ISCUSSIONS ings are available in lossless compressed wav file
To facilitate easier comparison with the existing format and contain audio sampled at 16 kHz sampling
methods, the performance of the proposed system is rate. For this task, an average classification accuracy
evaluated on a standard, publicly available dataset in of 91.2% and 93.2% is obtained when only MFCC
this field, MOBIPHONE [7]. This dataset consists of features and a combination of MFCC and IMFCC
21 cell-phones (Table I in [7]). Each of the cell- features respectively are used for classifying audio
phones is used to record 12 male and 12 female segments of length 3-seconds (Second column in Ta-
speakers randomly chosen from TIMIT dataset (Table ble V). These results demonstrate the complementary
II in [7]). Each of the speakers speaks ten sentences information captured in the features corresponding to
of approximately 3-seconds each. First two sentences the high-frequency region of the audio. Moreover,
are same for all the speakers while remaining eight are when the decisions are obtained for one complete
different. recording that is of approximately 30 seconds, an
For the results presented in this paper, MFCC [23] average classification accuracy of the 97.2% is ob-
and IMFCC feature estimations are done on a frame tained by the proposed system which is similar to the
size of 30ms with 15ms overlap, using the Hamming best accuracy obtained by MFCC based features and
window, and 26 filters in the filter bank; as these classifiers proposed in [7]. Classification accuracy for
are the most commonly used settings for frame size, each of the 21 cell-phone is shown in Figure 4, and
window, and overlap respectively. For generating a Figure 5 for the decision made on 3-seconds of audio
single feature vector for each subpart of 3-seconds, segments and on the complete recording respectively.
features for all the frames corresponding to this 3- Results in terms of percentage of correct and incorrect
seconds subpart are combined using GLDS kernel of classification accuracies with the proposed method and
order 2. This choice of order of GLDS kernel is similar method in [5], with decisions on complete recording,
to [5] which has reported similar performance by using is shown in Table I, and Table II respectively. In the
either 2nd order or 3rd order GLDS kernel. Table I, Table II, Table III, and Table IV, the columns
A classifier is trained and tested on two mutually P1 , P2 , . . . , P21 represents the cell-phone1, cell-
exclusive sets of audio recordings. Recordings of six phone2, . . . , cell-phone21 respectively. Actual brand

TABLE I
R ESULTS (% OF CORRECT AND INCORRECT CLASSIFICATION ACCURACIES ) USING P ROPOSED M ETHOD ON THE O RIGINAL
R ECORDINGS
Cell-Phones P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21
Correct
Classification
100 91.7 91.7 100 100 91.7 100 100 100 100 100 100 83.3 83.3 100 100 100 100 100 100 100

Incorrect (P14 , 8.33)

Classification
(P8 , 8.33) (P21 , 8.33) (P5 , 8.33) (P13 , 8.33)
(P15 , 8.33)

TABLE II
R ESULTS (% OF CORRECT AND INCORRECT CLASSIFICATION ACCURACIES ) USING MFCC [5] BASED M ETHOD ON THE O RIGINAL
R ECORDINGS
Cell-Phones P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21
Correct
Classification
100 91.7 91.7 100 91.7 75 100 100 100 100 100 100 66.7 83.3 91.7 100 100 91.7 100 100 100
(P14 , 16.7)
Incorrect (P5 , 16.7)
Classification
(P8 , 8.33) (P21 , 8.33) (P14 , 8.33) (P15 , 8.33) (P13 , 16.7) (P4 , 8.33) (P4 , 8.33)
(P13 , 8.33)
(P5 , 8.33)

TABLE III
R ESULTS (% OF CORRECT AND INCORRECT CLASSIFICATION ACCURACIES ) USING P ROPOSED M ETHOD ON MP 3 COMPRESSED
RECORDINGS ( SAMPLING RATE =16 K H Z AND BITRATE =40 K BPS )

Cell-Phones P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21
Correct
Classification
100 91.7 91.7 100 83.3 91.7 100 100 100 100 100 100 100 91.7 100 100 100 100 100 100 100

Incorrect (P6 , 8.33)

Classification
(P8 , 8.33) (P21 , 8.33) (P5 , 8.33) (P13 , 8.33)
(P14 , 8.33)

TABLE IV
R ESULTS (% OF CORRECT AND INCORRECT CLASSIFICATION ACCURACIES ) USING MFCC [5] BASED M ETHOD ON MP 3 COMPRESSED
RECORDINGS ( SAMPLING RATE =16 K H Z AND BITRATE =40 K BPS )

Cell-Phones P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21
Correct
Classification
100 83.3 91.7 100 100 83.3 100 100 100 100 100 100 58.3 83.3 100 100 100 91.7 100 100 100
(P14 , 16.7)
Incorrect (P5 , 8.33)
Classification
(P8 , 16.7) (P21 , 8.33) (P15 , 8.33) (P13 , 16.7) (P5 , 8.33)
(P13 , 8.33)
(P5 , 16.7)

and model of these cell-phones are described in [7]. 100

MFCC Proposed

In any column of these tables, say column P13 from 90

the Cell-Phones row of Table I, cell-phone from P13
Accuracy (%)

80
class has been correctly classified with classification
accuracy of 83.3% and missclassified with cell-phone 70

P14 and P15 with misclassification accuracy of 8.33% 60

and 8.33% respectively. Comparison of Table I with
50
Table II reveals that the proposed system achieves 1 3 5 7 9 11 13 15 17 19 21
Class Index
the state-of-art performance on existing dataset for
problems explored in the literature. Next set of ex- Fig. 4. Comparison of classification accuracies for all cell-phones
periments evaluates the performance of the proposed using original recordings (decision on 3-seconds segments).
system under mp3 compression.
MFCC Proposed
100

TABLE V 90
AVERAGE ACCURACIES (%) FOR O RIGINAL R ECORDINGS IN
Accuracy (%)

THE DATASET 80

3s Segments Complete Recording 70

MFCC [5] 91.2 94.4

60
Proposed System 93.2 97.2
50
1 3 5 7 9 11 13 15 17 19 21
Class Index

B. Experiments Using Recompressed Audio Record-

ings (in .mp3 format) Fig. 5. Comparison of classification accuracies for all cell-phones
using original recordings (decision on complete recording).
All the audio recordings present in MOBIPHONE
dataset are recompressed in mp3 format as it is one of
the most widely used audio file formats and supported the common settings present in Adobe Audition which
by most of the hand-held devices. Although one of is one of the most commonly used audio processing
the most commonly used libraries for this purpose, software. Table VI shows the average classification
ffmpeg [24] provides a large number of settings for accuracies for the recordings compressed in constant
mp3 compression, in this paper results are reported on bit rate (CBR) mode in .mp3 format with different

bit rates and sampling frequencies. Results in terms [4] W. M. Campbell, K. T. Assaleh, and C. C. Broun, “Speaker
of percentage of correct and incorrect classification Recognition with Polynomial Classifiers,” IEEE Transactions
on Speech and Audio Processing, vol. 10, no. 4, pp. 205–212,
accuracies with the proposed method and method 2002.
in [5] with decisions on complete recording is shown [5] C. Hanilçi, F. Ertaş, T. Ertaş, and . Eskidere, “Recognition of
in Table III, and Table IV respectively. Brand and Models of Cell-Phones,” vol. 7, no. 2, pp. 625–634,
2012.
[6] C. Hanilçi and T. Kinnunen, “Source Cell-phone Recognition
TABLE VI from Recorded Speech using Non-speech Segments,” Digital
AVERAGE ACCURACY (%) WITH DIFFERENT MP 3 COMPRESSION Signal Processing: A Review Journal, vol. 35, pp. 75–85, 2014.
PARAMETERS ( SAMPLING RATE AND BITRATE ) (A DOBE [7] C. Kotropoulos and S. Samaras, “Mobile Phone Identification
AUDITION ) using Recorded Speech Signals,” in 19th IEEE International
Conference on Digital Signal Processing (DSP), 2014, pp.
3s Segments Complete Recording 586–591.
11 kHz MFCC [5] 91.6 95.6 [8] N. Khanna, A. K. Mikkilineni, A. F. Martone, G. N. Ali,
24 Kbps Proposed System 93 96.4 G. T.-C. Chiu, J. P. Allebach, and E. J. Delp, “A Survey
11 kHz MFCC [5] 90.1 93.7 of Forensic Characterization Methods for Physical Devices,”
32 Kbps Proposed System 93.2 97.2 Digital Investigation, vol. 3, pp. 17–28, September 2006.
12 kHz MFCC [5] 92.2 96.4 [9] C. Kraetzer, A. Oermann, J. Dittmann, and A. Lang, “Digital
24 Kbps Proposed System 92.7 96.4 Audio Forensics: A First Practical Evaluation on Microphone
12 kHz MFCC [5] 90.3 93.7 and Environment Classification,” in Proceedings of the 9th
32 Kbps Proposed System 93.6 97.6 Workshop on Multimedia & Security, 2007, pp. 63–74.
16 kHz MFCC [5] 92.5 95.6 [10] R. Buchholz, C. Kraetzer, and J. Dittmann, “Microphone
32 Kbps Proposed System 93.1 97.2 Classification using Fourier Coefficients,” in International
16 kHz MFCC [5] 91.9 94.8
Workshop on Information Hiding. Springer, 2009, pp. 235–
40 Kbps Proposed System 93.4 97.6 246.
[11] D. Garcia-Romero and C. Y. Espy-Wilson, “Automatic Acqui-
sition Device Identification from Speech Recordings,” in IEEE
International Conference on Acoustics, Speech and Signal
V. C ONCLUSION Processing (ICASSP), 2010, pp. 1806–1809.
[12] Y. Jiang and F. H. F. Leung, “Mobile Phone Identification from
This paper has explored the characteristics of micro- Speech Recordings using Weighted Support Vector Machine,”
phone sensitivity used in cell-phones, and experimental in 42nd Annual Conference of the IEEE Industrial Electronics
Society (IECON), Oct 2016, pp. 963–968.
results are conclusive of the fact that in addition [13] H. Malik and J. Miller, “Microphone Identification using
to the low-frequency spectrum, there is significant Higher-order Statistics,” in Proc. AES Int. Conf. Audio Foren-
information regarding signature of the device in the sics, 2012, pp. 2–5.
[14] V. Pandey, V. K. Verma, and N. Khanna, “Cell-phone Iden-
high-frequency spectrum as well. The proposed system tification from Audio Recordings using PSD of Speech-free
achieved the average classification accuracy of 97.2% Regions,” in IEEE Students’ Conference on Electrical, Elec-
on the publicly available dataset MOBIPHONE and tronics and Computer Science (SCEECS), March 2014, pp.
1–6.
similar kind of accuracy when the audio recordings [15] R. Aggarwal, S. Singh, A. K. Roul, and N. Khanna, “Cellphone
undergo a different amount of compressions. On audio identification using noise estimates from recorded audio,”
recordings which have undergone double compression, in International Conference on Communications and Signal
Processing (ICCSP), April 2014, pp. 1218–1222.
which commonly happens when a recording is up- [16] L. Zou, Q. He, and X. Feng, “Cell Phone Verification from
loaded on a social media platform such as WhatsApp Speech Recordings using Sparse Representation,” in IEEE
or edited in Adobe Audition and re-saved, the pro- International Conference on Acoustics, Speech and Signal
Processing (ICASSP), April 2015, pp. 1787–1791.
posed features perform much better than the existing [17] L. Zou, Q. He, J. Yang, and Y. Li, “Source Cell Phone Match-
state of art features (Table VI). For example, when ing from Speech Recordings by Sparse Representation and
double compression happens at 16KHz, 40 Kbps, the KISS Metric,” in IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), March 2016, pp.
proposed features give an accuracy of 97.6% while 2079–2083.
MFCC gives an accuracy of 93.4% (Tables IV and [18] L. Zou, Q. He, and J. Wu, “Source Cell Phone Verification
VI). Further, the proposed system’s least accuracy is from Speech Recordings using Sparse Representation,” Digital
Signal Processing, vol. 62, pp. 125 – 136, 2017.
83.3% for P3 , while the existing MFCC features give [19] Y. Li, X. Zhang, X. Li, X. Feng, J. Yang, A. Chen, and Q. He,
least accuracy of 58.3% for P13 (Tables III and IV). “Mobile Phone Clustering from Acquired Speech Recordings
Future work will include more extensive evaluation for using Deep Gaussian Supervector and Spectral Clustering,”
in IEEE International Conference on Acoustics, Speech and
audio undergone through several compression stages Signal Processing (ICASSP), March 2017, pp. 2137–2141.
while transmitting through social media platforms. [20] D. Paul, M. Pal, and G. Saha, “Spectral Features for Synthetic
Speech Detection,” IEEE Journal of Selected Topics in Signal
Processing, vol. 11, no. 4, pp. 605–617, 2017.
R EFERENCES [21] C.-C. Chang and C.-J. Lin, “LIBSVM: A Library for Support
Vector Machines,” ACM Transactions on Intelligent Systems
[1] International Telecommunication Union (ITU), “ICT Facts and
and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
Figures 2015,” Tech. Rep., May 2015.
[22] B. Logan et al., “Mel frequency cepstral coefficients for music
[2] S. Chakroborty, A. Roy, and G. Saha, “Improved Closed Set
modeling.” in ISMIR, 2000.
Text-independent Speaker Identification by Combining MFCC
[23] K. Wojcicki. (June 2011) HTK MFCC Toolbox.
with Evidence from Flipped Filter Banks,” International Jour-
http://in.mathworks.com/matlabcentral/fileexchange/32849-
nal of Signal Processing, vol. 4, no. 2, pp. 114–122, 2007.
htk-mfcc-matlab.
[3] W. M. Campbell, “Generalized Linear Discriminant Sequence
[24] F. Bellard, M. Niedermayer et al., “FFmpeg,” Available from:
Kernels for Speaker Recognition,” in IEEE International Con-
http://ffmpeg. org, 2017.
ference on Acoustics, Speech, and Signal Processing (ICASSP),
vol. 1, 2002, pp. 161–164.

Authorized licensed use limited to: Northumbria University Library. Downloaded on November 23,2022 at 21:53:08 UTC from IEEE Xplore. Restrictions apply.

HCell Phone Forensics (IBC)
100% (1)
HCell Phone Forensics (IBC)
29 pages
Speech To Text Preprint Version
No ratings yet
Speech To Text Preprint Version
29 pages
Buy Ebook Seeking The Truth From Mobile Evidence Basic Fundamentals Intermediate and Advanced Overview of Current Mobile Forensic Investigations 1st Edition Bair Cheap Price
100% (3)
Buy Ebook Seeking The Truth From Mobile Evidence Basic Fundamentals Intermediate and Advanced Overview of Current Mobile Forensic Investigations 1st Edition Bair Cheap Price
58 pages
Vericast
No ratings yet
Vericast
68 pages
Detection of Impostor and Tampered Segments in Audio by Using An Intelligent System
No ratings yet
Detection of Impostor and Tampered Segments in Audio by Using An Intelligent System
14 pages
Reis Et Al. - 2017 - ESPRIT-Hilbert-Based Audio Tampering Detection Wit
No ratings yet
Reis Et Al. - 2017 - ESPRIT-Hilbert-Based Audio Tampering Detection Wit
12 pages
CSDF Mini Project Report
No ratings yet
CSDF Mini Project Report
17 pages
Verma CNN-based System For Speaker Independent Cell-Phone Identification From Recorded Audio CVPRW 2019 Paper
No ratings yet
Verma CNN-based System For Speaker Independent Cell-Phone Identification From Recorded Audio CVPRW 2019 Paper
9 pages
Nasima CSDF
No ratings yet
Nasima CSDF
16 pages
CSDF Meswcoe
No ratings yet
CSDF Meswcoe
16 pages
Band Energy Difference For Source Attribution in Audio Forensics
No ratings yet
Band Energy Difference For Source Attribution in Audio Forensics
11 pages
CSDF 2-10 Mergedr
No ratings yet
CSDF 2-10 Mergedr
11 pages
259
No ratings yet
259
154 pages
Grigoras, Smith, Morrison, Enzinger (2013) Forensic Audio Analysis - Review 2010-2013
No ratings yet
Grigoras, Smith, Morrison, Enzinger (2013) Forensic Audio Analysis - Review 2010-2013
28 pages
Project
No ratings yet
Project
14 pages
Channel Response Based Multi-Feature Audio Splicing Forgery Detection and Localization
No ratings yet
Channel Response Based Multi-Feature Audio Splicing Forgery Detection and Localization
8 pages
IJISAE 3 Dr.+Shwetambari+Borade 3 1899
No ratings yet
IJISAE 3 Dr.+Shwetambari+Borade 3 1899
8 pages
Minor Project Ms
No ratings yet
Minor Project Ms
12 pages
Audio Splicing Detection Using Convolutional Neural Network
No ratings yet
Audio Splicing Detection Using Convolutional Neural Network
5 pages
Subband Aware CNN For Cell-Phone Recognition: Xiaodan Lin, Jianqing Zhu, and Donghua Chen
No ratings yet
Subband Aware CNN For Cell-Phone Recognition: Xiaodan Lin, Jianqing Zhu, and Donghua Chen
5 pages
Project PPT Bhu
No ratings yet
Project PPT Bhu
12 pages
Professional Hardware and Software Solution For Advanced Speech Signal Analysis
No ratings yet
Professional Hardware and Software Solution For Advanced Speech Signal Analysis
24 pages
General Presentation
No ratings yet
General Presentation
18 pages
Information 12 00263 v2
No ratings yet
Information 12 00263 v2
15 pages
Audio DeepFake Detection(Innovative) (1)
No ratings yet
Audio DeepFake Detection(Innovative) (1)
16 pages
BTP_Final
No ratings yet
BTP_Final
16 pages
Forensic Audio Analysi1
No ratings yet
Forensic Audio Analysi1
19 pages
Digital Audio Forensics Fundamentals From Capture To Courtroom James Zjalic download
No ratings yet
Digital Audio Forensics Fundamentals From Capture To Courtroom James Zjalic download
83 pages
Audio_Deepfake_Detection_Using_Deep_Learning Paper2
No ratings yet
Audio_Deepfake_Detection_Using_Deep_Learning Paper2
6 pages
Mobile Device Forensics: Richard Ayers
No ratings yet
Mobile Device Forensics: Richard Ayers
19 pages
Lending An Ear in The Courtroom Forensic Acoustics PDF
No ratings yet
Lending An Ear in The Courtroom Forensic Acoustics PDF
8 pages
Detection of Synthetically Generated Speech
No ratings yet
Detection of Synthetically Generated Speech
5 pages
PR1 EXAM Ally
100% (2)
PR1 EXAM Ally
19 pages
36558
No ratings yet
36558
11 pages
Digital Forensics
No ratings yet
Digital Forensics
32 pages
Computer Forensics: Audio Forensic
No ratings yet
Computer Forensics: Audio Forensic
18 pages
Modern Techniques For Collection of Evidences and Role of Forensic Science
100% (2)
Modern Techniques For Collection of Evidences and Role of Forensic Science
4 pages
1933 Harry J Gardener Dynamic Numbers
100% (2)
1933 Harry J Gardener Dynamic Numbers
28 pages
B e A T: Call For Cell Phone Forensics
No ratings yet
B e A T: Call For Cell Phone Forensics
4 pages
Audio Fingerprint Generation: Andre Mosley Po T Wang John Broadway Yu-Heng Lee
No ratings yet
Audio Fingerprint Generation: Andre Mosley Po T Wang John Broadway Yu-Heng Lee
2 pages
Kanpur - 10
No ratings yet
Kanpur - 10
65 pages
Trig Bash
No ratings yet
Trig Bash
26 pages
Untitled Document
No ratings yet
Untitled Document
130 pages
Forensic Cop Journal 2 (2) 2009-Standard Operating Procedure of Audio Forensic
No ratings yet
Forensic Cop Journal 2 (2) 2009-Standard Operating Procedure of Audio Forensic
7 pages
Roomba 600 Manual
No ratings yet
Roomba 600 Manual
32 pages
On Symplectic Packing Problems in Higher Dimensions: Kyler Siegel, Yuan Yao
No ratings yet
On Symplectic Packing Problems in Higher Dimensions: Kyler Siegel, Yuan Yao
25 pages
Jamc Ir Advance c5051 5045 5035 5030
No ratings yet
Jamc Ir Advance c5051 5045 5035 5030
6 pages
Science Revision
No ratings yet
Science Revision
4 pages
From Traditional Grammar To Functional Grammar
No ratings yet
From Traditional Grammar To Functional Grammar
57 pages
IS R-7000 WL Cable Grease PDS 22287
No ratings yet
IS R-7000 WL Cable Grease PDS 22287
7 pages
TC Electronic Toneprint Editor 2 - 0 - Manual English PDF
No ratings yet
TC Electronic Toneprint Editor 2 - 0 - Manual English PDF
26 pages
Ite2004 Software-Testing Eth 1.0 37 Ite2004
No ratings yet
Ite2004 Software-Testing Eth 1.0 37 Ite2004
2 pages
15kW FinePackagesHome
No ratings yet
15kW FinePackagesHome
2 pages
UNIT 1 All Notes
No ratings yet
UNIT 1 All Notes
24 pages
A Researcher Wants To Conduct A Research Project A
No ratings yet
A Researcher Wants To Conduct A Research Project A
4 pages
GX Moulded Glass Packaging-Flyer 2016 01
No ratings yet
GX Moulded Glass Packaging-Flyer 2016 01
2 pages
00 00 00 2021 Nurses' Organizational Commitment and Job Performance in A Tertiary Hospital Questionnaire
No ratings yet
00 00 00 2021 Nurses' Organizational Commitment and Job Performance in A Tertiary Hospital Questionnaire
5 pages
Ca10 Specs
No ratings yet
Ca10 Specs
2 pages
Websphere Application Server 6.1 Questions and Answers
No ratings yet
Websphere Application Server 6.1 Questions and Answers
29 pages
Example of Petrographic Report
No ratings yet
Example of Petrographic Report
3 pages
Systematic Analysis of Simple Salt-2
No ratings yet
Systematic Analysis of Simple Salt-2
7 pages
6 - Way To Go Reducido-90-97
No ratings yet
6 - Way To Go Reducido-90-97
8 pages
Encyclopedia of Electrochemical Power Sources - Zinc Electrodes Solar Thermal Production
No ratings yet
Encyclopedia of Electrochemical Power Sources - Zinc Electrodes Solar Thermal Production
19 pages
Expert PDF Viewer Activex Control Patch by Under Seh Team
No ratings yet
Expert PDF Viewer Activex Control Patch by Under Seh Team
2 pages
Character Analysis
No ratings yet
Character Analysis
4 pages
ESP Pham Quynh Anh
100% (1)
ESP Pham Quynh Anh
6 pages
Assembly / Installation Instructions:: 6 Corporate Parkway Goose Creek, Sc. 29445 Www. Quoizel. Com
No ratings yet
Assembly / Installation Instructions:: 6 Corporate Parkway Goose Creek, Sc. 29445 Www. Quoizel. Com
1 page
Impact of Information Technology in Indian Banking Industry
No ratings yet
Impact of Information Technology in Indian Banking Industry
14 pages
Chap11 Perimeter
No ratings yet
Chap11 Perimeter
10 pages
NFC Technology and Applications: Definitive Reference for Developers and Engineers
From Everand
NFC Technology and Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Unveiling the Enigma: Unraveling the Secrets of Mobile Operating Systems
From Everand
Unveiling the Enigma: Unraveling the Secrets of Mobile Operating Systems
Pasquale De Marco
No ratings yet
Voice Technologies and Systems: Definitive Reference for Developers and Engineers
From Everand
Voice Technologies and Systems: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Proceedings etc2016: European Telemetry and Test Conference etc2016
From Everand
Proceedings etc2016: European Telemetry and Test Conference etc2016
BoD - Books on Demand
No ratings yet
Proceedings etc2014: European Telemetry and Test Conference
From Everand
Proceedings etc2014: European Telemetry and Test Conference
The European Society of Telemetry
No ratings yet
Who Rules the Net?: Internet Governance and Jurisdiction
From Everand
Who Rules the Net?: Internet Governance and Jurisdiction
CSPtrade
No ratings yet
Android Application Security Essentials
From Everand
Android Application Security Essentials
Pragati Ogal Rai
No ratings yet
Network Security Traceback Attack and React in the United States Department of Defense Network
From Everand
Network Security Traceback Attack and React in the United States Department of Defense Network
Edmond K. Machie
No ratings yet
Botnet Attack Detection in the Internet of Things Using Selected Learning Algorithms: A Research Study on Securing IoT Against Cyber Threats Using Machine Learning
From Everand
Botnet Attack Detection in the Internet of Things Using Selected Learning Algorithms: A Research Study on Securing IoT Against Cyber Threats Using Machine Learning
Bolakale Aremu
5/5 (1)
Secure Your Internet Use
From Everand
Secure Your Internet Use
Jerome Svigals
No ratings yet
Application of Wireless Technologies in Nuclear Power Plant Instrumentation and Control Systems
From Everand
Application of Wireless Technologies in Nuclear Power Plant Instrumentation and Control Systems
IAEA
No ratings yet
Digital Technologies – an Overview of Concepts, Tools and Techniques Associated with it
From Everand
Digital Technologies – an Overview of Concepts, Tools and Techniques Associated with it
Editor IJSMI
No ratings yet
Computer Jargon - The Illustrated Glossary of Basic Computer Terminology: Decode and simplify complex computer terms with easy-to-follow visual guides
From Everand
Computer Jargon - The Illustrated Glossary of Basic Computer Terminology: Decode and simplify complex computer terms with easy-to-follow visual guides
Kevin Wilson
No ratings yet
Computer Jargon: The Illustrated Glossary of Basic Computer Terminology
From Everand
Computer Jargon: The Illustrated Glossary of Basic Computer Terminology
Kevin Wilson
No ratings yet
Wardialing: Unveiling Cyber Tactics and Electronic Warfare
From Everand
Wardialing: Unveiling Cyber Tactics and Electronic Warfare
Fouad Sabry
No ratings yet
Audio Visual Speech Recognition: Advancements, Applications, and Insights
From Everand
Audio Visual Speech Recognition: Advancements, Applications, and Insights
Fouad Sabry
No ratings yet
Automatic Target Recognition: Advances in Computer Vision Techniques for Target Recognition
From Everand
Automatic Target Recognition: Advances in Computer Vision Techniques for Target Recognition
Fouad Sabry
No ratings yet
Internet of Things & Wireless Sensor Network
From Everand
Internet of Things & Wireless Sensor Network
Ajit Singh
No ratings yet
Automatic Target Recognition: Fundamentals and Applications
From Everand
Automatic Target Recognition: Fundamentals and Applications
Fouad Sabry
No ratings yet
Deep Learning: Fundamentals and Applications
From Everand
Deep Learning: Fundamentals and Applications
Fouad Sabry
No ratings yet
Digital Video Fingerprinting: Enhancing Security and Identification in Visual Data
From Everand
Digital Video Fingerprinting: Enhancing Security and Identification in Visual Data
Fouad Sabry
No ratings yet

Cell-Phone Identification From Recompressed Audio Recordings

Uploaded by

Cell-Phone Identification From Recompressed Audio Recordings

Uploaded by

2018 Twenty Fourth National Conference on Communications (NCC)

Cell-Phone Identification from Recompressed

978-1-5386-1224-8/18/$31.00 ©2018 IEEE

dresses the close-set cell-phone identification prob-

them. After extracting suitable features, classification

original recordings have undergone different kinds of

Incorrect (P14 , 8.33)

Incorrect (P6 , 8.33)

and model of these cell-phones are described in [7]. 100

In any column of these tables, say column P13 from 90

P14 and P15 with misclassification accuracy of 8.33% 60

3s Segments Complete Recording 70

MFCC [5] 91.2 94.4

B. Experiments Using Recompressed Audio Record-

You might also like