


Radioelectronic and Computer Systems, 2023, no. 3(107)
ISSN 1814-4225 (print), ISSN 2663-2012 (online)

UDC 004.63.056.3
doi: 10.32620/reks.2023.3.16

Maksym BOIKO (1, 2), Viacheslav MOSKALENKO (1, 3), Oksana SHOVKOPLIAS (1)

(1) Sumy State University, Sumy, Ukraine
(2) The National Anti-Corruption Bureau of Ukraine, Kyiv, Ukraine
(3) National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine

ADVANCED FILE CARVING: ONTOLOGY, MODELS AND METHODS

File carving techniques are important in the field of digital forensics. At the same time, the rapid growth in the amount and types of data requires file carving methods to improve in capability, accuracy, and computational efficiency. However, most methods are developed to solve specific tasks and are based on a certain set of assumptions and a priori knowledge about the files to be recovered. There is a lack of research that systematizes methods and structures approaches in order to identify gaps and determine promising directions for development, considering the latest advances in information technology and artificial intelligence. The subject matter of this article is the structure, factors, efficiency criteria, methods, and tools of file carving, as well as the current state and development trends of file carving methods. The goal of this study is to systematize knowledge about advanced file carving methods and identify promising directions for their development. The tasks to be solved are as follows: to identify the main stages of file carving and analyze approaches to their implementation; to build an ontological scheme of file carving; and to identify promising directions for the development of carving methods. The methods used were literature review, systematization, and summarization. The obtained results are as follows. An ontological scheme for the file carving concept is constructed. The scheme includes the principles, properties, phases, techniques, evaluation criteria, tools used, and factors influencing file carving. The features, limitations, and fields of application of the data recovery methods are provided. It was established that the most widespread approach to file reconstruction is still a detailed manual analysis of the internal structure of files and/or their contents, identifying specific patterns that allow the data fragments to be reassembled in the correct order. However, most of the methods do not guarantee a fully correct result. This article analyzes the current state and prospects of using artificial intelligence methods in the field of digital forensics, particularly for identifying data blocks, clustering, and reconstructing files, as well as restoring the contents of media files with damaged or lost headers. The necessity of having a priori information about the file structure or content for successfully carving fragmented data is established. Conclusions. The scientific novelty of the obtained results is as follows: for the first time, advanced file carving methods are systematized and analyzed by directions of development and by the prospects of using artificial intelligence for identifying data blocks, clustering, and file content restoration; for the first time, an ontological scheme of file carving is constructed, which can be used as a roadmap for developing new advanced systems in the digital forensics field.

Keywords: digital forensics; metadata; fragmentation; fragmented file; data recovery; file carving; file fragment
identification; file reconstruction; file restoring; artificial intelligence.

1. Introduction

1.1 Motivation of research

Users constantly create, view, edit, and delete many files when working with data. This is a dynamic process. The file system is responsible for the mechanisms and rules for storing data on the disk space [1]. Researchers regularly search for deleted information and recover it when conducting digital forensic examinations. This is because users who perform illegal or compromising activities typically try to cover their tracks and delete any sensitive information.

Leaving aside the SSD’s internal processes [2], file systems usually optimize their work so that they do not take any action with deleted data blocks [1]. Such disk space areas are only marked as free for use and remain intact until they are allocated for storing other information. As a result, unallocated disk space can contain forensically important data.

Some file types (for example, TXT, LOG, DOC) store their data in an uncompressed form. Their full or partial contents can be accessed without restoring the entire object by reading their detected data blocks or identifying text fragments using search terms. However, this is not sufficient when trying to extract the contents of compound files that use compression, encryption, or have a complex internal structure. These file types include JPG, BMP, AVI, MPG, DOCX, XLSX, PDF, and SQLITE.

A separate digital forensics sphere is the study of RAM, particularly volatile memory dumps in the

© Maksym Boiko, Viacheslav Moskalenko, Oksana Shovkoplias, 2023

Information security and functional safety 205

Windows operating system [3]. RAM areas may contain the contents of files the user has been working with that may not have been stored on the disk [4]. Such files occupy non-contiguous data blocks, the location of which may not always be known.

The file recovery process is more difficult when the files are fragmented and there is no file allocation data. In these circumstances, searching for file fragments and positioning them correctly is a time-consuming and complex task with no clear-cut solution. In addition, it is necessary to consider the increasing number of digital devices and the amount of information available in general. In recent years, advanced file carving techniques have been used increasingly to solve and optimize various stages of such tasks.

1.2. Research gap

In recent years, researchers have periodically reviewed file carving techniques. The most common methods of data recovery are presented in [5].

Some authors have focused on a survey of various data carving techniques for multimedia [6, 7] or JPEG [8] files. In [9], the researchers focused only on the efficiency analysis of Scalpel and Foremost carving processes. The paper [10] discusses the recovery of a more extensive set of file types, with a focus on fragmented Microsoft Word documents.

In other cases, the file carving algorithms were divided according to a particular principle. For example, in [11], the authors classified carving methods for JPEG files into basic and advanced categories and conducted a detailed analysis of graph theoretic and weightage techniques. Similar approaches to the classification of file carving methods are used in [12], where the author also presents a taxonomy of file carving techniques.

In addition to the techniques and carving directions discussed, the work [13] includes data recovery research area mapping.

Despite a relatively large number of surveys on data recovery techniques, the authors did not comprehensively consider the problem of file carving. The works are not sufficiently systematized. In addition, the ontological relationship between file carving and various aspects of this process has not been established.

1.3. Objectives and Contributions

This study systematizes and builds schemes for recovering highly fragmented files using advanced techniques and determines the feasibility of using artificial intelligence in this process.

The key issues are as follows:
- analyze the existing advanced file carving techniques;
- identify the stages of data recovery where these techniques are applied;
- determine the feasibility of using artificial intelligence and advanced techniques.

Structurally, this work consists of the following sections. The research methodology is described in section 2. Section 3 discusses the main phases of digital forensics, data recovery with and without file system metadata, and the ontological diagram of file carving. Advanced file carving techniques and their details are provided in section 4. Section 5 presents a discussion of the aforementioned techniques. The last section provides conclusions and indicates directions for future research.

2. Research methodology

The research hypothesis is that the carving of highly fragmented files depends on three key factors:
- the efficiency of identifying data fragments in unallocated space and/or RAM;
- the techniques used to reconstruct the detected file fragments;
- the file type and its internal content.

For this purpose, the three research questions identified for the current literature review are shown in Table 1.

Table 1
Research questions
# | The question
Q1 | What are the typical stages of file carving, and what are the prospects for improving each stage?
Q2 | Is it possible to carve fragmented files without a priori information about their internal structure and contents?
Q3 | What are the prospects for using artificial intelligence methods in the file carving field?

The search approach is based on selecting and analyzing articles that address the problems of carving highly fragmented files or solve individual phases of this process.

The selection process consisted of several stages. Initially, the most relevant studies were identified by searching for keywords in the titles and abstracts. The articles were then reviewed for their relevance to the research questions. The final set of articles was based on the quality of the content.

For a complete understanding of the problems that appear when recovering fragmented data in the absence of file system metadata, the literature for the period of dynamic digital forensics growth, approximately the last 20 years, was analyzed.

3. Background, Directions and Ontology

3.1 Digital forensics

When conducting digital forensics examinations, researchers perform several actions depending on the type of research, type and number of objects, tasks to be solved, etc. [14]. In general, this process can be conditionally divided into four stages: collection, examination, analysis, and presentation (Fig. 1).

Fig. 1. Digital forensics stages

During the first stage, digital media are collected and copies of them are created. In the next phase, the created images are processed. As a rule, a full-fledged study of disk space is conducted: file system analysis, hidden information detection, deleted file recovery, signature analysis, indexing, pattern search, etc. In the last two stages, investigators identify important data, interpret them, and generate a report with detailed answers to the questions.

3.2 Deleted file recovery

The complexity of the deleted data recovery process depends on the file system, the character of the user’s actions when deleting information, the character and duration of further actions, etc. (Fig. 2).

Fig. 2. Data recovery by the degree of complexity (from the easiest at the top to the most difficult at the bottom)

The simplest case for recovering files in the most popular file systems (NTFS, FAT32, EXFAT, HFS, EXT) is the deletion of data to the Recycle Bin. In this case, the file is not actually deleted but is moved to another location. Therefore, the blocks of data it occupies and its metadata remain intact.

If the user deleted a file bypassing the Recycle Bin or emptied the Recycle Bin, two situations are possible:
- the file system metadata is not affected;
- metadata of the deleted file is lost.

If the metadata is available, the file can be recovered using information about the location of its data blocks [1]. The only complication may be that certain areas of the deleted file have been overwritten with other data. Then, at best, only a partial reconstruction of the file is possible with the subsequent loss of some or all of its contents, depending on the file type, number, and character of lost fragments.

Figure 3 illustrates a possible case of data overwriting. At the top is the initial state of the disk space with existing files #1, #2, and #3. At the bottom is the current state of the same locations of the disk space, where file #1 is wholly overwritten and file #3 is partially overwritten after user manipulations. In this case, if the file system metadata is available, files #2 and #4 will be fully recovered, file #1 will be lost, and file #3 will be restored but partially overwritten. At the same time, the recovery of even partial contents of file #3 is highly questionable.

Fig. 3. Example of possible data overwriting

3.3 File carving

The biggest problems arise when deleted information has to be recovered while the file system metadata and the information about the location of file data blocks are lost or damaged. In this case, file carving techniques are applied, which are well suited for recovering contiguous files that contain a header and footer [15, 16]. For these purposes, the unallocated space is searched for file beginning and end signatures. However, this method has disadvantages if the file consists of two or more non-contiguous fragments.

Fig. 4 shows an imaginary example of locating data blocks of two deleted fragmented files on the disk space. File A is divided into four clusters that are located out of order. File B occupies three clusters and is divided into two fragments. During the recovery process, the most likely problem is identifying the A3, A4, B2, and B3 fragments. If A and B belong to the same file type, it is necessary to define the boundaries of each file. Finally, to recover file A, it is necessary to arrange the fragments correctly.

Fig. 4. Example of possible data fragmentation
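The header and footer signature search described in Section 3.3 can be illustrated with a minimal sketch. It is not the method of any tool named in this paper; the function name `carve_contiguous`, the 512-byte block size, and the size limit are assumptions made only for the example, while the JPEG start and end markers are the well-known signature values. The sketch accepts a header only at a block boundary and takes the first footer that follows it, so it is suitable only for contiguous files.

```python
from typing import Iterator, Tuple

# Well-known JPEG start-of-image and end-of-image signatures.
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_contiguous(data: bytes,
                     header: bytes = JPEG_HEADER,
                     footer: bytes = JPEG_FOOTER,
                     block_size: int = 512,              # assumed block size
                     max_size: int = 10 * 1024 * 1024    # assumed size limit
                     ) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of candidate contiguous files in raw data.

    A header is accepted only at a block boundary, because a file starts at
    the beginning of a block/cluster. The footer may sit at any offset.
    """
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            return
        if start % block_size != 0:
            # Signature bytes inside a block: most likely not a file start.
            pos = start + 1
            continue
        end = data.find(footer, start + len(header), start + max_size)
        if end == -1:
            # No footer within the size limit: skip this header.
            pos = start + block_size
            continue
        end += len(footer)
        yield start, end
        pos = end


if __name__ == "__main__":
    blob = JPEG_HEADER + b"A" * 100 + JPEG_FOOTER
    image = b"\x00" * 512 + blob + b"\x00" * (512 - len(blob))
    for start, end in carve_contiguous(image):
        print(f"candidate file at bytes {start}..{end}")
```

If the file is fragmented, as with files A and B in Fig. 4, the bytes between the header and the footer include blocks of other files, so the carved result is corrupted. This is the limitation that the advanced techniques discussed later address.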

Usually, it is not a problem to identify the first fragment of a file, which in most cases has a clear marker in the form of a header in its initial bytes. However, not all files have a footer, and a footer can have any offset relative to the beginning of the block/cluster. If the file consists of three or more fragments, the first key problem is to determine the data blocks that do not have clear markers, such as the header and footer. Subsequently, it is necessary to cluster the detected fragments and directly reconstruct the file or its contents.

Fig. 5 shows the ontological diagram, which indicates the principles of file carving, properties, tools required for this, the phases of file carving, factors that affect the result, techniques used, and criteria used for evaluation.

Fig. 5. The ontological diagram of file carving

The set of software tools shown in Fig. 5 contains only basic information, is not complete, and depends on the platform on which the data recovery process is performed. Usually, at the pre-processing stage of file carving, utilities such as FTK Imager, DD, X-Ways Forensics, and EnCase Imager are used to create a full bit-for-bit copy of the original media. Then, at the examination stage, the disk space is analyzed. For this purpose, universal tools such as X-Ways Forensics, UFS Explorer, EnCase, Magnet Axiom, Autopsy, Forensic Explorer, and FTK are most often used. They operate on the principle of a Swiss Army knife. Scalpel, Foremost, PhotoRec, and RecoverIt are utilities explicitly designed for data recovery, which is performed using their own algorithms. The abovementioned software does not guarantee 100% results and works well only with non-fragmented data. For this reason, in non-trivial cases, specialists often use additional tools for manual data recovery, such as hex editors and highly specific scripts [17].

To evaluate the effectiveness of the software, it is advisable to determine the number of correctly recovered (true positive or TP), incorrectly recovered (false positive or FP), and unrecovered (false negative or FN) files [18]. Subsequently, the following criteria are applied:
- precision – the proportion of correctly recovered files among the results of the utility’s work [18]:

$$\mathrm{Precision} = \frac{TP}{TP + FP}; \quad (1)$$

- recall – the proportion of correctly recovered files from their total number in the digital media [18]:

$$\mathrm{Recall} = \frac{TP}{TP + FN}; \quad (2)$$

- f-measure – the overall performance of a tool [18 - 20]:

$$F_{measure} = \frac{1}{\alpha / P + (1 - \alpha) / R}, \quad (3)$$

where P is the precision, R is the recall, and α is a numeric value from 0 to 1 used to determine precision and recall weights;


- reliability – the tool’s efficiency among supported file types [18 - 20]:

$$\mathrm{Reliability} = \frac{SF - SFN}{SF}, \quad (4)$$

where SF is the number of supported files in the dataset and SFN is the number of supported false negatives;
- computational complexity – the amount of resources required to solve the task. Computational complexity is often estimated by the data processing speed or task execution time with the same computing resources [19, 20].

When comparing the effectiveness of the utilities, some researchers [19, 20] also divided false positive files into two categories: partially recovered (known false positive or kFP) and remaining files (unknown false positive or uFP). As a result, precision and recall are defined as follows [19, 20]:

$$\mathrm{Precision} = \frac{TP}{TP + uFP + kFP / \beta}, \quad (5)$$

where β is a numeric value not less than 1 used to determine the relative weight of uFP compared with kFP;

$$\mathrm{Recall} = \frac{all - FN}{all}, \quad (6)$$

where all is the total number of files in the dataset.

Each of the above metrics (Precision, Recall, F-measure, Reliability) can take a value from 0 to 1 and shows the quality of the tool. Metric values close to 1 indicate that the software shows good performance. Table 2 shows the interpretation of the low values of Precision, Recall, and Reliability metrics [18 - 20]. It is worth noting that authors often compare the number of files successfully recovered using their methods with the results of recognized utilities such as Scalpel, Foremost, PhotoRec, etc.

Table 2
Interpretation of the low values of the metrics
Metric | Interpretation
Precision | A large number of false positives
Recall | A small number of correctly recovered files
Reliability | A large number of failures when recovering supported file types

4. Advanced file carving techniques

To review advanced file carving techniques, we analyzed the works available on resources such as ScienceDirect, Elsevier, and IEEE. To do this, a search was conducted using the keywords and their combinations shown in Table 3.

Table 3
Search terms
# | Keywords
1 | file carving
2 | data carving
3 | smart carving
4 | machine learning
5 | artificial intelligence
6 | data recovery
7 | fragmented files

The most relevant detected works, their direction, brief description, and particularities are shown in Table 4. In general, these studies show that advanced file carving techniques are successfully used to varying degrees at the identification, clustering, reconstruction, and restoration stages in addition to standard digital forensics methods. However, most studies do not clearly distinguish between these phases. For example, clustering and validation often occur during file reconstruction and/or restoring. Usually, these issues are solved in parallel. In addition, most of the authors who addressed the issue of reconstruction or restoring performed data validation and verification. Therefore, the last two stages are not mentioned separately in Table 4.

In this case, identification means identifying data blocks related to a specific type of data or files. Clustering involves dividing the identified data fragments into groups of blocks belonging to different files. The identified data blocks are placed in the correct order during reconstruction. During the restoring process, by contrast, the file’s contents are restored in case of damage or loss of some file areas.

Fig. 6 shows pre-processing and typical stages of file carving.

Fig. 6. Steps of pre-processing and file carving
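Since the studies reviewed in this section report their results through the evaluation criteria of equations (1) to (6), a short worked sketch of those formulas may be useful. The function names and the sample counts are assumptions chosen only for illustration, and the code assumes that every denominator is non-zero.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (1): correctly recovered files among all tool results."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Equation (2): correctly recovered files among all files on the media."""
    return tp / (tp + fn)


def f_measure(p: float, r: float, alpha: float = 0.5) -> float:
    """Equation (3): weighted combination of precision p and recall r."""
    return 1.0 / (alpha / p + (1.0 - alpha) / r)


def reliability(sf: int, sfn: int) -> float:
    """Equation (4): sf supported files, sfn supported false negatives."""
    return (sf - sfn) / sf


def precision_weighted(tp: int, ufp: int, kfp: int, beta: float = 1.0) -> float:
    """Equation (5): known false positives are down-weighted by beta >= 1."""
    return tp / (tp + ufp + kfp / beta)


def recall_total(all_files: int, fn: int) -> float:
    """Equation (6): recall relative to the total number of files."""
    return (all_files - fn) / all_files


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study in this review.
    tp, fp, fn = 80, 20, 20
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"precision={p:.3f} recall={r:.3f} f-measure={f_measure(p, r):.3f}")
    print(f"reliability={reliability(100, 10):.3f}")
    print(f"weighted precision={precision_weighted(80, 10, 10, beta=2.0):.3f}")
```

With α = 0.5 the f-measure reduces to the harmonic mean of precision and recall. With β = 1 equation (5) coincides with equation (1), because uFP + kFP = FP.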

Table 4
Advanced file carving techniques
Authors | Direction | Summary
Zanero [21] | Identification | Applying a set of support vector machines classifiers to determine data blocks for the files of the following types: BMP, DOC, EXE, GIF, JPG, MP3, ODT, PDF, PPT (9 classes). Average true positive rate – 90.4%, average false positive rate – 12.4%.
Fitzgerald et al. [22] | Identification | File fragment classification using a supervised learning approach based on support vector machines combined with the bag-of-words model (24 classes). The best results were obtained for CSV, PS, GIF, SQL, HTML, JAVA, XML, and BMP files (>90%). Fragments of PPTX, PPS, DOCX, XLSX, PPT, SWF, JPG, ZIP, GZ, PDF, and TXT files – 2.3% to 31.8% of prediction accuracy.
Beebe et al. [23] | Identification | Using support vector machines (N-gram vectors) to classify data blocks across 30 file types and 8 data types. Overall classification rate – 73.4%. High misclassification rate of encrypted data and of PPT, ZIP, PPTX, GZIP, PNG, FLV, DOC, XLSX, PDF, DOCX, AVI, and BMP files.
Pan et al. [24] | Identification | A method to identify the AVI-type blocks based on their internal structure. False positive rate – 53% (2 classes).
Wang et al. [25] | Identification | File fragment classification (18 classes) using N-gram frequencies. The average prediction accuracy is up to approximately 61%. Problems with classifying XLSX, PPTX, DOCX, GZ, PNG, PDF, PPT, and SWF files.
Karampidis et al. [26] | Identification | Comparison of machine learning methods (Decision Trees, Support Vector Machines, Neural Networks, Logistic Regression, k-Nearest Neighbor) for data block identification. Prediction accuracy – 89% to 100%. Only 4 different classes (JPG, PDF, PNG, GIF).
Al-Sadi et al. [27] | Reconstruction | Reconstructing graphic files by determining the image to which a fragment belongs. NaiveBayesMultinomialUpdateable, MultiClass, RandomForest, and BayesNet classifiers are used to determine the similarity between pixel values. The best results are 91% to 99.2% on average. Only graphic files.
Bhatt et al. [28] | Identification | File fragment classification using a hierarchical machine-learning-based approach with optimized support vector machines (SVM). 14 classes – CSV, DOC, HTML, PDF, PPT, XML, XLS, TXT, GIF, JPG, PNG, PS, SWF, and GZ. An average accuracy of 67.78%. PPT, PDF, DOC fragments – the worst results.
Sportiello et al. [29] | Identification | Constructing SVM classifiers to determine the type of data block. 8 classes – BMP, DOC, EXE, GIF, JPG, MP3, ODT, and PDF files.
Mittal et al. [30] | Identification | 512-byte and 4096-byte fragment type classification using convolutional neural networks with automatic feature extraction. 65.6% and 77.5% accuracy in the case of 75 classes. HEIC, MOV, 7Z, DMG, ZIP, EXE, PPTX, DJVU, PDF, DOCX – quite low rates.
Sester et al. [31] | Identification | File type identification approaches using support vector machines and neural networks for n-gram analysis. 6 classes – CSV, DOC, JPG, PPT, TXT, and XLS. Approximately 73% to 98% accuracy in different cases.
Chen et al. [32] | Identification | 4096-byte fragment type classification using a deep convolutional neural network. 16 classes – CSV, DOC, DOCX, GIF, GZ, HTML, JAVA, JPG, LOG, PDF, PNG, PPT, RTF, TEXT, XLS, and XML. 70.9% accuracy. Low results – DOC, DOCX, GIF, JPG, PNG, and TEXT. All bytes of the data block are represented as a grayscale image (automatic feature extraction).
Hiester [33] | Identification | Using recurrent (RNN), convolutional (CNN), and feed-forward neural networks (FNN) as classifiers of 512-byte data blocks. 4 classes: CSV, XML, JPG, and GIF. Up to 98% accuracy in the best case (automatic feature extraction).
Ghaleb et al. [34] | Identification | 512-byte and 4096-byte fragment type classification using light-weight convolutional neural networks. 66.33% and 79.27% accuracy in the case of 75 classes.

Continuation of Table 4
Authors | Direction | Summary
Liu et al. [35] | Identification | A 512-byte fragment type classification technique that converts the byte stream into a 2-D grayscale image and then captures both sequences by convolutional neural networks. 71.4% accuracy in the case of 75 classes.
Bharadwaj [36] | Identification | Using grayscale image conversion and convolutional neural networks to detect the compression algorithm of a 4096-byte data block. 8 classes – rar, gzip, zip, 7-zip, bzip2, ncompress, lz4, and brotli. The achieved accuracy is 41% after five epochs.
Hague et al. [37] | Identification | Using the feature generation model, Byte2Vec, for feature extraction from 4096-byte fragments and k Nearest Neighbors for classification. 35 to 42 classes. An accuracy rate of 74%.
Vulinovic et al. [38] | Identification | File type identification using feed-forward and convolutional neural networks. 18 classes – CSV, DOC, DOCX, GIF, GZ, HTML, JPG, PDF, PNG, PPT, PPTX, PS, RTF, SWF, TXT, XLS, XLSX, and XML. Macro-average F1-score: FFNN – 79.93% to 81.38%, CNN – 61.55%.
Heo et al. [39] | Identification, Restoring | Identification and restoration of damaged audio files using feed-forward and Long Short Term Memory (LSTM) neural networks. High rates of identification of audio files.
Na et al. [40] | Identification, Restoring | Restoring fragmented and partially overwritten video files by video frame analysis. 40 to 50% of the video with damaged data (50% overwriting) was recovered. Only MPEG-4 and H.264 video formats.
Amrouche et al. [41] | Restoring | Recovering damaged images with a lost header. 90% accuracy of image properties identification; 78% accuracy for header prediction.
Alghafli et al. [42] | Identification, Restoring | Identification and recovery of video with lost video codec specifications. Problems with fragmented files.
Qiu et al. [43] | Identification, Reconstruction | Using the byte frequency distribution and rate of change as features for building a classifier based on SVM. Reassembling fragments of the same file type using the PUP approach. The target file type is JPEG. Other file types are PNG, XML, HTML, PDF, GZ, ZIP, Office, MP3, and TXT. Better results (40.9% to 85.7%) compared with PhotoRec.
Guo et al. [44] | Identification, Reconstruction | Using SVM for high-entropy file fragment classification and the Parallel Unique Path algorithm for multimedia file reconstruction. Only 3 types (DOC, JPEG, C++ source code) were studied.
Ali et al. [45] | Identification, Reconstruction | JPEG carving framework using an extreme learning machine and evolutionary algorithms for data block identification, validation, and reassembling. 90 to 93% accuracy. Problems with more than 2 fragmentation patterns or intertwined images.
Ali et al. [4] | Identification, Clustering, Reconstruction | Analysis of the textual contents of DOCX files in RAM and application of K-means and Hierarchical clustering techniques to recover documents’ texts. 54.35% to 90.54% of recovered documents. Possible problems with fragmented data blocks.
Al-Sharif et al. [46] | Identification, Clustering, Restoring | Finding PDF fragments in RAM using their internal structure. K-Means and Hierarchical clustering to define different documents. 46.34% to 50.24% of the PDF contents were carved (without file reconstruction).
Zhang et al. [47] | Identification, Reconstruction | Finding and reassembling SQLite databases using knowledge of their internal structure. Time-consuming method.
Hilgert et al. [48] | Identification, Reconstruction | Finding and reassembling PNG files using knowledge of their internal structure. Better results compared with PhotoRec, Scalpel, and Foremost. Problems with recovering files with missing fragments in the middle and/or the peculiarities of dividing the file into data blocks.
Tang et al. [49] | Reconstruction | Carving of highly fragmented JPEG files. The proposed framework can recover 97% of fragmented JPEG files. Fragmentation points are detected using the coherence of Euclidean Distance.

Continuation of Table 4
Authors | Direction | Summary
Ravi et al. [50] | Reconstruction | Carving fragmented text and some graphic files. Only several graphic file types (JPG, PNG, GIF). TXT files – dictionary-based approach.
Roussev et al. [51] | Identification | Presenting several file fragmentation techniques. The need to manually examine files and find specific features.
Lin et al. [52] | Reconstruction | DOC files’ carving method based on internal structure. Better results (95.45%) than PhotoRec, Foremost.
Birmingham et al. [53] | Reconstruction | Carving fragmented JPEG files using knowledge about their internal structure. Better results compared with Adroit, FTK 3.3, Scalpel, PhotoRec, ProDiscover, and Encase 6. Does not cover out-of-order fragmentation.
Durmus et al. [54] | Reconstruction, Restoring | Reassembling orphaned JPEG fragments using PRNU fingerprints of the cameras. It can also partially collect photos. 42% to 57% fragment localization accuracy.
Chang et al. [55] | Reconstruction | JPEG fragment carving using pixel similarity. Success rate – 92%.
Uzun et al. [56] | Restoring | An Advanced Carver for JPEG Files. Ability to recover JPEG files with damaged or lost headers.
Boiko et al. [57] | Reconstruction | Reconstructing highly fragmented OOXML files. Up to 83% recovered files. Problems with embedding in documents.
Hand et al. [58] | Reconstruction | Utility for recovering binary executable files using their internal structure.
Xu et al. [59] | Identification, Reconstruction | Identification and reassembly of EVTX Log fragments using their internal structure.
Garfinkel [16] | Reconstruction | Fast object validation for bi-fragmented files (JPEG, DOC, and ZIP files).
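Several identification studies in Table 4 classify a data block from statistics of its bytes, such as the byte frequency distribution, N-gram frequencies, or a grayscale image built from the block. The following minimal sketch shows how such inputs could be computed for one block. It does not reproduce the feature set of any cited work; all function names and the 64-pixel image width are assumptions for illustration, and the block is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import List


def byte_histogram(block: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in the block."""
    counts = Counter(block)
    n = len(block)
    return [counts.get(value, 0) / n for value in range(256)]


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte; close to 8 for compressed or encrypted data."""
    n = len(block)
    counts = Counter(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bigram_counts(block: bytes) -> Counter:
    """Counts of adjacent byte pairs (2-grams) in the block."""
    return Counter(zip(block, block[1:]))


def to_grayscale(block: bytes, width: int = 64) -> List[List[int]]:
    """Reshape the block into rows of pixel intensities (0..255)."""
    return [list(block[i:i + width]) for i in range(0, len(block), width)]


if __name__ == "__main__":
    text_block = (b"file carving " * 400)[:4096]
    uniform_block = bytes(range(256)) * 16
    print(f"text-like block entropy:  {shannon_entropy(text_block):.2f}")
    print(f"uniform block entropy:    {shannon_entropy(uniform_block):.2f}")
    print(f"image rows for 4096 bytes: {len(to_grayscale(uniform_block))}")
```

A histogram or N-gram vector of this kind is what a classifier with handcrafted features would consume, whereas the grayscale form is the kind of raw input from which a convolutional network extracts features automatically.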

5. Discussion

As seen in Table 4, researchers have been quite successful in applying advanced methods to improve the mechanisms of deleted data recovery. The authors pay the most attention to the problem of fragment type identification, the general principles of which are discussed in [51]. This is relevant for the classification of data blocks that do not have clear markers. Many researchers use artificial intelligence methods for this purpose. Thus, classifiers based on support vector machines with handcrafted features have been used in previous studies [21 - 23, 28, 29, 31, 43, 44]. In these cases, the result of the identification of data blocks depended, among other things, on the correctness of the selection of classifier features. In more recent studies [25, 30, 32 - 37], support vector machines, k Nearest Neighbors and various types of neural networks with automatic feature extraction were applied. This approach removed the human factor in selecting features and showed its suitability and high efficiency. Other works [26, 31, 33, 38] have made it possible to compare machine learning methods with each other. These studies show that using different types of neural networks to identify data blocks yields higher accuracy rates in most cases than other methods. A comparison of the above methods showed that the achieved efficiency depends on the type of selected algorithm and the number of file types on which the classifier was trained. In addition, the task is complicated by blocks of different data types in the compound files. As seen in [30, 32 - 35], using neural networks with automatic feature extraction is a promising direction in data identification.

It should be noted that due to the wide variety of data types, some authors achieved promising results in research on the identification of specific file types, such as AVI [24], audio [39], MPEG-4 and H.264 video formats [40], JPEG [45], PDF [46], SQLite [47], PNG [48], EVTX [59], and even compression algorithms [36]. These studies used advanced knowledge of the internal structure of these file types, which provided additional benefits in detecting and identifying such data.

After classifying fragments by data or file type, the next logical step is to perform clustering of these data blocks and file reconstruction. These tasks are closely intertwined and sometimes solved comprehensively. The case of bi-fragmented files is described in detail in [16]. The main problems appear with several file fragments and especially with the inconsistent placement of these data blocks.

In general, file reconstruction approaches are based on knowledge of the file’s internal structure and/or content. For example, because of the complex structure of graphic files, various methods exist for recovering them. In [43, 44], the authors used the Parallel Unique Path algorithm (PUP), highlighted in [60]. On the other hand, to recover
graphic files, researchers have successfully proposed determining the similarity between pixel values [27, 55], comparing pixel values on the fragment boundaries [50], applying similarity metrics [45, 49], using PNG and JPEG internal structure features [48, 56], analyzing PRNU fingerprints of the cameras [54], and utilizing both the internal structure and the content of JPEG files [53]. In addition, the internal file structure can be used for recovery with many types of compound files, such as video [40], SQLite databases [47], DOC [52], OOXML [57], BIN [58], and EVTX [59]. When recovering text documents, by contrast, there is the additional option of using their content, which makes dictionary-based techniques applicable [4, 46, 50].

Noteworthy is the use of artificial intelligence techniques to restore audio [39] and graphic files [41] with damaged headers, as well as the use of a validator to reconstruct video files with lost areas containing video codec specifications [42]. In these papers, the authors proposed methods that provide access to the internal contents of damaged files. As seen from [39, 41], artificial intelligence methods are a promising direction in restoring media data content. In general, this can be seen as a way to replace computationally complex algorithms.

The analyzed works show that no universal tool can simultaneously solve all problems in the search, identification, and reconstruction of file fragments. As can be seen from Table 4, two tendencies can be traced. In some cases (for instance, [23, 32, 33, 39]), researchers focus on creating new approaches or improving existing methods for specific stages of file carving. This mainly concerns the data identification phase. Because artificial intelligence is used at this stage, many approaches target a broad set of file or data types, up to 75 [30, 34, 35]. In other words, these methods are universal to a certain degree in most cases.

Another tendency is to exploit the peculiarities of the internal structure of certain file types or their contents in file carving (for example, [4, 43, 46, 48]). The methods proposed in these papers are developed for identifying, clustering, reconstructing, or restoring only files of specific types. Almost every one of these approaches (e.g., [47, 50, 57]) first requires studying the internal structure of a file type or gaining access to certain parts of its contents. Therefore, they are usually not applicable to other file types.

Conclusions

This paper systematizes advanced file carving techniques and presents an ontological scheme of file carving. Although file carving techniques are generally known and understandable, they have several disadvantages when working with different types of fragmented files. As a result, many researchers have attempted to improve existing techniques and develop their own data recovery methods. The ontological scheme can be used as a roadmap for these purposes by digital forensics investigators.

At the beginning of the study, we identified three questions. The conclusions obtained from the analysis of the papers are summarized below.

Q1. What are the typical stages of file carving and what are the prospects for improving each stage?

In general, in the case of data fragmentation, there is a tendency to divide the file carving process into stages that solve individual subtasks: 1) identification of data blocks without explicit markers and 2) classification and reconstruction of files or their contents.

The first of these stages, the identification of data blocks, is characterized by the widespread use of artificial intelligence techniques. Artificial intelligence models and methods are quite efficient. However, most researchers focus on identifying a limited range of data types. Therefore, a promising direction is the development of self-learning models and methods that can identify a wide range of data block types. In addition, the analyzed techniques need to be improved to increase accuracy and prevent the loss of important data blocks in case of misclassification.

The main problem of the subsequent phases is the difficulty of clustering the detected data blocks, i.e., assigning a particular group of fragments to a specific file. Out-of-order fragmentation further complicates the correct assembly of the file. It can be concluded that there are no universal techniques at these stages, and all of them require a detailed analysis of the file types to be recovered.

Q2. Is it possible to carve fragmented files without a priori information about their internal structure and contents?

The universal methods used to identify data blocks actually depend on the cardinality of the class alphabet of the classification models, i.e., the number of classes they can distinguish. At the same time, the reconstruction of files depends on their internal structure and/or contents. Therefore, each described method can be applied only to recover files of certain types. The only exception, in some cases, may be approaches for recovering bi-fragmented files.

Q3. What are the prospects for using artificial intelligence methods in the field of file carving?

The role of artificial intelligence is not restricted to identifying data fragments. It is also important for restoring access to file contents when some areas of files have been overwritten or damaged. Thus, artificial intelligence techniques are used to generate headers to restore the content of damaged media files. In general, artificial intelligence models and methods are a promising approach to reducing complexity. Due to the universality of artificial
intelligence, it is possible to use artificial intelligence techniques to develop carving methods independent of the internal structure and content of files.

Limitations. This paper does not provide an overview of all available data recovery methods. Emphasis was placed on methods of recovering fragmented files with lost or damaged metadata. In addition, the goal was not to study methods of minimizing resource and time costs, such as building a map of unused data [61].

Future research should focus on increasing the accuracy and efficiency of the proposed methods and on saving resources and time. Improving artificial intelligence techniques for identifying data block types will allow the detection of a more complete set of fragments of target file types and minimize erroneously omitted data. With regard to data reconstruction, due to the large variety of file types, improving existing methods and developing new approaches remain open issues.

Contribution of authors: conceptualization of the problem, supervision and revision – Viacheslav Moskalenko; original draft preparation – Maksym Boiko; visualization, review, and editing – Oksana Shovkoplias.

All authors have read and agreed with the published version of this manuscript.

References

1. Carrier, B. File System Forensic Analysis. Addison-Wesley Professional, 2005. 600 p.
2. Bonetti, G., Viglione, M., Frossi, A., Maggi, F., & Zanero, S. Black-box forensic and antiforensic characteristics of solid-state drives. Journal of Computer Virology and Hacking Techniques, 2014, vol. 10, no. 4, pp. 255–271. DOI: 10.1007/s11416-014-0221-z.
3. Ligh, M. H., Case, A., Levy, J., & Walters, A. The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux, and Mac Memory. 1st Edition. John Wiley & Sons, 2014. 912 p.
4. Ali, N. U. A., Iqbal, W., & Afzal, H. Carving of the OOXML document from volatile memory using unsupervised learning techniques. Journal of Information Security and Applications, 2022, vol. 65, article no. 103096. DOI: 10.1016/j.jisa.2021.103096.
5. Darnowski, F., & Chojnacki, A. Selected methods of file carving and analysis of digital storage media in computer forensics. Teleinformatics Review, 2015, vol. 1-2, pp. 25–40. Available at: https://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-10af3f4e-db53-4ae5-9b7f-b7e850dd08d0/c/Darnowski_F_Chojnacki_A.pdf (accessed 19.09.2023).
6. Pahade, R. K., Singh, B., & Singh, U. A Survey on Multimedia File Carving. International Journal of Computer Science & Engineering Survey (IJCSES), 2015, vol. 6, no. 6, pp. 27–46. DOI: 10.5121/ijcses.2015.6603.
7. Alrobieh, Z. S., & Raqpan, A. M. A. A. File Carving Survey on Techniques, Tools and Areas of Use. Transactions on Networks and Communications, 2020, vol. 8, no. 1, pp. 16–26. DOI: 10.14738/tnc.81.7636.
8. Al-Jawry, R., Mohamad, K., Jamel, S., & Ahmad Khalid, S. K. A review of digital forensics methods for JPEG file carving. Journal of Theoretical and Applied Information Technology, 2018, vol. 96, no. 17, pp. 5841–5856. Available at: http://www.jatit.org/volumes/Vol96No17/17Vol96No17.pdf (accessed 19.09.2023).
9. Thomas, R. A., & Mathai, M. A Survey on File Carving Process Using Foremost and Scalpel. National Conference on Emerging Computer Applications (NCECA2021), Kerala, 2021, vol. 3, no. 1, pp. 70–72. DOI: 10.5281/ZENODO.5091663.
10. Ali, N. U. A., Iqbal, W., & Shafqat, N. Analysis of Windows OS’s Fragmented File Carving Techniques: A Systematic Literature Review. 16th International Conference on Information Technology-New Generations (ITNG 2019). Springer International Publishing, 2019, pp. 63–67. DOI: 10.1007/978-3-030-14070-0_10.
11. Sari, S. A., & Mohamad, K. M. A Review of Graph Theoretic and Weightage Techniques in File Carving. Journal of Physics: Conference Series. IOP Publishing, 2020, vol. 1529, no. 5. DOI: 10.1088/1742-6596/1529/5/052011.
12. Ramli, N. I. S., Hisham, S. I., & Razak, M. F. A. Survey of File Carving Techniques. Innovative Systems for Intelligent Health Informatics (IRICT 2020). Lecture Notes on Data Engineering and Communications Technologies, Springer, 2021, vol. 72, pp. 815–825. DOI: 10.1007/978-3-030-70713-2_74.
13. Alherbawi, N., Shukur, Z., & Sulaiman, R. A Survey on Data Carving in Digital Forensic. Asian Journal of Information Technology, 2016, vol. 15, no. 24, pp. 5137–5144. Available at: http://docsdrive.com/pdfs/medwelljournals/ajit/2016/5137-5144.pdf (accessed 19.09.2023).
14. Kävrestad, J. Analyzing Data and Writing Reports. Fundamentals of Digital Forensics. Springer International Publishing, 2020, pp. 85–98. DOI: 10.1007/978-3-030-38954-3_10.
15. Lin, X. File Carving. Introductory Computer Forensics. Springer International Publishing, 2018, pp. 211–233. DOI: 10.1007/978-3-030-00581-8_9.
16. Garfinkel, S. L. Carving contiguous and fragmented files with fast object validation. Digital Investigation, 2007, vol. 4, pp. 2–12. DOI: 10.1016/j.diin.2007.06.017.
17. Dubettier, A., Gernot, T., Giguet, E., & Rosenberger, C. File type identification tools for digital investigations. Forensic Science International: Digital Investigation, 2023, vol. 46, article no. 301574. DOI: 10.1016/j.fsidi.2023.301574.
18. Alghafli, K., Jones, A., & Martin, T. Investigating and measuring capabilities of the forensics file carving techniques. Future Information Technology. Lecture Notes in Electrical Engineering, Springer, 2014,
vol. 276, pp. 329–336. DOI: 10.1007/978-3-642-40861-8_47.
19. Kloet, S. J. J. Measuring and Improving the Quality of File Carving Methods. MSc thesis, Eindhoven University of Technology, Department of Mathematics and Computer Science, The Netherlands, 2007. 111 p. Available at: https://research.tue.nl/files/46916835/635640-1.pdf (accessed 25.06.2023).
20. Laurenson, T. Performance analysis of file carving tools. IFIP Advances in Information and Communication Technology. Security and Privacy Protection in Information Processing Systems, 2013, vol. 405, pp. 419–433. DOI: 10.1007/978-3-642-39218-4_31.
21. Zanero, S. File block classification by Support Vector Machines. 2011 Sixth International Conference on Availability, Reliability and Security, Vienna, Austria, 2011, pp. 307–312. DOI: 10.1109/ARES.2011.52.
22. Fitzgerald, S., Mathews, G., Morris, C., & Zhulyn, O. Using NLP techniques for file fragment classification. Digital Investigation, 2012, vol. 9, pp. S44–S49. DOI: 10.1016/j.diin.2012.05.008.
23. Beebe, N. L., Maddox, L. A., Liu, L., & Sun, M. Sceadan: Using concatenated N-gram vectors for improved file and data type classification. IEEE Transactions on Information Forensics and Security, 2013, vol. 8, no. 9, pp. 1519–1530. DOI: 10.1109/TIFS.2013.2274728.
24. Pan, J., Liu, L., Sun, G., & Tang, Y. A method to identify the AVI-type blocks based on their four-character codes and C4.5 algorithm. 2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC2014), Shanghai, China, 2014, pp. 1–7. DOI: 10.1109/BESC.2014.7059521.
25. Wang, F., Quach, T.-T., Wheeler, J., Aimone, J. B., & James, C. D. Sparse Coding for N-Gram Feature Extraction and Training for File Fragment Classification. IEEE Transactions on Information Forensics and Security, 2018, vol. 13, no. 10, pp. 2553–2562. DOI: 10.1109/TIFS.2018.2823697.
26. Karampidis, K., Kavallieratou, E., & Papadourakis, G. Comparison of Classification Algorithms for File Type Detection A Digital Forensics Perspective. POLIBITS, 2017, vol. 56, pp. 15–20. Available at: https://api.semanticscholar.org/CorpusID:51882719 (accessed 25.06.2023).
27. Al-Sadi, A., Yahya, M. B., & Almulhem, A. Identification of image fragments for file carving. World Congress on Internet Security (WorldCIS-2013), London, UK, 2013, pp. 151–155. DOI: 10.1109/WorldCIS.2013.6751037.
28. Bhatt, M., Mishra, A., Kabir, M. W. U., Blake-Gatto, S. E., Rajendra, R., Hoque, M. T., & Ahmed, I. Hierarchy-Based File Fragment Classification. Machine Learning and Knowledge Extraction, 2020, vol. 2, no. 3, pp. 216–232. DOI: 10.3390/make2030012.
29. Sportiello, L., & Zanero, S. Context-based file block classification. IFIP Advances in Information and Communication Technology, 2012, vol. 383, pp. 67–82. DOI: 10.1007/978-3-642-33962-2_5.
30. Mittal, G., Korus, P., & Memon, N. FiFTy: Large-Scale File Fragment Type Identification Using Convolutional Neural Networks. IEEE Transactions on Information Forensics and Security, 2021, vol. 16, pp. 28–41. DOI: 10.1109/TIFS.2020.3004266.
31. Sester, J., Hayes, D., Scanlon, M., & Le-Khac, N. A. A comparative study of support vector machine and neural networks for file type identification using n-gram analysis. Forensic Science International: Digital Investigation, 2021, vol. 36, article no. 301121. DOI: 10.1016/j.fsidi.2021.301121.
32. Chen, Q., Liao, Q., Jiang, Z. L., Fang, J., Yiu, S., Xi, G., Li, R., Yi, Z., Wang, X., Hui, L. C. K., Liu, D., & Zhang, E. File fragment classification using grayscale image conversion and deep learning in digital forensics. 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 2018, pp. 140–147. DOI: 10.1109/SPW.2018.00029.
33. Hiester, L. File Fragment Classification Using Neural Networks with Lossless Representations. Bachelor Thesis, East Tennessee State University. Undergraduate Honors Theses, 2018, Paper 454, 36 p. Available at: https://dc.etsu.edu/honors/454 (accessed 25.06.2023).
34. Ghaleb, M., Saaim, K., Felemban, M., Al-Saleh, S. M., & Al-Mulhem, A. File Fragment Classification using Light-Weight Convolutional Neural Networks. arXiv (Cornell University), 2023. DOI: 10.48550/arxiv.2305.00656.
35. Liu, W., Wang, Y., Wu, K., Yap, K., & Chau, L. A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings. arXiv (Cornell University), 2023. DOI: 10.48550/arxiv.2304.06983.
36. Bharadwaj, S. Using convolutional neural networks to detect compression algorithms. arXiv (Cornell University), 2021. DOI: 10.48550/arxiv.2111.09034.
37. Haque, E., & Tozal, M. E. Byte embeddings for file fragment classification. Future Generation Computer Systems, 2022, vol. 127, pp. 448–461. DOI: 10.1016/j.future.2021.09.019.
38. Vulinovic, K., Ivkovic, L., Petrovic, J., Skracic, K., & Pale, P. Neural Networks for File Fragment Classification. 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2019, pp. 1194–1198. DOI: 10.23919/mipro.2019.8756878.
39. Heo, H.-S., So, B.-M., Yang, I.-H., Yoon, S.-H., & Yu, H.-J. Automated recovery of damaged audio files using deep neural networks. Digital Investigation, 2019, vol. 30, pp. 117–126. DOI: 10.1016/j.diin.2019.07.007.
40. Na, G. H., Shim, K. S., Moon, K. W., Kong, S. G., Kim, E. S., & Lee, J. Frame-based recovery of corrupted video files using video codec specifications. IEEE Transactions on Image Processing, 2014, vol. 23, no. 2, pp. 517–526. DOI: 10.1109/TIP.2013.2285625.
41. Amrouche, S. C., & Salamani, D. Non-parametric adaptative JPEG fragments carving. Tenth
International Conference on Machine Vision, Vienna, Austria, 2017, article no. 106962D. DOI: 10.1117/12.2310079.
42. Alghafli, K., & Martin, T. Identification and recovery of video fragments for forensics file carving. 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST), Barcelona, Spain, 2016, pp. 267–272. DOI: 10.1109/ICITST.2016.7856710.
43. Qiu, W., Zhu, R., Guo, J., Tang, X., Liu, B., & Huang, Z. A new approach to multimedia files carving. 2014 IEEE International Conference on Bioinformatics and Bioengineering, Boca Raton, FL, USA, 2014, pp. 105–110. DOI: 10.1109/BIBE.2014.31.
44. Guo, J., He, J., & Huang, N. Research of Multiple-type Files Carving Method Based on Entropy. Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering, 2016, pp. 521–528. DOI: 10.2991/nceece-15.2016.98.
45. Ali, R. R., & Mohamad, K. M. RX_myKarve carving framework for reassembling complex fragmentations of JPEG images. Journal of King Saud University - Computer and Information Sciences, 2021, vol. 33, no. 1, pp. 21–32. DOI: 10.1016/J.JKSUCI.2018.12.007.
46. Al-Sharif, Z. A., Al-Khalee, A. Y., Al-Saleh, M. I., & Al-Ayyoub, M. Carving and clustering files in RAM for memory forensics. Far East Journal of Electronics and Communications, 2018, vol. 18, no. 5, pp. 695–722. DOI: 10.17654/ec018050695.
47. Zhang, L., Hao, S., & Zhang, Q. Recovering SQLite data from fragmented flash pages. Annals of Telecommunications, 2019, vol. 74, no. 7–8, pp. 451–460. DOI: 10.1007/s12243-019-00707-9.
48. Hilgert, J. N., Lambertz, M., Rybalka, M., & Schell, R. Syntactical Carving of PNGs and Automated Generation of Reproducible Datasets. Digital Investigation, 2019, vol. 29, pp. S22–S30. DOI: 10.1016/j.diin.2019.04.014.
49. Tang, Y., Fang, J., Chow, K. P., Yiu, S. M., Xu, J., Feng, B., Li, Q., & Han, Q. Recovery of heavily fragmented JPEG files. Digital Investigation, 2016, vol. 18, pp. S108–S117. DOI: 10.1016/j.diin.2016.04.016.
50. Ravi, A., Kumar, T. R., & Mathew, A. R. A method for carving fragmented document and image files. 2016 International Conference on Advances in Human Machine Interaction (HMI), Kodigehalli, India, 2016, pp. 1–6. DOI: 10.1109/HMI.2016.7449170.
51. Roussev, V., & Garfinkel, S. L. File fragment classification - The case for specialized approaches. 2009 Fourth International IEEE Workshop on Systematic Approaches to Digital Forensic Engineering, Berkeley, CA, USA, 2009, pp. 3–14. DOI: 10.1109/SADFE.2009.21.
52. Lin, W., & Xu, M. A Microsoft Word documents carving method base on interior virtual streams. Advanced Materials Research, 2012, vols. 433–440, pp. 3028–3032. DOI: 10.4028/www.scientific.net/AMR.433-440.3028.
53. Birmingham, B., Farrugia, R. A., & Vella, M. Using thumbnail affinity for fragmentation point detection of JPEG files. IEEE EUROCON 2017 - 17th International Conference on Smart Technologies, Ohrid, Macedonia, 2017, pp. 3–8. DOI: 10.1109/EUROCON.2017.8011068.
54. Durmus, E., Korus, P., & Memon, N. Every Shred Helps: Assembling Evidence from Orphaned JPEG Fragments. IEEE Transactions on Information Forensics and Security, 2019, vol. 14, no. 9, pp. 2372–2386. DOI: 10.1109/TIFS.2019.2897912.
55. Chang, X., Wu, J., & Hao, F. JPEG fragment carving based on pixel similarity of MED-ED. 2019 Chinese Control Conference (CCC), Guangzhou, China, 2019, pp. 8862–8866. DOI: 10.23919/ChiCC.2019.8865161.
56. Uzun, E., & Sencar, H. T. JpgScraper: An Advanced Carver for JPEG Files. IEEE Transactions on Information Forensics and Security, 2020, vol. 15, pp. 1846–1857. DOI: 10.1109/TIFS.2019.2953382.
57. Boiko, M., & Moskalenko, V. Syntactical method for reconstructing highly fragmented OOXML files. Radioelectronic and Computer Systems, 2023, no. 1, pp. 166–182. DOI: 10.32620/reks.2023.1.14.
58. Hand, S., Lin, Z., Gu, G., & Thuraisingham, B. Bin-Carver: Automatic recovery of binary executable files. Digital Investigation, 2012, vol. 9, pp. S108–117. DOI: 10.1016/j.diin.2012.05.014.
59. Xu, M., Sun, J., Zheng, N., Qiao, T., Wu, Y., Shi, K., & Yang, T. A Novel File Carving Algorithm for EVTX Logs. Digital Forensics and Cyber Crime. ICDF2C 2017, Prague, Czech Republic, 2017, vol. 216, pp. 97–105. DOI: 10.1007/978-3-319-73697-6_7.
60. Memon, N., & Pal, A. Automated reassembly of file fragmented images using greedy algorithms. IEEE Transactions on Image Processing, 2006, vol. 15, no. 2, pp. 385–393. DOI: 10.1109/tip.2005.863054.
61. Karresand, M., Warnqvist, A., Lindahl, D., Axelsson, S., & Dyrkolbotn, G. O. Creating a Map of User Data in NTFS to Improve File Carving. Advances in Digital Forensics XV. 15th IFIP WG 11.9 International Conference, Orlando, FL, USA, 2019, pp. 133–158. DOI: 10.1007/978-3-030-28752-8_8.

Received 27.07.2023, Accepted 20.09.2023

ADVANCED FILE CARVING: TAXONOMY, MODELS AND METHODS

Maksym Boiko, Viacheslav Moskalenko, Oksana Shovkoplias

File carving techniques are important in the field of digital forensics. At the same time, the rapid growth in the amount and types of data requires the development of file carving methods in terms of capabilities, accuracy, and computational efficiency. However, the vast majority of methods are developed to solve specific narrow tasks and rely on a certain set of assumptions and a priori knowledge about the files to be recovered. There is a lack of research that systematizes methods and structures approaches in order to identify gaps and determine promising directions of development, taking into account recent advances in information technology and artificial intelligence. The subject of study in this article is the structure, factors, efficiency criteria, methods, and tools of file carving, as well as the current state and development trends of carving methods. The goal is to systematize knowledge about modern file carving methods and identify promising directions of development. The tasks: to identify the main stages of file carving and analyze approaches to their implementation; to build an ontological scheme of file carving; to determine promising directions for the development of file carving methods. The methods used are: literature review, systematization, and generalization. The following results were obtained. An ontological scheme of the file carving concept was built. The scheme includes the principles, properties, stages, techniques, evaluation criteria, and tools of file carving, as well as the factors that affect the process. The features, limitations, and areas of application of data recovery methods are presented. It was established that a still widespread approach to file reconstruction is a manual, detailed study of the internal structure of files and/or their contents and the identification of certain patterns that make it possible to restore the sequence of data fragments in the correct order. At the same time, the vast majority of methods do not guarantee a one-hundred-percent result. The current state and prospects of using artificial intelligence methods in computer forensics were analyzed, in particular for identifying data blocks, clustering and reconstructing files, and restoring the contents of media files with damaged or lost headers. The need for a priori information about the structure or content of files for the successful carving of fragmented data was determined. Conclusions. The scientific novelty of the obtained results is as follows: for the first time, modern file carving methods were systematized and analyzed by development direction, and the promise of using artificial intelligence for identifying data blocks, clustering, and restoring file contents was shown; for the first time, an ontological scheme of file carving was built, which can be used as a roadmap when developing new promising systems in the field of computer forensics.

Keywords: computer forensics; metadata; fragmentation; fragmented file; data recovery; file carving; file fragment identification; file reconstruction; file restoring; artificial intelligence.

Maksym Boiko – PhD Student at Computer Science Department of Sumy State University, Sumy, Ukraine; Senior Detective, Information Processing and Analysis Department, the National Anti-Corruption Bureau of Ukraine, Kyiv, Ukraine,
ORCID: 0000-0003-0950-8399, Scopus Author ID: 58199360000.
Viacheslav Moskalenko – PhD, Associate Professor at Computer Science Department of Sumy State University, Sumy, Ukraine; Doctoral Student at Department of Computer Systems, Networks and Cybersecurity, National Aerospace University “KhAI”, Kharkiv, Ukraine,
ORCID: 0000-0001-6275-9803, Scopus Author ID: 57189099775.
Oksana Shovkoplias – PhD, Senior Lecturer at Computer Science Department of Sumy State University, Sumy, Ukraine,
ORCID: 0000-0002-4596-2524, Scopus Author ID: 55647364100.

William E. Clark
No ratings yet
Module 3 Word file Part 3
No ratings yet
Module 3 Word file Part 3
25 pages
Network File System in Practice: Definitive Reference for Developers and Engineers
From Everand
Network File System in Practice: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Digital Forensics Research: The Next 10 Years: Simson L. Garfinkel
No ratings yet
Digital Forensics Research: The Next 10 Years: Simson L. Garfinkel
10 pages
Forensics LEC 2
No ratings yet
Forensics LEC 2
22 pages
Secure Data Sanitization
No ratings yet
Secure Data Sanitization
12 pages
ETI Unit3 Notes Msbte Store
No ratings yet
ETI Unit3 Notes Msbte Store
11 pages
cf1
No ratings yet
cf1
5 pages
Hide and Seek: Concealing and Recovering Hard Disk Data
No ratings yet
Hide and Seek: Concealing and Recovering Hard Disk Data
17 pages
brunty1
No ratings yet
brunty1
13 pages
200923080963157
No ratings yet
200923080963157
8 pages
A Suspect Oriented Intelligent Automated Computer Forensic Analysis
No ratings yet
A Suspect Oriented Intelligent Automated Computer Forensic Analysis
21 pages
Computer Forensics: Computer Forensics (Sometimes Known As Computer Forensic Science
No ratings yet
Computer Forensics: Computer Forensics (Sometimes Known As Computer Forensic Science
4 pages
Lecture-5&6 Data Acquisition
No ratings yet
Lecture-5&6 Data Acquisition
66 pages
Adobe Photoshop CC Classroom in - Andrew Faulkner
No ratings yet
Adobe Photoshop CC Classroom in - Andrew Faulkner
25 pages
RMI-Q Installation Guide
No ratings yet
RMI-Q Installation Guide
40 pages
Py Bom 13729140000069375
No ratings yet
Py Bom 13729140000069375
2 pages
Preprocessor Directives: / This Is A Multiline Comment. The Compiler Will Ignore It.
No ratings yet
Preprocessor Directives: / This Is A Multiline Comment. The Compiler Will Ignore It.
1 page
ATM-Error Codes (1) 20200213001116
No ratings yet
ATM-Error Codes (1) 20200213001116
1 page
Unit 1: Problem Solving With Computer (2 HRS.)
No ratings yet
Unit 1: Problem Solving With Computer (2 HRS.)
60 pages
Rencana Anggaran Beaya & Spesifikasi Teknis Pengadaan CCTV System Pada Perguruan Tinggi
No ratings yet
Rencana Anggaran Beaya & Spesifikasi Teknis Pengadaan CCTV System Pada Perguruan Tinggi
8 pages
Put A Date Picker Calendar On An Excel Worksheet
No ratings yet
Put A Date Picker Calendar On An Excel Worksheet
4 pages
RJ61BT11 (Application)
No ratings yet
RJ61BT11 (Application)
218 pages
Hipaa Awareness For Healthcare Providers Certificate For Hoda Tantawi
No ratings yet
Hipaa Awareness For Healthcare Providers Certificate For Hoda Tantawi
3 pages
SQL Python PowerBI Questions and Answers
No ratings yet
SQL Python PowerBI Questions and Answers
4 pages
BNI KK PAGATAN - Setting Google Chrome
No ratings yet
BNI KK PAGATAN - Setting Google Chrome
1 page
Fsuipc5 History
No ratings yet
Fsuipc5 History
9 pages
IAS Pre Fi
No ratings yet
IAS Pre Fi
4 pages
Curriculum Vitae: Major
No ratings yet
Curriculum Vitae: Major
5 pages
Alcatel Omni PCX Enterprise
No ratings yet
Alcatel Omni PCX Enterprise
3 pages
PDF Download Decameron Tarot English and Spanish Edition Textbook 210211144001
No ratings yet
PDF Download Decameron Tarot English and Spanish Edition Textbook 210211144001
34 pages
Arduino Programming Part 7: Flow Charts and Top-Down Design: Goals
No ratings yet
Arduino Programming Part 7: Flow Charts and Top-Down Design: Goals
8 pages
Cs8351-Digital Principles and System Design-1007732531-Dpsd Quest Bank
No ratings yet
Cs8351-Digital Principles and System Design-1007732531-Dpsd Quest Bank
14 pages
StarterGuide_-_RF-Dev
No ratings yet
StarterGuide_-_RF-Dev
47 pages
Functions IP Notes
No ratings yet
Functions IP Notes
13 pages
Analog IC Layout Services 1v2
No ratings yet
Analog IC Layout Services 1v2
14 pages
Net Connector
100% (4)
Net Connector
48 pages
Ansys All You Need To Know About Hardware For Simulation
No ratings yet
Ansys All You Need To Know About Hardware For Simulation
36 pages
5th Sem Main Exam + Test Exam Questions
No ratings yet
5th Sem Main Exam + Test Exam Questions
10 pages
Power BI - 3 in 1 - Beginne...
100% (3)
Power BI - 3 in 1 - Beginne...
381 pages
B UCSM CLI System Monitoring Guide 3 2 Chapter 010
No ratings yet
B UCSM CLI System Monitoring Guide 3 2 Chapter 010
4 pages

7

Uploaded by

7

Uploaded by


© Maksym Boiko, Viacheslav Moskalenko, Oksana Shovkoplias, 2023

3.2 Deleted file recovery

The complexity of the deleted data recovery process

The biggest problems arise when recovering deleted

Fig. 5. The ontological diagram of file carving

4. Advanced file carving techniques

5. Discussion

In addition, the task is complicated by blocks of different

Received 27.07.2023, Accepted 20.09.2023

ADVANCED FILE CARVING: TAXONOMY, MODELS AND METHODS

capabilities, accuracy characteristics, and computational efficiency. However, the vast majority of methods
