File Carving
1. Definition
2. File Systems
3. File Metadata
4. Data Hiding
5. Recovery Methods
6. File Signature
7. File Carving
Reading: Textbook – Chapters 5, 7, 9, 13
Data Carving Definition
• According to the Digital Forensic Research Workshop (DFRWS):
“Data carving is the process of extracting a collection of data
from a larger data set. Data carving techniques frequently occur
during a digital investigation when the unallocated file system space
is analyzed to extract files. The files are "carved" from the
unallocated space using file type-specific header and footer values.
File system structures are not used during the process.”
• Carving is done:
• on a disk, when the unallocated file system space is analyzed to extract files whose
data cannot be identified because the allocation information is missing,
• or on network captures, where files are "carved" from the dumped traffic.
Data Carving Definition (cont.)
• Many powerful automated forensic analysis tools are available for use.
• Unfortunately, there are no standard techniques for the tools to perform
common investigative tasks, such as recovering a deleted file.
• Having an in-depth understanding of some low level details about files
is essential to evaluate forensic tools and understand the carving
output
File Systems
• Structure for storing and organizing computer
files and the data they contain to make it easy to
access and find them
• Organize disk sectors (typically 512 bytes each)
into files and directories and keep track of which
sectors belong to which file (allocated) and
which are not being used (unallocated)
• Data is the content of files.
• Metadata tells how to work with the disk and the data.
■ Partition table
■ List of available sectors
■ Directory information
• Common file systems are FAT (File Allocation
Table) / NTFS (New Technology File System) on
Windows Systems and UFS/JFS on Unix Systems.
File Systems (cont.)
The EXT3FS File System
-Although Linux can use several different file systems, a typical
installation has an EXT3FS file system, which is based on the Berkeley
Fast File System (FFS).
– Block and Fragments
• Goal: to store file content
• A disk that is used in an x86 system is organized into 512-byte
sectors.
• An EXT3FS file system is organized into fragments, which are
consecutive sectors;
– a fragment can also be a single sector;
– each fragment is given an address
• A block is a group of consecutive fragments;
• The EXT3FS file systems use blocks and fragments that have the same
size, although that’s not the case for some other FS.
File Systems (cont.)
The EXT3FS File System (cont.)
• Example
File Systems (cont.)
The EXT3FS File System (ctd.)
• Inodes
• Goal: to store metadata about files and directories,
• e.g., file size, user ID, group ID, time, and fragment information.
• Inode structures are located in tables, and each inode has an address
File Systems (cont.)
The EXT3FS File System (ctd.)
• File Deletion
• When a file is deleted, the corresponding structures (e.g., blocks, inode) are set
to an unallocated state so that they can be re-allocated to other files.
• Hence, we will be able to see the names of deleted files in a directory, but
we will not know the original content.
• By analyzing the unallocated inodes, we can see when the inode was
unallocated, but we will not know the original file name or the original
content.
File Systems (cont.)
Windows File Systems
• The volume is the basic building block of storage in a Windows system.
• Corresponds to a primary or logical partition on a physical disk.
• Contains either a file system or other data storage structure, e.g., database
• File systems supported by Windows:
• File Allocation Table (FAT)
• New Technology File System (NTFS)
File Systems (cont.)
Windows File Systems – FAT
• FAT
– Has been around since the 1980s and is one of the simplest file systems
• It contains no security features
– The FAT is a table in which each entry holds either the number of the next cluster in a
file, the End Of File (EOF) marker (-1), or 0 if the cluster is not
used.
• There are 3 variations of FAT, named after the size of a table entry in bits: FAT12, FAT16, FAT32
– FAT32 has a 32-bit table entry
– FAT16 can address 2 to the 16th power, or 65,536 clusters
– FAT12 can address 2 to the 12th power, or 4,096 clusters.
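The table lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the FAT is modeled as a plain list rather than read from a real volume, and the names `EOF`, `FREE`, and `follow_chain`, as well as the sample table, are hypothetical.

```python
EOF = -1   # end-of-file marker, as described above
FREE = 0   # cluster is not used

def follow_chain(fat, first_cluster):
    """Return the list of clusters that make up one file."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != EOF:
        # A loop, an out-of-range pointer, or a free cluster means the chain is broken.
        if cluster in seen or cluster >= len(fat) or fat[cluster] == FREE:
            raise ValueError(f"broken chain at cluster {cluster}")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]   # each entry points to the next cluster
    return chain

# Index = cluster number, value = next cluster (entries 0 and 1 are reserved).
fat = [EOF, EOF, 3, 4, EOF, 6, EOF, FREE]
print(follow_chain(fat, 2))  # [2, 3, 4]
print(follow_chain(fat, 5))  # [5, 6]
```

Once the entries of a deleted file are reset to 0, this chain can no longer be followed, which is one reason carving works on content instead.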
File Systems (cont.)
Windows File Systems – FAT
Structure of a FAT Volume
File tyui.jpg:
- occupies clusters 2, 3, and 4.
- The file size is 1,400 bytes; it occupies 1,536 bytes (3
clusters) on the disk, and cluster 4 includes 136 bytes of
slack space.
File mes.doc:
- occupies clusters 5 and 6.
- The file size is 980 bytes; it occupies 1,024 bytes (2 clusters),
and has 44 bytes of slack space in cluster 6.
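The arithmetic behind these numbers can be checked with a short sketch. It assumes 512-byte clusters (implied by 3 clusters = 1,536 bytes); the function name `slack_bytes` is hypothetical.

```python
CLUSTER_SIZE = 512  # bytes per cluster, inferred from 1,536 bytes / 3 clusters

def slack_bytes(file_size, cluster_size=CLUSTER_SIZE):
    """Return (clusters used, bytes allocated, slack bytes) for a file."""
    clusters = (file_size + cluster_size - 1) // cluster_size  # round up
    allocated = clusters * cluster_size
    return clusters, allocated, allocated - file_size

print(slack_bytes(1400))  # (3, 1536, 136)  -> tyui.jpg
print(slack_bytes(980))   # (2, 1024, 44)   -> mes.doc
```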
File Systems (cont.)
Windows File Systems (cont.)
• Deleting FAT Files
• The system places a deletion mark on the file
• deletion mark ⇒ the first character of the file name in the directory entry is replaced
with the byte 0xE5 (shown as the lower-case Greek letter σ in some code pages)
• The FAT entries of the respective clusters are still unchanged!
• In the DATA AREA, the clusters still preserve the original data!
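A minimal sketch of how a tool might spot the deletion mark in a 32-byte directory entry. The field offsets used here (name at 0, extension at 8, first cluster at 26, size at 28) follow the classic 8.3 short-entry layout, which the slides do not spell out; the function name `parse_entry` and the sample values are hypothetical.

```python
import struct

DELETED_MARK = 0xE5  # first byte of the name field in a deleted entry

def parse_entry(entry: bytes) -> dict:
    """Decode the fields of one 32-byte short (8.3) directory entry."""
    if len(entry) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    deleted = entry[0] == DELETED_MARK
    raw_name = entry[0:8]
    if deleted:
        raw_name = b"?" + raw_name[1:]  # the first character is lost
    name = raw_name.decode("ascii", errors="replace").rstrip()
    ext = entry[8:11].decode("ascii", errors="replace").rstrip()
    # Little-endian: 16-bit first cluster at offset 26, 32-bit size at offset 28.
    first_cluster, size = struct.unpack_from("<HI", entry, 26)
    return {"name": name, "ext": ext, "deleted": deleted,
            "first_cluster": first_cluster, "size": size}

# Hypothetical entry for TEST1.TXT, then the same entry after deletion.
live = b"TEST1   " + b"TXT" + bytes([0x20]) + bytes(14) + struct.pack("<HI", 2, 1400)
dead = bytes([DELETED_MARK]) + live[1:]

print(parse_entry(live))
print(parse_entry(dead))
```

Everything except the first name byte survives deletion, including the starting cluster and the size, which is what makes recovery from the directory entry possible.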
File Systems (cont.)
Windows File Systems (cont.)
Example: Deleting by sending to Recycle Bin
File Directory Table (FDT) before and after deletion of “test1.txt” file.
[Figure: FDT before and after deletion, marking the 32 bytes related to test1.txt in each view.]
File Systems (cont.)
Windows File Systems (cont.)
Example: Deleting by sending to Recycle Bin (cont.)
File Allocation Table (FAT) before and after deletion of “test1.txt” file.
Common misconception: when we delete a file, the OS writes zero bytes over the corresponding segments/clusters on the disk.
File Systems (cont.)
Windows File Systems (cont.)
Example: Deleting by clearing from Recycle Bin
File Allocation Table (FAT) before and after clearing recycle bin
Only after the Recycle Bin is emptied are the FAT entries for the clusters of the deleted file set to ‘free’ (0x00).
File Systems (cont.)
Windows File Systems (ctd.)
• NTFS
• Microsoft's more modern file system
• It is modular, flexible and much more complex than FAT
• Provides a great balance of performance, reliability and compatibility.
• Removes the size limitations of FAT, adds security features, and allows quicker
recovery from a crash.
• Uses clusters (like FAT) to store data
• The Master File Table (MFT) is the central structure in an NTFS file system.
• Meta-data file: contains information about files and directories.
• Each entry is given an address and contains several attributes, which store specific
information about the file or directory, e.g., $File_Name, $Data,
$Standard_Information
• For instance, the $Standard_Information attribute contains time stamps, file status
(hidden, read-only, archive) and link count (how many directories point to the file)
File Metadata
MFT List of possible attributes
• Defined in the $AttrDef entry of the MFT; the defaults are:
– 0x10 $STANDARD_INFORMATION
– 0x20 $ATTRIBUTE_LIST
– 0x30 $FILE_NAME
– 0x40 (NT) $VOLUME_VERSION, (2K) $OBJECT_ID
– 0x50 $SECURITY_DESCRIPTOR
– 0x60 $VOLUME_NAME
– 0x70 $VOLUME_INFORMATION
– 0x80 $DATA
– 0x90 $INDEX_ROOT
– 0xA0 $INDEX_ALLOCATION
– 0xB0 $BITMAP
– 0xC0 (NT) $SYMBOLIC_LINK, (2K) $REPARSE_POINT
– 0xD0 $EA_INFORMATION
– 0xE0 $EA
– 0xF0 (NT) $PROPERTY_SET
– 0x100 (2K) $LOGGED_UTILITY_STREAM
File Metadata
Date-Time Stamps Significance
• File Created
• Usually shows when a file or folder was created
• When an existing file is copied, the File Created date-time stamp of
the new copy is set to the current time
• When a file is moved onto a different volume using the Windows
command line or drag-and-drop feature, the File Created date-time
stamp of the new copy is set to the current time
• When a file is moved onto a different volume using the Cut and
Paste menu options, the File Created date-time stamp remains
unchanged (the Last Accessed and Entry Modified date-time
stamps would most likely change).
• Modified
• Represents the last time the $DATA attribute of a file was altered.
File Metadata (ctd.)
Date-Time Stamps Significance
• Last Accessed
• Represents the most recent time a file or folder was accessed by the
file system.
• Does not necessarily indicate that a file was opened.
• Entry Modified (SIA Modified)
• Represents the last time any attribute in the MFT record for the file or
folder was modified.
Data Hiding
• Hidden Directories
• Basic method of hiding data that relies on the non-discovery
of the directory containing the data.
• Two main approaches:
• First approach: involves giving the directory a strange name that
may go unnoticed on file listings
• E.g., in Linux, using directory names such as "..." (three dots)
and ".. " (two dots and a space)
• Second approach: involves creating the directory in a part of the
system where it is least likely to be found by a system administrator.
• E.g. the /dev (in Linux) and Windows System directories are some of
the most frequently used locations to hide other directories
Data Hiding (cont.)
• Camouflaged Files
• Basic method of hiding data that relies only on the file name.
• By giving innocuous names to files containing forbidden data
• By changing file extensions, e.g., changing the extension of illegally
downloaded MP3 audio files to “.doc”.
Data Hiding (cont.)
• Deleting Files
• The most basic method of hiding data is simply to delete the
file containing the data.
• In the file system the data concerned does not immediately
disappear from the hard disk.
• Instead, the file system merely marks the relevant area on
the hard disk as being available for use (depending on
several factors such as the amount)
Data Hiding (cont.)
• Slack Space
• During a low-level format, hard disks are divided
up into tracks and sectors so that operating
systems can later use these divisions to store
and find data.
• Most hard disks use a sector size of 512 bytes
• File system performs read and write operations on
the disk in groupings of sectors called blocks (or
clusters for the Windows OS).
• E.g. In ext2/ext3 file system, a block will invariably
be a grouping of either two, four or eight sectors -
in other words 1024, 2048 or 4096 bytes
• Although the space allocated to a file on the hard disk is always a
multiple of the block size, the space the file's content actually needs
could be less
• Slack space is the area on the hard disk
between the end-of-file indicator and the final
block boundary
• It can be used to hide data, although the amount
of hidden data per file is limited by the file system's block
size.
Data Hiding (cont.)
• Alternate Data Streams (ADS)
• In NTFS the “DATA” attribute points to the data (i.e. file content), resident (located within the
MFT – Master File Table) or not
• a file can have more than one "DATA" attribute, i.e., more than one data stream.
• these additional streams are called Alternate Data Streams.
• they are like invisible attachments to a file
• their physical information is not included in the results of a DIR or Explorer window.
• Example: create a simple ADS named alternate.txt
C:\>notepad test.txt:alternate.txt
• Format: use the filename of the main data file, then add a colon (:), followed by the name of the ADS
• To access the ADS later, open it in Notepad with the same command used to create it.
• Once you've created a text ADS, it is easy to add a binary data file as an ADS.
• The following command adds binary.file to test.txt as an ADS named binary.mds:
C:\>type binary.file >> test.txt:binary.mds
Data Hiding (cont.)
• Steganography: The art of storing information in such a way
that the existence of the information is hidden.
• Example: To human eyes, data usually contains known forms, like images, e-mail,
sounds, and text. Most Internet data naturally includes gratuitous
headers, too. These are media exploited using new controversial logical
encodings: steganography and marking.
Hidden message (first letter of each word): "The duck flies at midnight. Tame Uncle Sam."
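The hidden message in the example can be recovered by reading the first letter of every word in the cover text. A minimal sketch, assuming words are separated by whitespace (the function name `first_letters` is hypothetical):

```python
cover_text = (
    "To human eyes, data usually contains known forms, like images, e-mail, "
    "sounds, and text. Most Internet data naturally includes gratuitous "
    "headers, too. These are media exploited using new controversial logical "
    "encodings: steganography and marking."
)

def first_letters(text: str) -> str:
    """Concatenate the first character of every whitespace-separated word."""
    return "".join(word[0] for word in text.split()).lower()

print(first_letters(cover_text))  # theduckfliesatmidnighttameunclesam
```

The output reads "the duck flies at midnight tame uncle sam" once word breaks are restored.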
Data Hiding (cont.)
• Watermarking: Hiding data within data
• Information can be hidden in almost any file format.
• File formats with more room for compression are best
• Image files (JPEG, GIF)
• Sound files (MP3, WAV)
• Video files (MPG, AVI)
• The hidden information may be encrypted, but not
necessarily
• Numerous software applications will do this for you: Many
are freely available online
• E.g., Steganos, S-Tools (GIF, JPEG), StegHide (WAV, BMP),
Invisible Secrets (JPEG), JPHide, Camouflage, Hiderman
Data Recovery Methods
• Recovery is largely a manual process using heuristics
• Variety of tools may help in different capacity, e.g.
• Forensics analysis tools, e.g., FTK Imager, Autopsy
• File carving tools, e.g., Scalpel, Foremost
• Steganography tools, e.g., QuickStego
• Hex Editors, e.g. HxD Hexeditor
• Alternate Data Streams Trackers, e.g., streams (from SysInternals).
File Signature
• Commonly, extensions in file names are used to help identify what
they contain (the file type)
• But a user can rename a file and give it any extension
• Magic number: a more reliable way to recognize a file is to
analyze its structure rather than its extension
File Signature (cont.)
• Magic number: a constant used to identify a file format
• Example: executables have the header MZ (0x4D 0x5A)
• Provides a simple way of
distinguishing between file formats
• Relies on the fact that many file formats define a
header, and often a footer, by which they are
recognized
• E.g.:
• a PDF file starts with “%PDF” and ends with
“%%EOF”
• a JPEG image file begins with “0xFFD8” and
ends with “0xFFD9”.
• File signatures can be changed,
resulting in a false file type recognition
File Signature (cont.)
Description             Extension  Magic Number
Bitmap graphic          .bmp       42 4D [BM]
JPEG graphic file       .jpg       FF D8
JPEG 2000 graphic file  .jp2       00 00 00 0C 6A 50 20 20 0D 0A [....jP  ..]
GIF graphic file        .gif       47 49 46 38 [GIF8]
TIF graphic file        .tif       49 49 [II]
PNG graphic file        .png       89 50 4E 47 [.PNG]
WAV audio file          .wav       52 49 46 46 [RIFF]
ELF Linux EXE           (none)     7F 45 4C 46 [.ELF]
AVI video file          .avi       52 49 46 46 [RIFF]
MOV video file          .mov       6D 6F 6F 76 [....moov]
PKZip                   .zip       50 4B 03 04 [PK..]
GZip                    .gz        1F 8B 08
Tar file                .tar       75 73 74 61 72 [ustar]
Executable file         .exe       4D 5A [MZ]
PDF Document            .pdf       25 50 44 46 [%PDF]
Word Document           .doc       D0 CF 11 E0 A1 B1 1A E1
Excel Document          .xls       D0 CF 11 E0 A1 B1 1A E1
PowerPoint Document     .ppt       D0 CF 11 E0 A1 B1 1A E1
More file signatures can be found at:
http://www.garykessler.net/library/file_sigs.html
https://asecuritysite.com/forensics/magic
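A signature table like this one can be used to flag camouflaged files whose extension does not match their content. The sketch below is illustrative only: it covers a handful of the signatures listed, checks only the start of the data, and the names `SIGNATURES`, `identify`, and `extension_matches` are hypothetical.

```python
# (magic bytes at offset 0, type label) taken from the table above
SIGNATURES = [
    (b"\x89PNG", "png"),
    (b"\xff\xd8", "jpg"),
    (b"GIF8", "gif"),
    (b"%PDF", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\x7fELF", "elf"),
    (b"MZ", "exe"),
    (b"BM", "bmp"),
]

def identify(data: bytes):
    """Return the type label whose magic number starts `data`, or None."""
    for magic, file_type in SIGNATURES:
        if data.startswith(magic):
            return file_type
    return None

def extension_matches(filename: str, data: bytes) -> bool:
    """True if the file name's extension agrees with the detected signature."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return identify(data) == ext

print(identify(b"%PDF-1.4 ..."))                         # pdf
print(extension_matches("photo.jpg", b"\xff\xd8\xff\xe0"))  # True
print(extension_matches("notes.doc", b"\xff\xd8\xff\xe0"))  # False
```

Two-byte signatures such as MZ or BM match many unrelated byte sequences, so a real tool would combine the check with further structure tests; a mismatch is a lead to investigate, not proof of hiding.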
File Carving
-Used to recover deleted or damaged data from storage devices, e.g., when
■ Directory entries are overwritten
■ Directory entries are damaged
■ File formats aren’t known
File Carving (cont.)
• File carving can be classified as basic or advanced
• Basic data carving:
• Assumes that:
• the beginning of the file is not overwritten;
• the file is not fragmented;
• the file is not compressed;
• Basically, this type of carving relies on the file header and footer
• Advanced data carving:
• works even on fragmented files, where fragments are:
• not sequential;
• out of order;
• missing;
• also relies on the file’s internal structure
• Modern operating systems try to avoid fragmentation (unless necessary) in order to
speed up writing and reading of files
• However, a malicious user might deliberately force a file to be written in fragments, in order to
make it unrecoverable by basic carving once deleted
File Carving (cont.)
Basic File Carving
• Relies on header and footer analysis => does not consider the file’s content, which means that
sectors inserted, deleted or modified are not taken into account
• Operates by looking for file headers and/or footers, and then "carving out" the blocks between
these two boundaries.
• By using a database of headers and footers for specific file types, a file carver can retrieve files from
raw disk images, even if the file system metadata has been destroyed.
• Carving searches for objects based on content, rather than on metadata.
[Figure: a row of sectors (one sector per cell) with the GIF header 0x474946383761 at the start of the file and the GIF footer 0x003B at its end.]
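Header/footer carving as described above can be sketched as a simple byte search over a raw image. This is a minimal illustration, not how any particular tool is implemented: it assumes the whole image fits in memory and that files are contiguous, and the names `carve` and `max_size` are hypothetical.

```python
def carve(image: bytes, header: bytes, footer: bytes, max_size: int = 10 * 1024 * 1024):
    """Return (offset, data) for every header...footer span found in `image`."""
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break
        # Look for the first footer after the header, within max_size bytes.
        end = image.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1          # no footer in range: skip this header
            continue
        end += len(footer)
        carved.append((start, image[start:end]))
        pos = end                    # continue after the carved file
    return carved

JPEG_HEADER, JPEG_FOOTER = b"\xff\xd8", b"\xff\xd9"

# Synthetic "disk image": two fake JPEG bodies surrounded by filler bytes.
image = (b"\x00" * 100 + JPEG_HEADER + b"AAAA" + JPEG_FOOTER +
         b"\x00" * 50 + JPEG_HEADER + b"BB" + JPEG_FOOTER + b"\x11" * 10)

for offset, data in carve(image, JPEG_HEADER, JPEG_FOOTER):
    print(offset, len(data))
# 100 8
# 158 6
```

Because the search stops at the first footer it meets, a file that embeds another one (for example a JPEG containing a thumbnail) or a fragmented file is cut short or filled with foreign data, which is the limitation of basic carving noted above.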
File Carving (cont.)
Advanced File Carving
• Relies on the “internal file structure”
• Some files have not only a header, or SOF (Start Of
File), but also a footer, or EOF (End Of File);
• => Having deep knowledge of a file’s internal structure
can result in fewer false positives
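As an illustration of how internal structure reduces false positives, the sketch below validates a carving candidate by walking its chunks instead of trusting the header alone. PNG is used here only as an example format (the slides do not name one): a PNG is an 8-byte signature followed by chunks of length, type, data, and a CRC-32 over type and data, ending with an IEND chunk. The names `png_length` and `make_chunk` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_length(buf: bytes, start: int = 0):
    """Return the length of a structurally valid PNG at `start`, else None."""
    if buf[start:start + 8] != PNG_SIGNATURE:
        return None
    pos = start + 8
    while pos + 12 <= len(buf):                  # smallest chunk is 12 bytes
        (length,) = struct.unpack_from(">I", buf, pos)
        chunk_type = buf[pos + 4:pos + 8]
        data_end = pos + 8 + length
        if data_end + 4 > len(buf):
            return None                          # chunk runs past the buffer
        (stored_crc,) = struct.unpack_from(">I", buf, data_end)
        if zlib.crc32(buf[pos + 4:data_end]) != stored_crc:
            return None                          # corrupted or foreign data
        pos = data_end + 4
        if chunk_type == b"IEND":
            return pos - start                   # exact file length
    return None

def make_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Build one chunk with a correct CRC (used only for the demo below)."""
    body = chunk_type + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

# Structurally valid (though not displayable) sample followed by unrelated bytes.
sample = PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND")
damaged = bytearray(sample)
damaged[20] ^= 0xFF                              # flip one byte inside IHDR data

print(png_length(sample + b"trailing junk"))     # 45
print(png_length(bytes(damaged)))                # None
```

A header-only match would accept both candidates; the structural walk rejects the damaged one and also yields the exact end of the intact file without relying on a footer search.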
File Carving (cont.)
Example – Carving JPEG Files
Possible explanations:
1. This file may be fragmented.
2. The file may have been overwritten.
Commercial:
■ Adroit Photo Recovery: amazing, but only works on JPEGs
■ EnCase: comes with some eScripts that will carve
■ DataLifter: File Extractor Pro