Forensic implications of stacked file systems

DFRWS EU 2024 - Selected Papers from the 11th Annual Digital Forensics Research Conference Europe

Keywords: Storage forensics, File systems, Stacked file systems, Distributed file systems, MooseFS, GlusterFS, eCryptfs

Abstract: While file system analysis is a cornerstone of forensic investigations and has been extensively studied, certain file system classes have not yet been thoroughly examined from a forensic perspective. Stacked file systems, which use an underlying file system for data storage instead of a volume, are a prominent example. With the growth of cloud infrastructure and big data, it is increasingly likely that investigators will encounter distributed stacked file systems, such as MooseFS and the Hadoop File System, that employ this architecture. However, current standard models and tools for file system analysis fall short of addressing the complexities of stacked file systems. This paper highlights the forensic challenges and implications associated with stacked file systems, discussing their unique characteristics in the context of forensic analyses. We provide insights through three analyses of different stacked file systems, illustrating their operational details and emphasizing the necessity of understanding this file system category during forensic investigations. For this purpose, we present general considerations that must be made when dealing with the analysis of stacked file systems.
* Corresponding author.
E-mail addresses: [email protected] (J.-N. Hilgert), [email protected] (M. Lambertz), [email protected]
(D. Baier).
https://doi.org/10.1016/j.fsidi.2023.301678
2666-2817/© 2024 The Author(s). Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
J.-N. Hilgert et al. Forensic Science International: Digital Investigation 48 (2024) 301678
specialized file format. We denote the stacked file system as the upper file system and its files as the upper files, which are the files accessible when the file system is mounted. The underlying file system it relies on is termed the lower file system storing the lower files, as depicted in Fig. 1.

Fig. 1. Overview of a stacked file system utilizing a traditional lower file system for data storage.

In instances where the upper and lower file systems operate on the same machine, the stacked file system is termed local. Nevertheless, an upper file can encompass multiple lower files, potentially distributed across various detached lower file systems. Given this, the concept of stacked file systems is frequently employed within distributed stacked file systems like the Hadoop Distributed File System or MooseFS, as they can be constructed atop a pre-existing and reliable lower file system.

Furthermore, distributed stacked file systems can be categorized as either managed or unmanaged. In a managed setup, a designated entity like a main daemon can be used to orchestrate tasks such as data distribution and managing the metadata of the upper file system. Conversely, in an unmanaged configuration, the systems housing the lower file systems inherently possess all the requisite data to construct the upper file system. Both of these types can be encountered during forensic investigations due to the increasing usage of distributed storage in cloud environments. Hence, comprehending the forensic implications and nuances of stacked file system analysis is crucial.

2.1. Related work

A detailed concept of stacking file system layers was already presented in 1994 (Heidemann and Popek, 1994). However, this work focuses on file system development and describes stacking as a method to leverage already existing file systems, facilitating the development process of new file systems and features. A few years later, Erez Zadok utilized the concept of stacked file systems to implement a wrapper file system called Wrapfs (Zadok, 1999). While it still stores its data on a lower file system, Wrapfs can be used to create arbitrary upper file systems, for example to provide encryption or prevent deletions of files. In 2007, Zadok together with others discussed various issues of stacked file systems within Linux, such as cache coherency between the upper and lower file system (Sipek et al., 2007). Furthermore, file systems for secure deletion and tracing of file interactions based on the concept of stacked file systems have been proposed (Bhat and Quadri, 2012; Aranya et al., 2004).

While all of the aforementioned research does focus on stacked file systems, it does not cover them from a forensic point of view. Still, limited research on the forensic analysis of distributed stacked file systems has been published (Asim et al., 2019; Harshany et al., 2020). One of these works takes a closer look at the Hadoop Distributed File System. While their work yields interesting results, such as analyzing various commands and reconstructing distributed data, they do not address the underlying file system used by HDFS. Another analysis of a distributed stacked file system was performed in (Martini and Choo, 2014). During their analysis of XtreemFS, the authors also focused on the Object Storage Devices storing the lower file systems, including their identification. However, their work falls short in providing a detailed discussion of general implications of the underlying concept of stacked file systems. Furthermore, the additional value of an analysis of the underlying file system is not examined.

2.2. Forensic analysis of stacked file systems

In addition to the research gap, it is important to note that this deficiency extends to forensic tools as well. Current tools, like The Sleuth Kit, are equipped to analyze various lower file system types but lack the functionality to associate them with any existing upper file system. This limitation underscores the need for an updated standard workflow for file system forensic analysis, as depicted in Fig. 2, to effectively handle the complexities of stacked file systems.

Fig. 2. Extension of Brian Carrier's model for the applicability of stacked file systems.

In the revised workflow, the initial steps shown in white remain, but we introduce an additional phase, highlighted in purple, specifically dedicated to the analysis of stacked file systems, building upon the results from the prior analysis of the lower file system. This emphasizes that the detection and analysis of traditional file systems continue to be the foundational elements of the process. However, these steps may now yield multiple lower files or metadata files associated with stacked file systems, requiring thorough examination in the newly added step to ensure a comprehensive forensic analysis. Crucially, the results from the stacked file system analysis must also be correlated with information derived from the lower file system analysis and vice versa.

The remainder of this paper deals with the additional step of stacked file system forensics and its integration into forensic investigations. In particular, we look at six specifics we believe are essential for file system analysis: 1) Identification of Stacked File Systems, 2) Correlation of File Names, 3) Data Reconstruction, 4) Timestamps and their Update Behavior, 5) Slack Space and 6) Possibilities for File Recovery.

2.3. Experimental setup

To derive the most comprehensive guidance possible, it is crucial to include a diverse range of stacked file systems in the experiments.
Accordingly, three distinct stacked file systems, previously overlooked in research, were selected as representative examples:

MooseFS, released in 2008, is an open-source, managed, distributed stacked file system designed for big data storage. Its architecture includes Chunk Servers that store data, a Master Server managing metadata, Metaloggers for metadata backup, and a client interface for mounting the file system. In MooseFS, large files are split into smaller chunks distributed across multiple servers.

GlusterFS is an unmanaged, distributed stacked file system that differs from MooseFS in that it lacks a dedicated master server. Instead, its storage servers form a trusted pool by connecting directly to each other. It supports any file system as a brick, the lower file system for storing data. These bricks are combined to create a volume, which is subsequently mounted by a client.

eCryptfs, introduced in 2005 as a cryptographic file system to operate on top of an existing file system (Halcrow, 2005), was integrated into the Linux kernel in version 2.6.19. Although superseded by other mechanisms such as LUKS, eCryptfs remains a notable early example of stacked file systems. It functions as a local stacked file system, not used in a distributed manner, and is mounted by specifying a source directory from the lower file system to store its data.

This variety ensures a thorough exploration of the potential scenarios forensic experts may encounter. For our experiments, the stacked file systems were set up, mounted and populated with arbitrary data. Specifics of each experiment are presented in the corresponding section. As a lower file system during the experiments, we utilized Ext4 due to its widespread use and to keep the results comparable. Drawing on these findings, the following sections also outline practical key takeaways to aid forensic investigators in their work with stacked file systems.

3. Identification of stacked file systems

As described in Section 2.3, it is crucial to identify a stacked file system following the analysis of the lower file system. During these experiments, the lower file systems were analyzed for any indicators hinting at the usage of a stacked file system.

3.1. Findings

3.1.1. MooseFS
As soon as a file system is being used as part of a chunk server in MooseFS, a distinct hierarchy of directories from 00 up to FF is created on it. These directories are used to store the chunks, which in turn utilize a file name pattern consisting of an identifier, the chunk ID, a corresponding version and the mfs extension. Lower files in MooseFS can also be identified by their internal structure, which can be inferred by taking a look at the open source code of the file system. In the default MooseFS installation, i.e. not the light version, each chunk begins with a 0x2000 bytes long header. It starts with either a signature of MFSC 1.0 or MFSC 1.1, followed by eight and respectively four bytes representing the chunk ID and version, both of which can also be found within the chunk's file name.

3.1.2. GlusterFS
A similar behavior can be observed on servers of a GlusterFS pool when a volume is created and started. This includes a hidden .glusterfs directory storing directories named 00 up to ff. Each upper file in GlusterFS is assigned a UUID referred to as the GlusterFS internal file identifier (GFID). This GFID names each lower file inside the hidden hierarchy. GlusterFS also mirrors the upper file system's structure in the lower file system using hard links, as depicted in Fig. 4. While GlusterFS does not make use of any specific internal structure within its lower files, it uses extended attributes to store meta information about its files.

3.1.3. eCryptfs
eCryptfs, on the other hand, does not create a unique hierarchy on the lower file system. Instead, the hierarchy of the files and directories of the upper file system is stored in an identical way on the lower file system. If file name encryption is enabled, the distinct prefix ECRYPTFS_FNEK_ENCRYPTED defined in the Linux kernel source is used for each lower file. Lower files in eCryptfs contain magic markers stored in a special header. These markers can be detected by performing an XOR operation on bytes 9-12 of the file at hand using the magic 0x3c81b7f5. The resulting 4 bytes should match bytes 13-16 in case of an eCryptfs lower file.

3.2. Key takeaways

Depending on the stacked file system at hand, various types of indicators resulting from the analysis of the lower file system can be used for its identification. This includes distinct hierarchies and file structures as well as certain extended attributes. Furthermore, the internal structure of lower files can be used to identify them directly, for example in cases in which they are included in a backup outside of the lower file system.

Once identified, investigators can mount the stacked file system using its native software or conduct an in-depth forensic examination. However, the current shortfall in forensic tools specifically designed for stacked file system analysis necessitates manual reconstruction of the file system under investigation at the moment.

4. Correlation of file names

For a more comprehensive analysis and deeper understanding, it is essential to establish the relation between the names of upper files and the corresponding lower files that represent them. During this experiment, we analyzed if and how this connection could be determined. Furthermore, we examined how the entire hierarchical structure of the upper file system is reflected within the lower file system.

4.1. Findings

4.1.1. MooseFS
In MooseFS, neither the header, the file name nor any other metadata of a lower file contains any reference to the original upper file. In order to obtain this relation and thus also the file name, it is necessary to analyze information stored on the Master Server. By default, chunk servers use the DNS name mfsmaster to connect to the Master Server. However, this can be configured within the chunk server's configuration stored in /etc/mfs/mfschunkserver.cfg. The Master Server stores its metadata files within the directory /var/lib/mfs, including metadata.mfs.back. This metadata file can be extracted and subsequently inspected or analyzed using the mfsmetadump tool. During our experiments, recent MooseFS updates were not instantly reflected in the metadata file, requiring a Master Server restart to save these changes.

Fig. 3 illustrates the mfsmetadump utility output. In MooseFS, file names are stored as EDGEs in the filesystem tree, found in the EDGE section. Each line represents a file, detailing the parent inode, child inode, and file's name. The child inode number can be used to link an entry to a corresponding NODE section entry, which represents an upper file.

Fig. 3. Excerpt of the mfsmetadump tool displaying metadata of a MooseFS file system.
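As an illustration of this EDGE-to-NODE correlation, the following sketch maps child inodes to full upper-file paths from a textual metadata dump. Note that the pipe-separated line format used here is purely hypothetical for the sake of a self-contained example; the actual mfsmetadump output differs, and the regular expression would have to be adapted to it.

```python
import re

# Hypothetical, simplified EDGE lines; the real mfsmetadump output
# uses a different layout that must be matched instead.
SAMPLE_DUMP = """\
EDGE|p:1|c:2|n:documents
EDGE|p:2|c:5|n:report.pdf
EDGE|p:1|c:7|n:notes.txt
"""

EDGE_RE = re.compile(r"EDGE\|p:(\d+)\|c:(\d+)\|n:(.+)")

def parse_edges(dump: str) -> dict[int, tuple[int, str]]:
    """Map each child inode to its (parent inode, file name) pair."""
    edges = {}
    for line in dump.splitlines():
        match = EDGE_RE.match(line)
        if match:
            parent, child, name = int(match.group(1)), int(match.group(2)), match.group(3)
            edges[child] = (parent, name)
    return edges

def full_path(edges: dict[int, tuple[int, str]], inode: int, root: int = 1) -> str:
    """Reconstruct an upper-file path by walking parent links up to the root inode."""
    parts = []
    while inode != root:
        parent, name = edges[inode]
        parts.append(name)
        inode = parent
    return "/" + "/".join(reversed(parts))

edges = parse_edges(SAMPLE_DUMP)
print(full_path(edges, 5))  # /documents/report.pdf
```

The resulting child-inode mapping can then be joined with the NODE section, which lists the chunks belonging to each inode, to tie upper-file names to the lower chunk files.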
4.1.2. GlusterFS
As shown in Fig. 4, the lower files 1847be7c-7a84-4c41-932b-5e0740c5e809 and data.txt share the same inode number. Additionally, GlusterFS also utilizes extended attributes and soft links, which can be analyzed to infer the hierarchy. The extended attributes of the lower file contain a reference including the original file's name as well as the GFID of the directory in which it was stored. The lower file belonging to this directory is again a soft link pointing to its own parent directory, and so on.

Fig. 4. Example of a hierarchy on a lower file system in GlusterFS.

4.1.3. eCryptfs
If the eCryptfs file system is mounted, files within the upper file system can be matched to the files in the lower file system by comparing the corresponding inode numbers. This is already implemented in the ecryptfs-find utility. By default, the file names of the lower files are identical to the file names of the corresponding upper files. In case of file name encryption, eCryptfs utilizes a file name encryption key (FNEK), which is required to reveal the original file name of the lower file at hand. However, eCryptfs stores a hex signature of the utilized FNEK within all of the encrypted file names. For this reason, it is still possible to infer which lower files were encrypted using the same FNEK and thus probably belonged to the same mounted file system. The signature of the FNEK is encoded within the FNEK-encrypted file name, also referred to as a Tag 70 packet, and follows the packet type 0x46 and the length of the packet. By decoding the file name, it is possible to extract the signature of the FNEK used, which can be used for further analyses.

4.2. Key takeaways

Our experiments indicate that in local or unmanaged distributed stacked file systems, it is generally possible to deduce the original file names and file system hierarchy. This is to be expected, as for these kinds the corresponding metadata can be found within the lower file system. In contrast, with stacked file systems that incorporate a management component, e.g. a dedicated server, it becomes vital to identify and extract the metadata that holds this information. Our findings demonstrate how this analysis can be executed for stacked file systems like MooseFS, enabling the determination of the relationship between upper and lower files. However, this task varies significantly depending on the specifics of the stacked file system, necessitating customized implementations within forensic tools.

5. Data reconstruction

For the reconstruction of upper files from their corresponding lower files, analysts need to tackle common problems such as fragmentation and data transformation.

5.1. Fragmentation

In most cases, the content of a file does not fit into a single data unit, which is why file systems allocate multiple data units. While different allocation strategies may be used, this often results in file fragmentation. Thus, traditional file systems need to keep track of the exact data units used by a file as well as the order in which they belong. This fragmentation not only complicates forensic efforts but has also been a long-standing focal point of research (Garfinkel, 2007; van der Meer et al., 2021). Yet, the topic of fragmentation in stacked file systems has not been explored. To address this, we have created multiple large-sized upper files, aiming to analyze and understand the fragmentation patterns in the stacked file systems under study.

5.1.1. Findings

5.1.1.1. MooseFS. Our experiments demonstrated that MooseFS splits files larger than 64 MiB into multiple lower files, irrespective of the number of chunk servers. Although mountable with a single chunk server, MooseFS ideally operates with multiple, and it is advised to use at least three, as done in our experiments. By default, each chunk is replicated onto two of the three available chunk servers. Consequently, large files, fragmented into multiple chunks, may be distributed across all chunk servers within the MooseFS file system. Since information within the chunks themselves did not suffice to reassemble an upper file, it is necessary to consult the Master Server metadata to efficiently assemble fragmented upper files. As depicted in Fig. 3, the NODE section stores a list of chunks composing the upper file, each identified by a unique ID, which is also reflected in the chunk name on the lower file systems. Since it is unique to each chunk, it can also be used to identify replicas of chunks across multiple chunk servers.

5.1.1.2. GlusterFS. Depending on the type of volume used, fragmentation as well as replicas of lower files can be encountered. The most important volume types are:

• Distributed: In this default mode, upper files are not fragmented, but stored randomly across all available bricks, i.e. all available lower file systems.
• Replica: This mode is used to ensure redundancy by storing unfragmented upper files across multiple bricks, similar to RAID mirrors. The corresponding lower files stored across multiple lower file systems can be correlated by their GFID file name.
• Dispersed: A dispersed volume can be compared to a RAID5-like volume. Data is split and stored across multiple lower file systems along with parity information. Again, fragments belonging together can be matched by their GFID. When creating a dispersed volume, it is possible to configure the number of bricks used for redundancy, i.e. how many bricks can be lost without causing any data loss.

The first two modes do not cause any fragmentation of upper files. However, the replicated mode causes multiple copies of the same files to be stored on multiple lower file systems, i.e. storage servers. To identify these lower file systems for further analysis, the configuration of the GlusterFS at hand can be utilized. A directory for each volume of a storage server can be found in its /var/lib/glusterd/vols/ directory. It stores the volume configuration in an info file and consists of a bricks subdirectory that offers configurations for each associated brick. These files appear across all storage servers in the GlusterFS pool that created the volume, detailing the server's hostname and brick path.
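A short sketch of how such a volume configuration could be evaluated to enumerate every server that must be included in the acquisition; the key names in the embedded sample are illustrative assumptions, since the real info file is a plain key=value listing whose exact keys should be verified against the seized configuration.

```python
# Hypothetical excerpt of a GlusterFS volume "info" file; the real file is a
# key=value listing, but the keys shown here are illustrative assumptions.
SAMPLE_INFO = """\
type=0
count=3
brick-0=server1:/bricks/brick1
brick-1=server2:/bricks/brick1
brick-2=server3:/bricks/brick1
"""

def parse_bricks(info: str) -> list[tuple[str, str]]:
    """Return (hostname, brick path) pairs found in a volume info file."""
    bricks = []
    for line in info.splitlines():
        key, _, value = line.partition("=")
        if key.startswith("brick-") and ":" in value:
            host, _, path = value.partition(":")
            bricks.append((host, path))
    return bricks

for host, path in parse_bricks(SAMPLE_INFO):
    print(f"acquire lower file system at {host}:{path}")
```

In a real case, the info files below /var/lib/glusterd/vols/ on any pool member would be read instead of the embedded sample, yielding the list of hosts whose lower file systems still need to be acquired.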
Notably, the values listen-port and brick-fsid only seem to exist in the brick configuration of the respective server. This also allows for pinpointing the exact GlusterFS server at hand.

When dealing with dispersed volumes, upper files become fragmented within GlusterFS. An efficient way to identify a lower file of a dispersed volume is to analyze its extended attributes. Each chunk belonging to a dispersed file utilizes extended attributes in the trusted.ec namespace, e.g. trusted.ec.size, which stores the real size of the corresponding file. However, they do not contain any information about the order in which they should be reassembled. Furthermore, GlusterFS uses erasure coding for dispersed volumes, which requires an additional step to obtain the original version of the file, described in the next section.

5.1.1.3. eCryptfs. In eCryptfs, upper files are not split into multiple lower files and thus no fragmentation occurs.

5.1.2. Key takeaways
Though our findings illustrated that fragmentation may not be as complex as with traditional file systems, it still has to be considered, especially when dealing with distributed stacked file systems spanning across multiple lower file systems. In these cases, it is crucial to identify which other servers were part of the stacked file system at hand in order to adequately extend the acquisition process.

Furthermore, our experiments showed that the upper file system's metadata plays a crucial role for an efficient reassembly of fragmented files, highlighting the importance of dedicated approaches for stacked file system analysis. In absence of this information, correlating lower files via their timestamps is an alternative, though less reliable due to discrepancies as discussed in Section 6.

5.2. Transformation

Unlike early file systems, more advanced ones like APFS or ZFS started to implement features such as encryption or compression. This resulted in some kind of transformation between a file's original content and the content stored on disk. A similar concept may also be employed by stacked file systems for various purposes like encryption or the utilization of erasure coding, when data is distributed. For this analysis, we compared the content of lower files to the original content stored within the upper files of the stacked file system.

5.2.1. Findings

5.2.1.1. MooseFS. Besides the inclusion of an extra 0x2000 byte chunk header, the open-source MooseFS leaves the original data unaltered.

5.2.1.2. GlusterFS. In distributed and replicated volumes, GlusterFS leaves the original content in lower files unchanged as well. However, for dispersed volumes, it employs a Reed-Solomon based erasure coding. For an efficient recovery of dispersed files, we recreated the relevant GlusterFS setup to tackle the fragmentation as well as the transformation hurdle. After the original GlusterFS configuration is identified as described in the previous section, it is possible to recreate a new GlusterFS volume using identical parameters. Afterwards, the obtained lower file systems can be copied to the freshly created GlusterFS bricks. It is essential to preserve extended attributes; failure to do so will lead GlusterFS to misidentify dispersed files. Additionally, the sequence of declaring bricks is crucial, i.e. the original first lower file system's data should populate the first brick in the new volume and so on. Any inconsistency led to reconstruction failure in GlusterFS in our tests.

5.2.1.3. eCryptfs. Since eCryptfs's main feature is encryption, file contents found on the lower file system are naturally encrypted. Additionally, the cryptographic context for each file is stored in a header preceding the encrypted data. The minimum size for this header is defined as 8192 bytes, resulting in slightly larger files on the lower file system compared to the original stacked file system. Though 8192 bytes is only the minimum size, we did not encounter any larger header sizes in our experiments, including files up to 1 GB. The size of the original file is not encrypted and can be found in bytes 0-7 of the header, which starts directly at offset 0 of the lower file. Further information within the header includes the version as well as the encrypted session key used for the encryption of the file's content.

5.2.2. Key takeaways
Depending on the stacked file system at hand, practitioners can benefit from the absence of a transformation layer during their analysis. This enables them to analyze the files of a lower file system without the need to perform an analysis of the upper stacked file system. However, in certain cases, when encryption or error encoding is utilized, it is required to retranslate the content of lower files to obtain the original file content. We have illustrated various considerations that have to be made when using native software to perform this task for GlusterFS. Yet again, this strongly depends on the features of the stacked file system at hand.

6. Timestamps and their update behavior

In traditional file systems, timestamps are stored along the metadata of the files and include information about the last Access, Modification, Change and in some cases Birth time of a file. The intricacies and challenges of interpreting timestamps are well-recognized within the digital forensic community (Raghavan, 2013).

Naturally, these timestamps retain their critical importance in the context of stacked file systems. However, we encounter an additional layer of timestamp sources:

• Upper file system: Timestamps of the upper file system refer to the upper files and consist of one set of timestamps per upper file. The way this meta information is stored is completely specific to the upper file system itself.
• Lower file system: Employing an additional file system to store file content introduces an extra layer of timestamps stored along the lower files in the lower file systems.

Moreover, it is equally important to grasp the timestamp update behavior within both the upper and lower file systems as well as how they affect each other. In our experiments, we conducted fundamental file operations like creating and modifying files to examine how the stacked file systems in question update timestamps. This investigation encompassed both the lower and upper file systems, with a particular focus on understanding how timestamps in the latter could be accurately retrieved. We kept a multi-server configuration for the distributed file systems to observe the timestamp update behavior across multiple lower file systems.

6.1. Findings

The initial part of this section details the findings on timestamp sources, while the subsequent sections explore the timestamp update behavior of the corresponding file systems.

6.1.1. Timestamp sources
MooseFS keeps track of the timestamps for all of its upper files within the metadata that can be found on the Master Server or Metaloggers. This information can be extracted by using the mfsmetadump utility as shown previously in Fig. 3. In GlusterFS, this information is not stored in an external file, but directly within the extended trusted.glusterfs.mdata attribute of the corresponding lower files across all bricks.
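Assuming the layout we observed, pairs of 8-byte big-endian seconds and nanoseconds after an optional leading version byte, such an attribute value, as returned in Base64 form by getfattr, could be decoded with the following sketch. The skip offset and the order in which the pairs map to Change, Modification and Access times are assumptions that must be verified against the GlusterFS version at hand.

```python
import base64
import struct
from datetime import datetime, timezone

def decode_mdata(b64_value: str, skip: int = 0):
    """Decode 8-byte big-endian (seconds, nanoseconds) pairs from a
    Base64-encoded xattr value. `skip` allows ignoring leading bytes such
    as a version byte; which pair corresponds to ctime/mtime/atime is
    version-dependent and must be verified."""
    raw = base64.b64decode(b64_value)[skip:]
    timestamps = []
    for offset in range(0, len(raw) - 15, 16):
        sec, nsec = struct.unpack_from(">QQ", raw, offset)
        timestamps.append((datetime.fromtimestamp(sec, tz=timezone.utc), nsec))
    return timestamps

# Synthetic value: one timestamp pair for 2024-01-01 00:00:00 UTC
sample = base64.b64encode(struct.pack(">QQ", 1704067200, 0)).decode()
print(decode_mdata(sample))
```

On a live brick, the raw value could be obtained with getfattr in Base64 encoding and fed into this routine; on an image, the attribute has to be extracted from the Ext4 metadata first.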
Any change to the upper file's timestamps inevitably results in an update of the metadata of the lower files. The actual timestamps can be extracted from the decoded Base64 string stored within the extended attribute. It holds 8-byte timestamps in seconds followed by the timestamp for nanoseconds, following the big-endian format as shown in Fig. 5. eCryptfs relies solely on the timestamps already present in the lower file system, without storing any additional timestamp information.

6.1.2. Update behavior for MooseFS
When a file smaller than the maximum chunk size is created, two identical chunk copies are made on two out of three chunk servers. Although MooseFS sets the Modification and Change timestamps of the upper file identically, the Access timestamp appeared slightly earlier in our tests. This pattern was also seen in the timestamps of the corresponding chunk servers. Notably, the Birth timestamp from the lower file system is not reflected in MooseFS. Furthermore, it was observed that different chunk servers displayed varying timestamps for the same chunk.

When an upper file is modified, its Modification and Change timestamps are updated to the same value. The same holds true for the corresponding chunks stored within the lower file systems. However, timestamps might again vary across chunk servers. If the upper file's timestamps are changed without data alteration, e.g. by utilizing the touch command in Linux, the chunk timestamps remain unaffected.

The update of file Access timestamps is rather complex and depends on multiple factors:

• MooseFS Configuration: The ATIME_MODE in the Master Server config determines the Access time update policy for upper files. Default is always, with options like "always for files" or "never" (similar to Linux's noatime).
• Client Caching: When mounting MooseFS, it is possible to set a data cache mode. Options include DIRECT (no caching) and YES (always use cache). Default is AUTO, which behaved like YES in our tests.
• Chunk Pre-Fetch: For performance, MooseFS uses pre-fetch and read-ahead algorithms on chunk servers to pre-load expected chunks into the OS memory. This is hardcoded and cannot be changed.
• Lower File System Configuration: The lower file system on the chunk server has its own Access timestamp policy. In Linux, the default is relatime, which doesn't update Access times with every access.

In MooseFS, the file Access timestamps for upper files are influenced by its configuration and client caching. It was observed that when client caching is disabled, every file access updates the Access timestamps on the client, which the Master Server adopts in the default configuration. If client caching is enabled however, the Access timestamp of a file is

Reading all files from the client (with caching off and default MooseFS settings) resulted in the updating of all 10,000 Access timestamps in the lower file system. Yet, in setups with fewer upper files, Access timestamps of chunks only updated during daemon startup, not during later client requests. The cause for this disparity is still unclear and requires further research. Given these complexities, interpreting Access timestamps on MooseFS's lower file systems demands caution. On the other hand, Modification timestamps were consistently accurate and updated as anticipated, which is especially relevant for large upper files generating multiple chunks.

Large files: In MooseFS, files exceeding the maximum chunk size are divided into multiple chunks. When such a large file is created, the timestamps in MooseFS and the underlying file system are set in the manner previously detailed. Thus, a 200 MB upper file results in eight (four distinct but replicated) lower files, each with unique timestamps, spread across three chunk servers.

In our experiment, we modified the first bytes of the 200 MB file. While MooseFS only holds a singular set of timestamps for the upper file, the Modification and Change timestamps are updated the same way regardless of the position in which the file is modified. On the lower file systems however, only the Modification and Change timestamps of the impacted chunk, the first of four, were updated across the two chunk servers hosting that chunk. A similar pattern was observed when other sections of the file were altered: only the relevant chunk's timestamps changed. This level of granularity provides a more intricate view into file modifications on the upper file system.

In our MooseFS experiments, we observed an unexpected behavior where chunks sometimes moved between chunk servers after file modification or idling periods. While MooseFS naturally rebalances chunks across servers, the reasons for these specific movements were unclear. Crucially, this behavior has implications for timestamps. When a chunk is relocated to a new server, it behaves as if it's newly created, thus resetting all its timestamps to the time of the relocation.

6.1.3. Update behavior for GlusterFS
When a file is created in GlusterFS, the Modification and Change timestamps of the upper file are set to the same value, while the Access timestamp was always set to a value a little earlier. For lower file systems, the behavior of the initial timestamps depended on the volume mode. For a replicated volume, all timestamps were set to the same value, while a dispersed volume resulted in different Change and Modified timestamps. Furthermore and as expected, the timestamps across the lower file systems stored on multiple servers varied. Additionally, the Birth timestamp was utilized by the lower file system, but, again, not populated to the upper file system.

After the modification of a file, the Modification and Change timestamps of the upper file were updated to the same values. Furthermore,
only updated on its first access or when it gets reloaded into cache. If the Modification, Change and Access of the lower files were updated.
MooseFS is however configured to never Modification the Access time Since the Access timestamp of an upper file has to be propagated to
stamps, client-side updates aren’t stored on the Master Server and are each lower file, GlusterFS doesn’t by default keep track of Access times
lost almost instantaneously. In our tests, Access timestamps for lower preventing any performance drops. It was however observed, that access
files were updated upon the chunk server daemon’s initial start, pro to an upper file could update the Access timestamp of a corresponding
vided its access time mode was set accordingly, e.g. using the stric lower file depending naturally on the atime configuration of the lower
tatime option. With 10,000 files (and corresponding lower files), file systems. In a setup with three replicated bricks, the specific accessed
Access timestamps changed post-daemon start without client read re lower file alternated for each access. Furthermore, the timestamp was
quests. This is likely due to MooseFS’s pre-fetch algorithms reading data not updated for each access, most likely due to again some kind of
in memory for some time, though no clear order was discernible. caching performed within the client. Caching within the GlusterFS
servers itself was not observed.
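The chunk-level granularity described for MooseFS above suggests a simple correlation technique: comparing the upper file's Modification timestamp against the mtimes of its chunk files on the chunk servers localizes which part of the file was last changed. The following sketch is our own illustration, not MooseFS tooling; it assumes a fixed chunk-index-to-offset mapping of 64 MiB of upper-file content per chunk and accepts a caller-supplied tolerance for the clock differences we observed between chunk servers.

```python
from datetime import datetime, timedelta

CHUNK_PAYLOAD = 64 * 1024 * 1024  # 64 MiB of upper-file content per chunk (assumption)

def localize_modification(upper_mtime, chunk_mtimes, skew=timedelta(0)):
    """Return the byte ranges of the upper file whose chunks carry an
    mtime at or after the upper file's last modification.

    chunk_mtimes maps chunk index -> datetime, collected from the lower
    file systems on the chunk servers; skew absorbs clock differences
    between servers."""
    touched = []
    for idx, mtime in sorted(chunk_mtimes.items()):
        if mtime >= upper_mtime - skew:
            start = idx * CHUNK_PAYLOAD
            touched.append((start, start + CHUNK_PAYLOAD - 1))
    return touched

# Example: a file split into four chunks, only the first of which was
# rewritten when the upper file was modified.
modified = datetime(2024, 1, 2, 12, 0)
created = datetime(2024, 1, 1, 9, 0)
print(localize_modification(modified, {0: modified, 1: created, 2: created, 3: created}))
# → [(0, 67108863)], i.e. the first 64 MiB of the upper file
```

Since chunks are replicated and replica timestamps of the same chunk may disagree, the maximum mtime across all replicas of a chunk would be the natural input per index.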
J.-N. Hilgert et al. Forensic Science International: Digital Investigation 48 (2024) 301678
After the modification of a file, the Modification and Change timestamps were updated and contained the same values within eCryptfs as well as the lower file system. The same holds true for a file access and metadata modification, updating the corresponding Access and Modification timestamps respectively. Consequently, all of the timestamp modifications performed directly on the lower file system were also mirrored to the stacked file system.

6.2. Key takeaways

For stacked file system analysis, we advise practitioners to harness both potential sources of timestamps within the upper and lower file system. Extracting timestamps from the upper file system is crucial, as outlined in our previous section for MooseFS and GlusterFS. In addition, timestamps of the lower files should also be extracted, and analytical methods need to be able to correlate both timestamp sources. This approach is particularly beneficial in distributed stacked file systems, where data fragmentation leads to a more detailed level of timestamp granularity for each file.
Furthermore, in situations where the upper file system depends solely on the lower file system's timestamps, two aspects should be considered: First, analyzing the lower file system can already provide valuable temporal insights. Second, as our eCryptfs example shows, these timestamps may be more susceptible to manipulation.
Moreover, akin to conventional file system forensics, understanding the behavior of timestamp updates in both the upper and lower file systems is essential.

slack exist within a stacked file system and how they can be detected and extracted. During the following experiments, we have evaluated the feasibility of slack space within stacked file systems by utilizing it to hide data.

7.1. Findings

This section is divided into two parts: the first presents the findings related to the lower file slack space, while the second focuses on the extra lower slack space resulting from expanding the size of a lower file.

7.1.1. Lower file slack

7.1.1.1. MooseFS. Chunks start with a 0x2000 byte header, followed by the upper file's content in 0x10000 byte blocks. The final 0x1000 bytes of the chunk header store CRC checksums: four bytes for each block, accommodating up to 1024 blocks. This results in a maximum chunk size of 64 MiB plus the header size. Given the large block size, MooseFS's lower file slack can be used to hide up to 64 KiB of data without altering the chunk size. Data hidden here doesn't affect the upper file's accessibility or its displayed size. However, inserting data causes a mismatch of the CRC checksums, which led to the chunk being marked as INVALID upon server restarts during our experiments. For effective concealment, it's crucial to update these checksums. Furthermore, modifying the upper file doesn't affect the data hidden in the lower file slack. However, if the file expands, reducing the chunk's slack, the concealed data is overwritten.
7.1.2.1. MooseFS. With stacked file systems, data can also be hidden in
the extra lower file slack by appending it to an existing lower file. Since
MooseFS utilizes a maximum size for its lower files, it is also ensured
that storing data past this offset is protected from being overwritten due
to any modifications of the upper file. In our experiments, we filled the
space up to the maximum size with zeros and placed data in the succeeding space. Subsequent modifications to the upper file did not
overwrite the hidden data in the extra lower file slack. However, as soon
as a chunk was transferred to another chunk server, the hidden data did
not persist and was lost.
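The fixed maximum chunk size makes extra lower file slack mechanically detectable: any chunk file larger than its header plus the block-aligned payload it should carry is suspicious. A minimal sketch based on the layout figures above; the expected payload per chunk would in practice be derived from the upper file's size in the Master Server metadata, which we assume is available to the examiner.

```python
HEADER_SIZE = 0x2000
BLOCK_SIZE = 0x10000             # 64 KiB blocks
MAX_PAYLOAD = 1024 * BLOCK_SIZE  # 64 MiB of upper-file data per chunk

def extra_slack_bytes(chunk_file_size, expected_payload):
    """Bytes stored beyond what the chunk legitimately needs:
    header + payload rounded up to whole blocks. A positive result
    flags potential extra lower file slack."""
    payload = min(expected_payload, MAX_PAYLOAD)
    blocks = -(-payload // BLOCK_SIZE)  # ceiling division
    return max(0, chunk_file_size - (HEADER_SIZE + blocks * BLOCK_SIZE))

# A chunk holding just over 128 KiB of data but 5,000 bytes larger than needed:
print(extra_slack_bytes(HEADER_SIZE + 3 * BLOCK_SIZE + 5000, 2 * BLOCK_SIZE + 1))  # → 5000
```

Data hidden within the last block (lower file slack) does not change the chunk file size and thus evades this check; detecting it requires the checksum comparison discussed for the chunk header.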
7.1.2.2. GlusterFS. For GlusterFS, hiding data in extra lower file slack proved impractical across all modes. In distributed and replicated modes, data added to the lower file also appears when reading the corresponding upper file, though the upper file size remains unchanged. For replicated volumes with only one replica containing hidden slack data, the upper file consistently reveals this data. When multiple replicas have extra slack data, the upper file reads from the largest lower file. To hide data, one might place it in any replica, but ensure another copy has more benign data, like null bytes. In dispersed mode, data concealed in the slack of one file in a three-brick setup vanished upon reading the upper file, while adding data to two lower files caused an I/O error.

7.1.2.3. eCryptfs. In eCryptfs, an upper file remained accessible with its file size unchanged when the data was stored in the corresponding extra lower file slack. Notably, this appended hidden data persisted when the upper file was modified or when new data was added, provided it did not surpass the padding limit. If the file grew beyond the available padding, the appended data was overwritten. To avoid this, one can add ample padding before the hidden data, ensuring that any growth of the original file only replaces this 'dummy' padding, thereby preserving the hidden data.

7.2. Key takeaways

Stacked file systems differ from traditional ones in that slack space does not contain remnants of previous files, primarily because new lower files are created for each new upper file. However, our findings suggest that exploiting slack space in lower files, or even in additional lower file slack, could be a viable tactic in certain stacked file system implementations. Consequently, forensic practitioners should not only focus on the upper file system but also thoroughly examine the lower file system during their analyses.
Detecting file slack requires a detailed comparison between the file sizes recorded in the upper file system and those of the corresponding lower files. Additionally, cross-referencing replicas of lower files across various lower file systems is critical to identify any discrepancies that may indicate tampering or manipulation.

8. Possibilities for file recovery

Besides operating system or application specific concepts such as trash bins, file deletion is completely file system specific. Some file systems such as older versions of Ext may keep references to the actual data blocks, while others may wipe these entirely. In these experiments, we circumvented the operating system's trash bin by directly deleting files from the stacked file system using the rm command. This approach allowed us to examine the file deletion processes of the stacked file systems in question, thereby identifying the potential methods available for file recovery.

8.1. Findings

8.1.1. MooseFS
MooseFS's own trash mechanism holds deleted files for 24 h by default. When an upper file is deleted, it becomes inaccessible, but its chunks in the lower file system persist. These deleted files are labeled as trash files in the NODE metadata section on the Master Server. Furthermore, the Change timestamp of these deleted upper files stored in the metadata can be used to infer the time of deletion. Notably, even with a trash duration set to zero, chunks stayed active for a couple of minutes. During this time, the upper files were tagged as sustained files in the metadata, indicating they were deleted but still open. The Change timestamp of these files can hint at their deletion time. Once a file was fully deleted, its chunks were too.

8.1.2. GlusterFS
In its default configuration, GlusterFS does not utilize its Trash translator feature. Thus, as soon as an upper file is deleted, the corresponding files in the lower file system are removed as well, and the possibilities of recovery depend on the lower file system. Enabling this feature results in the creation of a .trashcan directory on each of the bricks, which is used to hold deleted upper files and is also mounted within the upper file system. After the deletion, the GFID-named lower file remains intact, while the hard link in the original hierarchy is removed from the lower file system. Instead, a new hard link is created within the .trashcan directory, whose name consists of the original upper file's name and the actual time of deletion. Furthermore, the original path hierarchy of the deleted file is also recreated within the trash directory.

8.1.3. eCryptfs
Following a file deletion within eCryptfs, the corresponding lower file was also deleted instantaneously in our experiments.

8.2. Key takeaways

Our research reveals that stacked file systems can offer an extra opportunity for file recovery through their own trash features. Understanding the specific structure and metadata of the stacked file system is key, and the data from these trash bin mechanisms should be factored into the analysis process.
Investigators should consider the new opportunities presented by the presence of an additional lower file system. Even if content is deleted from the upper file system, the original data might still exist as lower files within the lower file system. While file recovery becomes wholly dependent on the lower file system following a complete file deletion, the inherent structure of these lower files can be utilized for advanced recovery techniques, such as file carving. In summary, these findings imply that acquiring the lower file system, either physically or logically, is more advantageous than merely performing a simple logical acquisition of the upper file system.

9. Conclusion

Contrary to traditional file systems, the concept of stacked file systems utilizes an additional file system for data storage. Given its integration into various modern distributed file systems, encountering stacked file systems is inevitable in present and future forensic investigations. In this paper, we focused on the forensic analysis of stacked file systems and presented an updated model that is capable of handling this class of file systems. Complementing this, we presented various forensic implications based on traditional analysis techniques and explored them using three representative stacked file systems as examples.
Our findings reveal that understanding the architecture, mechanisms, and features of stacked file systems is crucial for effective analysis. We demonstrated basic procedures like identification and metadata extraction in our findings, noting that further research is essential for a more comprehensive understanding of these systems. The significance of the underlying file system was also emphasized, particularly its potential to enhance investigations with finer details, such as more precise timestamps. Notably, even when access to the upper file system itself is hindered, for example by encryption or incomplete distributed structures, valuable data can still be retrieved from the lower file system.
To fully leverage these insights, it is imperative for current forensic methodologies and tools to adapt. Our research lays a solid groundwork for future exploration in this area and aims to increase awareness among forensic investigators regarding the complexities and opportunities presented by stacked file systems.
Acknowledgement

We thank our shepherd and our anonymous reviewers for their invaluable feedback on this paper.