Research Paper 5 2004
Daniel Ellard, Michael Mesnier, Eno Thereska, Gregory R. Ganger, Margo Seltzer
Abstract

We present evidence that attributes that are known to the file system when a file is created, such as its name, permission mode, and owner, are often strongly related to future properties of the file such as its ultimate size, lifespan, and access pattern. More importantly, we show that we can exploit these relationships to automatically generate predictive models for these properties, and that these predictions are sufficiently accurate to enable optimizations.

(Author affiliation footnote: Intel Corporation and Parallel Data Laboratory, Carnegie Mellon University.)

Figure 1: Using file attributes to predict file properties. During the training period a predictor for file properties (i.e., lifespan, size, and access pattern) is constructed from observations of file system activity. The file system can then use this model to predict the properties of newly-created files. (The figure, not reproduced here, shows create-time attributes such as .#pico, 644, and a uid feeding a model generator driven by FS activity feedback (NFS, local), which emits predictions of the wanted properties.)

1 Introduction

In "Hints for Computer System Design," Lampson tells us to "Use hints to speed up normal execution." [14] The file system community has rediscovered this principle a number of times, suggesting that hints about a file's access pattern, size, and lifespan can aid in a variety of ways, including improving the file's layout on disk and increasing the effectiveness of prefetching and caching. Unfortunately, earlier hint-based schemes have required the application designer or programmer to supply explicit hints using a process that is both tedious and error-prone, or to use a special compiler that can recognize specific I/O patterns and automatically insert hints. Neither of these schemes has been widely adopted.

In this paper, we show that applications already give useful hints to the file system, in the form of file names and other attributes, and that the file system can successfully predict many file properties from these hints.

We begin by presenting statistical evidence from three contemporary NFS traces that many file attributes, such as the file name, user, group, and mode, are strongly related to file properties including file size, lifespan, and access patterns. We then present a method for automatically constructing tree-based predictors for the properties of a file based on these attributes, and show that these predictions are accurate. Finally, we discuss uses for such predictions, including an implementation of a system that uses them to improve file layout by anticipating which blocks will be the most frequently accessed and grouping these blocks in a small area on the disk, thereby improving reference locality.

The rest of this paper is organized as follows: Section 2 discusses related work. Section 3 describes the collection of NFS traces we analyze in this study. Section 4 makes the case for attribute-based predictions by presenting a statistical analysis of the relationship between attributes of files and their properties. Section 5 presents ABLE, a classification-tree-based predictor for several file properties based on their attributes. Section 6 discusses how such models might be used, and demonstrates an example application which increases the locality of reference for on-disk block layout. Section 7 concludes.

2 Related Work

As the gap between CPU and I/O performance has increased, many efforts have attempted to address it. An entire industry and research community has emerged to
attack I/O performance; file systems have been modified, rewritten, and rethought in attempts to reduce the number of synchronous disk requests. Significant effort has also been expended to make caches more effective so that the number of disk requests can be reduced. Many powerful heuristics have been discovered, often from the analyses of real workloads, and incorporated into production file systems. All of these endeavors have been productive, but I/O performance is still losing ground to CPU, memory, and network performance, and we have not resolved the I/O crisis to which Patterson refers in the original RAID paper, written more than fifteen years ago [24].

There is extensive ongoing research in the file system and database communities regarding the optimization of various aspects of performance, reliability, and availability of data access. Many heuristics have been developed and incorporated into popular file systems like the Fast File System (FFS) [17]. Many of these heuristics depend on assumptions about workloads and file properties.

One example of a contemporary file system is the Fast File System (FFS) [17], whose basic design is nearly twenty years old and yet continues to be tuned [7]. For example, FFS is optimized to handle small files in a different manner than large files; it attempts to organize small files on disk so that they are near their metadata and other files in the directory, under the assumption that files in the same directory are often accessed together. Some file systems go to more extreme lengths, such as storing the contents of short files in the same disk block as their inode [22], or storing the directory and inode information in the same block [11].

In addition to size, other properties of files, such as whether they are write-mostly or read-mostly, have been found useful to drive various file system policies. For example, the assumption underlying the design of the log-structured file system (LFS) is that write latency is the bottleneck for file system performance [26]. Hybrid schemes that use LFS to store write-mostly files have also found this approach useful [23]. In contrast, if a file is known to be read-mostly, it may benefit from aggressive replication for increased performance and availability [27].

Unfortunately, every widespread heuristic approach suffers from at least one of the following problems: first, if the heuristics are wrong, they may cause performance to degrade; and second, if the heuristics are dynamic, they may take considerable time, computation, and storage space to adapt to the current workload (and if the workload varies over time, the adaptation might never converge).

One partial solution to the problem of inappropriate or incomplete file system heuristics is for applications to supply hints to the file system about the files' anticipated access patterns. In some contexts these hints can be extremely successful, especially when used to guide the policies for prefetching and selective caching [5, 25]. The drawback of this approach is that it requires that applications be modified to provide hints. There has been work in having the compiler automatically generate hints, but success in this area has been largely confined to scientific workloads with highly regular access patterns [21], and no file system that uses these ideas has been widely deployed.

In previous work, we noted that for some workloads, applications (and the users of the applications) already provide hints about the future of the files that they create via the names they choose for those files [8]. In this paper we generalize this finding and show that file names, as well as other attributes such as uid and mode, are, in fact, hints that may be useful to the file system.

In addition to static analyses of workloads, there has been research aimed at understanding the dynamic behaviors of files. Previous work has shown that properties of files depend on the applications and users accessing them [2, 9], and because users and applications may change, workloads change as well.

Considerable work has been done in developing and exploiting predictive models for the access patterns of files (or their data blocks) [1, 30, 31]. Most work in this area focuses on rather complex and computationally expensive predictive models. Furthermore, such models are often needed on a file-by-file basis and do not attempt to find relationships or classes among files to generalize [15]. We extend this work by providing a framework for automatically classifying files with similar behaviors.

There also exist systems that use a variety of layout policies that provide non-uniform access characteristics. In the most extreme case, a system like AutoRAID [31] employs several different methods to store blocks with different characteristics. On a more mundane level, the performance of nearly all modern disk drives is highly influenced by the multi-zone effect, which can cause the effective transfer rate for the outer tracks of a disk to be considerably higher than that of the inner tracks [19]. There is ample evidence that adaptive block layout can improve performance; we will demonstrate that we can preemptively determine the layout heuristics to achieve this benefit without having to reorganize files after their initial placement.

Advances in artificial intelligence and machine learning have resulted in efficient algorithms for building accurate predictive models that can be used in today's file
systems. We leverage this work and utilize a form of classification tree to capture the relationships between file attributes and their behaviors, as further described in Section 5.

The work we present here does not focus on new heuristics or policies for optimizing the file system. Instead, it enables a file system to choose the proper policies to apply by predicting whether or not the assumptions on which these policies rely will hold for a particular file.

3 The Traces

To demonstrate that our findings are not confined to a single workload, system, or set of users, we analyze traces taken from three servers:

DEAS03 traces a Network Appliance Filer that serves the home directories for professors, graduate students, and staff of the Harvard University Division of Engineering and Applied Sciences. This trace captures a mix of research and development, administrative, and email traffic. The DEAS03 trace begins at midnight on 2/17/2003 and ends on 3/2/2003.

EECS03 traces a Network Appliance Filer that serves the home directories for some of the professors, graduate students, and staff of the Electrical Engineering and Computer Science department of the Harvard University Division of Engineering and Applied Sciences. This trace captures the canonical engineering workstation workload. The EECS03 trace begins at midnight on 2/17/2003 and ends on 3/2/2003.

CAMPUS traces one of 14 file systems that hold home directories for the Harvard College and Harvard Graduate School of Arts and Sciences (GSAS) students and staff. The CAMPUS workload is almost entirely email. The CAMPUS trace begins at midnight on 10/15/2001 and ends on 10/28/2001.

Ideally our analyses would include NFS traces from a variety of workloads including commercial datacenter servers, but despite our diligent efforts we have not been able to acquire any such traces.

The DEAS03 and EECS03 traces are taken from the same systems as the DEAS and EECS traces described in earlier work [9], but are more recent and contain information not available in the earlier traces. The CAMPUS trace is the same trace described in detail in an earlier study [8], although we draw our samples from a longer subset of the trace. All three traces were collected with nfsdump [10].

Table 1 gives a summary of the average hourly operation counts and mixes for the workloads captured in the traces. These show that there are differences between these workloads, at least in terms of the operation mix. CAMPUS is dominated by reads, and more than 85% of the operations are either reads or writes. DEAS03 has proportionally fewer reads and writes and more meta-data requests (getattr, lookup, and access) than CAMPUS, but reads are still the most common operation. On EECS03, meta-data operations comprise the majority of the workload.

Earlier trace studies have shown that hourly operation counts are correlated with the time of day and day of week, and that much of the variance in hourly operation count is eliminated by using only the working hours [8]. Table 1 shows that this trend appears in our data as well. Since the "work-week" hours (9am-6pm, Monday through Friday) are both the busiest and most stable subset of the data, we focus on these hours for many of our analyses.

One aspect of these traces that has an impact on our research is that they have been anonymized, using the method described in earlier work [8]. During the anonymization, UIDs, GIDs, and host IP numbers are simply remapped to new values, so no information is lost about the relationship between these identifiers and other variables in the data. The anonymization method also preserves some types of information about file and directory names; for example, if two names share the same suffix, then the anonymized forms of these names will also share the same suffix. Unfortunately, some information about file names is lost. A survey of the file names in our own directories leads us to believe that capitalization, use of whitespace, and some forms of punctuation in file names may be useful attributes, but none of this information survives anonymization. As we will show in the remaining sections of this paper, the anonymized names provide enough information to build good models, but we believe that it may be possible to build even more accurate models from unanonymized data.

4 The Case for Attribute-Based Predictions

To explore the associations between the create-time attributes of a file and its longer-term properties, we begin by scanning our traces to extract both the initial
attributes of each file (such as those we observe in create calls) and the evolution of the file throughout the trace, so that we can record information about its eventual size, lifespan, and read/write ratio. From these observations, we are able to measure the statistical association between each attribute and each property. The stronger the association, the greater the ability to predict a property, given the attribute.

All Hours
Host      read            write           lookup          getattr         access
DEAS03    48.7% (50.9%)   15.7% (55.3%)   3.4% (161.6%)   29.2% (49.3%)   1.4% (119.5%)
EECS03    24.3% (73.8%)   12.3% (123.8%)  27.0% (69.5%)   3.2% (263.2%)   20.0% (67.7%)
CAMPUS    64.5% (48.2%)   21.3% (58.9%)   5.8% (44.4%)    2.3% (60.7%)    2.9% (51.4%)

Peak Hours (9:00am – 6:00pm Weekdays)
DEAS03    50.0% (24.3%)   16.8% (28.9%)   3.4% (29.3%)    26.6% (29.4%)   1.3% (44.8%)
EECS03    18.2% (63.5%)   12.3% (86.7%)   27.0% (33.6%)   3.0% (129.9%)   21.5% (39.8%)
CAMPUS    63.5% (8.5%)    22.3% (16.7%)   5.6% (8.1%)     2.4% (32.6%)    3.0% (10.8%)

Table 1: The average percentage of read, write, lookup, getattr, and access operations for a fourteen-day trace from each server. The averages are shown both for all hours during the trace and for the peak hours (9:00am – 6:00pm on weekdays). The coefficient of variation for each hourly average is given in parentheses.

Some of these associations are intuitive: files that have the suffix .gz tend to be large, files whose names contain the string lock tend to be zero-length and live for a short period of time, etc. Other associations are less obvious. Particular users and groups, for example, often have unique lifespan distributions for their files. We find that the mode of a file (i.e., whether the file is readable or writable) often serves as a fingerprint of the environment in which it was created, and can even expose certain idiosyncrasies of the users and their applications. The mode is often a surprisingly good indicator of how a file will be used, but not as one would expect: on the DEAS03 trace, for example, any file with a mode of 777 is likely to live for less than a second and contain zero bytes of data. It is somewhat nonintuitive that a file that is created with a mode that makes it readable and writable by any user on the system is actually never read or written by anyone. Most of these files are lock files (files that are used as a semaphore for interprocess communication; their existence usually indicates that a process desires exclusive access to a particular file).

In order to capture some of the information expressed by different file-naming conventions (for example, using suffixes to indicate the format of the contents of a file), we decompose file names into name components. Each of these components is treated as a separate attribute of the file. We have found it is effective to use a period ('.') to delimit the components. For example, the file name foo.bar would have two name components (foo and bar). To simplify our analysis, we limit a file name to three name components (first, middle, and last). Files with more than three components would have the remainder subsumed by the middle name. For example, the file foo.bar.gz.tmp would have a middle name of bar.gz. Filenames with fewer than three components will take on NULL component values. There may be other useful features within file names, but we are constrained by the information that remains after the anonymization process.

In the remainder of this section we use the chi-square test [16] (pages 687–693) to show that the association between a file's attributes and its properties is more than a coincidence. We provide statistical evidence that the association is significant and quantify the degree of associativity for each attribute.

4.1 Statistical Evidence of Association

We use a chi-square test (also known as a two-dimensional contingency table) to quantify the association between each attribute and property. The chi-square test of association serves two purposes. First, it provides statistical evidence that associations exist and quantifies the degree of associativity for each attribute. Second, it is one of the mechanisms we use to automatically construct a decision tree that uses the information we extract from the traces to predict the properties of files.

If there is no association, then the probability of a file having a given property is independent of the attribute values. For example, suppose that we find that 50% of the files we observe are write-only. If the write-only property is associated with the file name suffix, then this percentage will be different for different suffixes. For example, we may find that 95% of the .log files are write-only. If no association exists, then the expected percentage of write-only files with each extension will not differ from 50% in a statistically significant manner. The difficulty with such a test is distinguishing natural variation from a statistically significant difference; the chi-squared test is used to detect and quantify such differences.
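This expected-versus-observed comparison can be sketched in a few lines. The code below is our illustration, not the paper's implementation, and the sample counts echo the hypothetical .log example above:

```python
from collections import Counter

def chi_squared(observations):
    """Chi-squared statistic for (attribute_value, property_value) pairs,
    e.g. (name suffix, is_write_only). Zero means the observed counts
    exactly match the counts expected under independence."""
    n = len(observations)
    attr_totals = Counter(a for a, _ in observations)
    prop_totals = Counter(p for _, p in observations)
    cells = Counter(observations)
    stat = 0.0
    for a, a_count in attr_totals.items():
        for p, p_count in prop_totals.items():
            expected = a_count * p_count / n
            stat += (cells[(a, p)] - expected) ** 2 / expected
    return stat

# Hypothetical sample: .log files are mostly write-only, .gif files are not.
sample = ([("log", True)] * 95 + [("log", False)] * 5 +
          [("gif", True)] * 10 + [("gif", False)] * 90)
print(chi_squared(sample))  # large statistic -> strong association
```

In practice the statistic would then be converted to a p-value against the chi-squared distribution with the appropriate degrees of freedom; we omit that step here.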
The sum of squared differences between the expected and observed number of files is our chi-squared statistic, and we calculate this statistic for each combination of attribute and property. In statistical terms, we are trying to disprove the null hypothesis that file attributes are not associated with file properties. A chi-squared statistic of zero indicates that there is no association (i.e., the expected and observed values are the same), while the magnitude of a non-zero statistic indicates the degree of association. This value is then used to calculate a p-value, which estimates the probability that the difference between the expected and observed values is coincidental.

For all of our tests, we have a high chi-squared statistic, and the p-values are very close to zero. Therefore we may, with very high confidence, reject the null hypothesis and claim that attributes are associated with file properties.

The chi-squared test can also be used to rank the attributes by the degree of association. Figure 2 shows how the chi-squared values differ for the size and lifespan properties. There are two important points to take from this figure. First, the attribute association differs across properties for a given trace; for example, in CAMPUS the uid shows a relatively strong association with the lifespan, yet a weak association with the size. The second point is that the relative rankings differ across traces. For example, on CAMPUS the middle component of a file name has a strong association with lifespan and size, but the association is much weaker on DEAS03 and EECS03.

Figure 2: The relative strength of the correlation between the properties "lifetime (lftmd) is one second or shorter" and "size is zero" and several file attributes (as indicated by the chi-squared values) for one day of each trace. The chi-squared values are normalized relative to the attribute with the strongest association. The last, middle, and first attributes refer to components of the file name, as described in Section 5. (The three bar charts, one each for DEAS03 3/24/2003, EECS03 3/24/2003, and CAMPUS 10/22/2001, plot the attributes first, middle, last, uid, gid, and mode against normalized chi-squared values from 0 to 1; they are not reproduced here.)

Although we show only two properties in these graphs, similarly diverse associations exist for other properties (e.g., directory entry lifespan and read/write ratio). In Section 5 we show how these associations can be dynamically discovered and used to make predictions.

The chi-squared test described in this section is a one-way test for association. This test provides statistical evidence that individual attributes are associated with file properties. It does not, however, capture associations between subsets of the attributes and file properties. It also does not provide an easy way to understand exactly what those associations are. One can extend this methodology to use n-way chi-square tests, but the next section discusses a more efficient way for both capturing multi-way associations and extracting those associations efficiently.
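The name-component decomposition used throughout this section is simple enough to sketch. The helper below is our own illustration (the function name is invented); it splits a name on periods and folds any extra components into the middle component, as in the foo.bar.gz.tmp example above:

```python
def name_components(filename):
    """Split a file name into (first, middle, last) components on '.',
    folding extra components into the middle. Names with fewer than
    three components get None (NULL) placeholders."""
    parts = filename.split(".")
    # Leading-dot names like ".#pico" yield an empty first component.
    if len(parts) == 1:
        return (parts[0], None, None)
    if len(parts) == 2:
        return (parts[0], None, parts[1])
    return (parts[0], ".".join(parts[1:-1]), parts[-1])
```

Each of the three returned components would then be treated as a separate categorical attribute of the file.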
5 The ABLE Predictor

The results of the previous section establish that each of a file's attributes (file name, uid, gid, mode) is, to some extent, associated with its long-term properties (size, lifespan, and access pattern). This fact suggests that these associations can be used to make predictions about the properties of a file at creation time. The chi-squared results also give us hope that higher-order associations (i.e., an association between more than one attribute and a property) may exist, which could result in more accurate predictions.

To investigate the possibility of creating a predictive model from our data, we constructed an Attribute-Based Learning Environment (ABLE). ABLE is a learning environment for evaluating the predictive power of file attributes. The input to ABLE is a table of information about files whose attributes and properties we have already observed, and a list of properties that we wish to predict. The output is a statistical analysis of the sample, a chi-squared ranking of each file attribute relative to each property, and a collection of predictive models that can be used to make predictions about new files.

In this paper, we focus on three properties: the file size, the file access pattern (read-only or write-only), and the file lifespan. On UNIX file systems, there are two aspects of file lifespan that are interesting: the first is how long the underlying file container (usually implemented as an inode) will live, and the other is how long a particular name of a file will live (because each file may be linked from more than one name). We treat these cases separately and make predictions for each.

ABLE consists of three steps:

Step 1: Obtaining Training Data. Obtain a sample of files and for each file record its attributes (name, uid, gid, mode) and properties (size, lifespan, and access pattern).

Step 2: Constructing a Predictive Classifier. For each file property, we train a learning algorithm to classify each file in the training data according to that property. The result of this step is a set of predictive models that classifies each file in the training data and can be used to make predictions on newly created files.

Step 3: Validating the Model. Use the model to predict the properties of new files, and then check whether the predictions are accurate.

Each of these steps contains a number of interesting issues. For the first step, we must decide how to obtain representative samples. For the second, we must choose a learning algorithm. For the third, we must choose how to evaluate the success of the predictions. We may consider different types of errors to have different degrees of importance; for example, if the file system treats short-lived files in a special manner, then incorrectly predicting that a file will be short-lived may be worse than incorrectly predicting that a file will be long-lived.

In Section 6, for example, we show that by identifying small, short-lived files and hot directories, we can use predictions to optimize directory updates in a real file system.
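Taken together, the three steps above can be sketched with a toy ID3-style learner. The code below is our illustration only, not the ABLE implementation: the attribute values and training rows are invented, and a production inducer would add split-selection statistics (such as the chi-squared test) and pruning:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of binary labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def build_tree(rows, labels, attrs):
    """Recursively induce an ID3-style tree. `rows` are dicts mapping
    attribute name -> value; `labels` are the binary property values."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    def gain(attr):  # information gain of splitting on `attr`
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in groups.values())
        return entropy(labels) - remainder
    best = max(attrs, key=gain)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[best], []).append((row, label))
    node = {"attr": best, "children": {}}
    for value, members in groups.items():
        node["children"][value] = build_tree(
            [r for r, _ in members], [l for _, l in members],
            [a for a in attrs if a != best])
    return node

def predict(node, row, default=False):
    """Walk the tree; fall back to `default` for unseen attribute values."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], default)
    return node

# Step 1: a hypothetical training sample of (attributes, write-only?) pairs.
train = [({"last": "log", "mode": "600"}, True),
         ({"last": "log", "mode": "600"}, True),
         ({"last": "gif", "mode": "644"}, False),
         ({"last": "c",   "mode": "644"}, False)]
# Step 2: induce one tree per property (here, just write-only).
tree = build_tree([r for r, _ in train], [l for _, l in train],
                  ["last", "mode"])
# Step 3: predict the property of a newly created file.
print(predict(tree, {"last": "log", "mode": "600"}))  # prints True
```

Validation (Step 3) would repeat the final call over a held-out test day and count correct predictions, as the paper does with its Monday/Tuesday split.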
Table 2: ABLE training samples obtained from DEAS03. (Table not reproduced.)

Figure 3: Constructing a simple decision tree from the training data in Table 2. (Figure not reproduced.)

…all, even if the new file shared many other attributes with .hot or .cold files. To avoid this problem, ABLE instructs the ID3 algorithm to continue expanding the tree until all attributes are exhausted (or the data is perfectly classified), and then ABLE prunes the bottom leaves of the tree to eliminate branches that are potentially unnecessary or overly specific. This is one of many pruning methods commonly used to favor a smaller tree over a larger one [4] (pages 279–293), in the hope that a smaller tree generalizes better on future samples. Note that building the tree top-down and selecting the most strongly associated attributes first guarantees that only the least associated attributes will be pruned in this process.

5.3 Validating the Model

At this point, we have used our training data to induce a decision tree model that can classify the data. The result is a model that can be used to classify new files (i.e., predict their properties). For example, if a new file were to be created with mode 600 and name foo.log, the model will predict that the file will be write-only. For our simple example, we only have one rule: a file is write-only only if its mode is 600 and its last name component is log. In general, a rule is a conjunction of all attributes on a path to a positively classified leaf node.

For each of the binary predicates, we induce a decision tree from a sample of files seen during the peak hours (9am-6pm) on a Monday from the trace (10/22/2001 for CAMPUS and 3/24/2003 for EECS03 and DEAS03). We then make predictions about the files created during the peak hours on the following day. The decision to train on the peak hours of Monday and test on the peak hours of Tuesday is not completely arbitrary; as shown in Section 3, the peak hours are the most active hours of the day. The resulting sizes of the training and testing samples are approximately 40,000 files for DEAS03, 35,000 for CAMPUS, and 15,000 for EECS03.

For comparison purposes, we compare against a simple model named MODE that always predicts the mode of a property, which is defined as the value of the property that occurs most frequently in the training data. For example, if most of the files created on Monday were write-only, then the MODE predictor would predict that every file created on Tuesday would be write-only, without considering any of the file attributes. Because all our properties are binary, each prediction is either correct or incorrect, and the prediction accuracy is simply the ratio of correct predictions to the sample size.

Table 3: A comparison of the accuracy of the ABLE and MODE predictors for several properties for the three traces. MODE always predicts the value that occurred most frequently in the training sample, without considering any attributes of the new file. (Table not reproduced.)

Table 3 shows the prediction accuracies on Tuesday for each of DEAS03, EECS03, and CAMPUS. In nearly all cases, ABLE more accurately predicts the properties of files, and in some cases nearly doubles the accuracy relative to probability-based guessing (MODE). However, there are some cases, specifically on the CAMPUS trace, where the workload is so uniform that MODE does almost as well.

5.4 MABLE and NABLE

ABLE's decision trees successfully exploit the statistical association between file attributes and properties and can be used to produce accurate predictions about future file system activity. We were also curious about which attributes make the largest contribution. The chi-squared analysis in Section 4 established that many of the attributes had strong associations, but this is not enough to determine whether or not multi-way attribute associations would have much effect on prediction accuracy.

The easiest way to measure the effects of additional attributes is to compare the ABLE trees (induced using all available attributes) against a set of constrained trees (induced with a limited set of attributes).

If multi-way associations exist between the attributes, then we can empirically measure their effect by comparing prediction accuracies. To this end, we construct two new sets of constrained decision trees, and compare these against the ABLE (unconstrained) decision trees.
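The MODE baseline used for comparison in this section is trivial to implement. The sketch below is our illustration (the sample data are invented); it shows that MODE's accuracy is just the frequency of the majority property value in the test sample:

```python
from collections import Counter

def mode_predictor(train_labels):
    """MODE ignores attributes and always predicts the most frequent
    property value seen in the training sample."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda attributes: most_common

def accuracy(predict, test_set):
    """Fraction of correct predictions over (attributes, label) pairs."""
    hits = sum(predict(attrs) == label for attrs, label in test_set)
    return hits / len(test_set)

# Hypothetical Monday training labels: 70% of files were write-only.
monday_labels = [True] * 70 + [False] * 30
mode = mode_predictor(monday_labels)

# Hypothetical Tuesday test sample (attributes shown only for symmetry
# with an attribute-aware predictor; MODE never looks at them).
tuesday = [({"last": "log"}, True)] * 60 + [({"last": "c"}, False)] * 40
print(accuracy(mode, tuesday))  # prints 0.6
```

An attribute-aware predictor evaluated with the same `accuracy` helper makes the ABLE-versus-MODE comparison in Table 3 a like-for-like measurement.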
…clouded by transient multi-way associations that do not apply in the long run. Second, NABLE and MABLE offer predictions roughly equivalent to ABLE. This is somewhat surprising, particularly in the case of MABLE, because it means that we can make accurate predictions even if we do not consider file names at all.

Figure 4: Comparing the prediction accuracy of ABLE, NABLE, and MABLE for the properties size=0, write-only, and lifetime ≤ 1 second. Prediction accuracy is measured as percentage correct. (The bar chart, with accuracy from 0 to 80% on the y-axis and the properties size=0, wronly, and lftmd<=1s (direntry) on the x-axis, is not reproduced here.)

Given enough training data, ABLE always outperforms MABLE and NABLE. For the results presented in this paper, ABLE required an extra week of training to detect the false attribute associations, due in part to the small number of attributes. We anticipate that more training will be required for systems with larger attribute spaces, such as object-based storage with extended attributes [18] and non-UNIX file systems such as CIFS or NTFS [29]. Furthermore, irrelevant attributes may need to be pre-filtered before induction of the decision tree [6] to prevent over-fitting. The automation of ABLE's training policies, including attribute filtering, is an area for future work.
• NABLE predicts "write-mostly" if
  first=cache & last=gif [5742/94.0%]
• MABLE predicts "size=0" if
  mode=777 [4535/99.8%]
• ABLE predicts "deleted within 1 sec" if
  first=01eb & last=0004 & mode=777 & uid=18abe [1148/99.7%]

Figure 5: Example rules for DEAS03 discovered by NABLE, MABLE, and ABLE. The number of files that match the attributes and the observed probability that these files have the given property are shown on the right. For example, NABLE predicts that files whose name begins with cache and ends in .gif will be "write-mostly". This prediction is based on observations of 5742 files, 94.0% of which have the "write-only" property.

5.5 Properties of Our Models

In our experience, a model that predicts well for one day will continue to perform well for at least the next several days or weeks [9]. However, workloads evolve over time, and therefore our models will eventually decrease in accuracy. We are exploring ways to automatically detect when new models are necessary. Fortunately, building a new model is an inexpensive process (requiring approximately ten minutes of processing on a modest Pentium-4 to build a new model from scratch for the peak hours of the heaviest workloads in our collection), so one possible approach is simply to build new models at regular intervals, whether or not the current models have shown any sign of degradation.

In general our decision trees yield roughly a 150:1 ratio of files to rules. Rules can be easily inspected after the fact to determine interesting patterns of usage (which is how we discovered the associations originally). On DEAS03, for example, the 45K sample files induced a decision tree with only 300 rules (i.e., a decision tree with 300 leaves). This means that the resulting model requires only a few kilobytes to store.

6 Using the Predictions

Now that we have the ability to make predictions about the future properties of a file based on its attributes when it is created, the question remains what benefit we can reap from this foresight.

One type of application that we believe can benefit from our predictions is file cache management policy. When choosing a buffer to evict, it would be helpful to have an accurate prediction of whether or not that buffer would be accessed in the near future (or at all). For the DEAS03 workload, for example, we can identify write-only files with a high degree of accuracy, and we know that we can immediately evict the buffers created by writing these files. Similarly, in a disconnected environment, knowing which files are read-only can help select files to hoard.

Pre-fetching can also benefit from predictions; if we can identify files that are highly likely to be read sequentially from beginning to end (perhaps on a user-by-user basis), then we can begin pre-fetching blocks for that file as soon as a client opens it. If cache space is plentiful, it might make sense to do aggressive pre-fetching for every file opened for reading, but if cache space is at a premium, it is valuable to know which files will benefit the most from this treatment.

Our predictions may also be helpful in optimizing file layout – if we can predict how large a file will be, and what access patterns the file will experience, then we can pre-allocate space on the disk in order to optimally accommodate these properties (instead of adapting to these properties as they become evident). For example, yFS uses three different block allocation and layout policies for different kinds of files and migrates files from one policy to another as they grow or their access patterns change [32]. Given accurate predictions, we can begin with the correct policy instead of discovering it later.

Another application of ABLE is to guide adaptive as well as pro-active techniques – we can use its models to predict not only what the future holds for new files, but also for existing files. In this paper we focus primarily on the prediction of the properties of new files, because this is a capability we have not had before. Nevertheless, it is important to recognize that the ABLE models can be used for adaptation as well.

The rest of this section discusses the use of name-based hints to cluster active directory blocks and inodes into a designated "hot" area of the disk. By placing this hot area in high-speed media (e.g., NVRAM) or placing it in the middle of the disk, we should reduce the overall disk access time. We use as our evaluation metric the degree to which we induce a hot spot on the designated area of the file system. We discuss how to benchmark the resulting system, and measure its performance on our three traces.
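The buffer-eviction application described earlier in this section can be sketched as follows: a cache that, when it must evict, first sacrifices buffers belonging to files a predictor flags as write-only, and otherwise falls back to plain LRU. The predictor, file names, and cache structure here are all hypothetical simplifications.

```python
from collections import OrderedDict

class PredictiveCache:
    """Toy buffer cache: evict predicted write-only buffers before LRU."""

    def __init__(self, capacity, predict_write_only):
        self.capacity = capacity
        self.predict = predict_write_only       # file_id -> bool (hypothetical model)
        self.buffers = OrderedDict()            # file_id -> data, in LRU order

    def access(self, file_id, data):
        if file_id in self.buffers:
            self.buffers.move_to_end(file_id)   # refresh LRU position
        elif len(self.buffers) >= self.capacity:
            self._evict()
        self.buffers[file_id] = data

    def _evict(self):
        # Prefer a buffer we predict will never be read back.
        for fid in self.buffers:
            if self.predict(fid):
                del self.buffers[fid]
                return
        self.buffers.popitem(last=False)        # otherwise evict the LRU head

# Example: files whose (invented) names start with "log" are predicted
# write-only, so their buffers are evicted first despite being recent.
cache = PredictiveCache(2, lambda fid: fid.startswith("log"))
cache.access("log.1", b"...")
cache.access("paper.tex", b"...")
cache.access("notes.txt", b"...")   # evicts log.1, not paper.tex
print(sorted(cache.buffers))        # prints ['notes.txt', 'paper.tex']
```

A real implementation would operate on blocks rather than whole files and would consult the model once at create time, caching the verdict with the file's in-memory state.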
6.1 Benchmarking Attribute-Based Systems

One of the difficulties of measuring the utility of attribute-based hints in the context of real file systems is finding a suitable benchmark. Synthetic workload generators typically create files in a predictable and unrealistic manner – they make little or no attempt to use realistic file names or mimic the diverse behaviors of different users. If we train our models on data gathered while these benchmarks are running, then our predictions will probably be unrealistically accurate, but if we train on a workload that does not include the benchmarks, then our predictions for the files created by the benchmark will be uncharacteristically bad.

Our solution to this problem is to construct a benchmark directly from traces of the target workload, thereby ensuring that the associations between file names, modes, and uids during the trace will resemble those present in the actual workload. This leads immediately to a new problem – in order to replay the traces, we need a real file system on which to play them. The usual solution to this problem is to recreate the traced file system from a snapshot of its metadata taken at a known time, and then begin replaying from that time [28]. This method works well when snapshots are available, and when a suitable device is available on which to reconstruct. Unfortunately we have neither – there are no publicly-available snapshots of the systems from which the traces were taken, and even if there were, reconstructing them would require at least 500GB of disk space and many hours of set-up time per test.

To solve this problem, we have developed a new method of performing a snapshot-less trace replay that uses the trace itself to reconstruct the subset of the file system necessary to replay a given section of the trace. We call these sub-snapshots. In essence, our method is to replay the trace several times, inferring knowledge about the underlying file system by observing how it is used.

The first pass reconstructs as much as it can of the file system hierarchy, primarily by observing the parameters and responses of lookup, getattr, create, mkdir, rename, remove, and link calls. The idea of discovering the file system hierarchy by snooping NFS calls is not new and has been in widespread use since the technique was described by Blaze [3]. Unfortunately, as other researchers have noted, this method is imperfect – some of the information may be absent from the trace because of missed packets or because it is cached on the client during the trace period and thus never visible in the trace. To compensate for this missing data, we keep track of each file or directory that is accessed during the trace, but whose metadata we cannot infer. When the first pass is finished, we may either fill in the missing values with reasonable defaults or discard the incomplete items.

Because we are using attribute-based models, we cannot simply invent file attributes and hope that they will work. However, there is a danger that if we discard all the objects for which we have incomplete information, we may lose a significant portion of the workload. For the experiment described in this section, we use only name attributes. After examining the traces, we find that we are unable to determine names for fewer than 5% of the files mentioned in the workload (and typically far fewer). Therefore we believe that discarding these "anonymous files" does not alter the workload to an important degree.

Files or directories for which we cannot infer the parent are attached to the root directory, because from our own experiments we have found that this is the directory most likely to be cached on the client. For example, we rarely see lookups for /home/username, because home directories are frequently accessed and rarely invalidated.

The output of the first pass is a table containing the pathname of each file and directory observed in the trace, along with a unique identifier for each object and the size, mode, and other relevant information necessary to reconstruct the object. The purpose of the new identifier is to provide a convenient substitute for the file handle that is independent of the actual implementation of the file system. (File handles usually encode the mount point and inode numbers, and we cannot ensure that we will get the same values when we reconstruct the file system.)

The second pass through the trace replaces all of the file handles in the trace with the unique identifiers created in the first pass, and removes references to files for which no information could be inferred.

Based on the table created after the first pass, we then create a file system that matches the rewritten trace, and replay the new trace on that file system. The result is both realistic and repeatable.

Using this method, we constructed several sub-snapshots for each workload. A typical hour of activity on these systems accesses files containing only five to ten GB of data (although there are hours when many directories are scanned, resulting in enormous and unwieldy sub-snapshots). One of the challenges with DEAS03 and EECS03 is that there are apparently some jobs that periodically scan large parts of the directory hierarchy, checking the modification time of each file. Since most of these files are never actually read or written, we could modify our sub-snapshot builder to recognize this and treat these files differently (only creating a short or empty file, instead of a file the same size as the
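The two-pass rewriting described above can be sketched in a few lines. The trace format and field names here are invented for illustration; real NFS traces carry far more state (modes, sizes, directory links), and the first pass would also record everything needed to recreate each object.

```python
# A toy trace: each record is one observed NFS call (fields are hypothetical).
trace = [
    {"op": "lookup", "dir_fh": "fh01", "name": "home",  "fh": "fh02"},
    {"op": "create", "dir_fh": "fh02", "name": "a.txt", "fh": "fh03"},
    {"op": "write",  "fh": "fh03", "bytes": 4096},
    {"op": "read",   "fh": "fh99", "bytes": 512},   # handle never resolved
]

# Pass 1: infer a stable identifier and pathname for each file handle by
# watching the calls that reveal namespace structure.
table = {"fh01": {"id": 0, "path": "/"}}            # assume the root is known
next_id = 1
for rec in trace:
    if rec["op"] in ("lookup", "create") and rec["fh"] not in table:
        # Unknown parents are attached to the root, as described above.
        parent = table.get(rec["dir_fh"], table["fh01"])
        path = parent["path"].rstrip("/") + "/" + rec["name"]
        table[rec["fh"]] = {"id": next_id, "path": path}
        next_id += 1

# Pass 2: replace file handles with implementation-independent ids and
# drop records whose handle could never be resolved.
rewritten = []
for rec in trace:
    if rec["fh"] in table:
        new = dict(rec)
        new["fh"] = table[rec["fh"]]["id"]
        rewritten.append(new)

print([r["fh"] for r in rewritten])   # prints [1, 2, 2]
```

The resulting table is enough both to create the sub-snapshot file system and to replay the rewritten trace against it, independent of the inode numbers the reconstructed file system happens to assign.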
original). This would permit us to create sub-snapshots for a much larger fraction of the underlying file system.

6.2 Increasing Locality of Reference

As an example application, we explore the use of attribute-based hints to control the locality of block reference by anticipating which blocks are likely to be hot and grouping them in the same cylinder.

We use two methods to identify hot data blocks. The first method, which we call HotName, automatically classifies as hot any file that we predict will be short-lived and/or zero-length. For this type of file, the overhead of creating and maintaining the inode and name of the file (i.e., the directory entry for the file) can be a large fraction of the cost incurred by the file, and therefore there may be benefit to reducing this overhead. The second method, which we call HotDir, predicts which directories are most likely to contain files that have the HotName property. Since these directories are where the names for the HotName files will be entered, there may be benefit from identifying them as well.

The model that we use for HotDir is constructed via a method similar to ABLE, but unfortunately in our prototype requires some external logic because ABLE is focused on files and does not currently gather as much information about directories. In general, the HotDir rules are that directories identified as home directories, mail spool directories, and directories named Cache are classified as hot directories. (ABLE is capable of identifying the mail and Cache directories as interesting, but does not currently have an "is-home-directory" attribute.)

To test the effect of HotDir and HotName, we have modified the FreeBSD implementation of FFS so that it uses a simplified predictor (similar in nature to the ABLE predictor, but employing only name attributes, and re-coded to live in the kernel environment) to predict whether each new directory has the HotDir property and whether each new file has the HotName property. If so, it attempts to allocate blocks for that file or directory in a designated area of the disk. Our goal is to measure the increase in the number of accesses to this area of the disk when we use policies guided by HotDir and HotName.

We use two systems as our testbed. Both have a 1 GHz Pentium III processor, 1 GB of RAM, and run FreeBSD 4.8p3. Our experiments use the FreeBSD implementation of FFS with 16KB blocks and soft-updates enabled [12]. We have instrumented the device driver for the disk so that it keeps a count of how many reads and writes are done on each 16KB logical disk block.

Heuristic          Ops      Reads    Writes
DEAS03
  Perfect          26.17%   0.85%    42.28%
  HotDir            0.57%   0.22%     0.76%
  HotFile           0.59%   0.00%     0.95%
  HotDir+HotFile    1.10%   0.22%     1.60%
EECS03
  Perfect          23.89%   8.96%    41.61%
  HotDir            3.09%   1.11%     4.61%
  HotFile           2.82%   0.00%     5.00%
  HotDir+HotFile    5.95%   1.15%     9.65%
CAMPUS
  Perfect           3.90%   0.76%    11.28%
  HotDir            1.43%   0.58%     3.36%
  HotFile           1.13%   0.00%     3.70%
  HotDir+HotFile    2.60%   0.57%     7.23%

Table 4: Average percentage of the total ops, reads, and writes that fall in the 4MB target region of the disk for each of the heuristics on DEAS03, EECS03, and CAMPUS. The "Perfect" heuristic shows the maximum percentage attainable by an algorithm with perfect knowledge. The working set for these runs varies from 5-10GB.

6.3 Results

To test our heuristics, we ran a series of one-hour trace replays for the hours noon-5pm for several days on each of our traces. The models are trained on a Monday (3/24/03 for DEAS03 and EECS03, 10/22/01 for CAMPUS), and the replays are constructed from the following Tuesday through Thursday. Each hour-long replay begins with 15 minutes to warm the cache. Then the block counters are reset, and the test begins in earnest and runs for 45 minutes of replay time.

We designate a 4MB region as the target area for hot objects. Our evaluation examines the distribution of actual accesses to the disk and compares the percentage that go to the target area to the theoretical maximum number of accesses that would go to the hottest 4MB region given perfect knowledge (i.e., if the hottest 256 16KB blocks on the disk were allocated in the target region).

As shown in Table 4, both heuristics improve locality compared to the default layout policy, and using both heuristics is an improvement over using either one alone. Write locality is increased more than read locality; this is not surprising because directory contents are read-cached. Using both HotDir and HotName, we manage to increase the number of accesses to two-thirds of that of
the hottest possible region on CAMPUS, and on EECS03 nearly 6% of all the disk accesses during the trace are now confined to the target area. These percentages may seem small, but keep in mind that we are focusing only on small files and directories, and normal file traffic is the dominant cause of disk accesses in these workloads.

7 Conclusions

In addition to caching and on-disk layout optimization, we envision a much larger class of applications that will benefit from dynamic policy selection. Attribute-based classification of system failures and break-ins (or anomaly detection) is a natural adjunct to this work (e.g., "has this file been compromised?"). Moreover, through the same clustering techniques implemented by our decision trees, we feel that semantic clustering can be useful for locating information (e.g., "are these files related?"). Both of these are areas of future work.