Fourth International Conference on Networked Computing and Advanced Information Management

A Taxonomy and Survey on Distributed File Systems

Tran Doan Thanh¹, Subaji Mohan¹, Eunmi Choi¹*, SangBum Kim², Pilsung Kim²
¹School of Business IT, Kookmin University, Seoul, Korea
²SK Telecom Convergence and Internet R&D Center / SK T-Tower, Seoul, Korea

*Corresponding author: Eunmi Choi. This work is supported by the SKT research project fund in 2008.
978-0-7695-3322-3/08 $25.00 © 2008 IEEE
DOI 10.1109/NCM.2008.162
Authorized licensed use limited to: National Science Council. Downloaded on May 21, 2009 at 10:19 from IEEE Xplore. Restrictions apply.

Abstract

Applications that process large volumes of data (such as search engines, grid computing applications, and data mining applications) require a backend infrastructure for storing data. The distributed file system is the central component of this storage infrastructure. Many network computing projects have designed and implemented distributed file systems with a variety of architectures and functionalities. In this paper, we develop a comprehensive taxonomy for describing distributed file system architectures and use this taxonomy to survey existing distributed file system implementations in very large-scale network computing systems such as grids and search engines. We use the taxonomy and the survey results to identify architectural approaches that have not been fully explored in distributed file system research.

1. Introduction

A Distributed File System (DFS) is used to build a hierarchical view of multiple file servers and shares on the network. Instead of having to remember a specific machine name for each set of files, the user only has to remember one name, which serves as the 'key' to a list of shares found on multiple servers on the network. Permanent storage is a fundamental abstraction in computing. A permanent storage consists of a named set of objects that (1) come into existence by explicit creation, (2) are immune to temporary failures of the system, and (3) persist until explicitly destroyed. The naming structure, the characteristics of the objects, and the set of operations associated with them characterize a specific refinement of this basic abstraction. A file system is one such refinement.

A DFS is a file system that supports the sharing of files in the form of persistent storage over a set of network-connected nodes [4]. Many DFSs have been developed over the years, yet almost two decades of research have not succeeded in producing a fully featured DFS [1, 2, 3]. In a DFS, multiple users who are physically dispersed across a network of autonomous computers share a common file system. A useful way to view such a system is as a distributed implementation of the time-sharing file system abstraction. The challenge lies in realizing this abstraction in an efficient, secure, and robust manner. In addition, file location and availability become significant issues. One way of increasing the availability of files within a DFS is to replicate them. Most replication techniques fall into two main categories: optimistic replication and pessimistic replication [5].

Another performance concern is that processor speeds have improved dramatically, so file access increasingly becomes the bottleneck. To overcome this limitation, a DFS uses caches at various points [7], and these caches can be positioned either at the file server or at the client [6]. To provide a consistent view of the data to all clients and reliability in the case of failures, write operations are allowed to complete only after the data has been committed to stable storage. As a result, the dominant load on the file server comes from writes, so allowing clients to write back data can reduce this write load on the server [7, 8].

The ability to scale up easily and economically with commodity devices is now very important for DFSs because of the demands of large-scale distributed applications. This includes incremental scalability, which is the ability to scale up the system by adding more devices incrementally.

The systems surveyed are Google File System (GFS) [15], Lustre [16], Kosmos File System [17], Hadoop Distributed File System (Hadoop) [14], Panasas [18], Parallel Virtual File System (PVFS2) [19], and Red Hat Global File System (RGFS) [20]. Requirements for a DFS were described, and an abstract functional model was developed. The requirements and
model were used to develop the taxonomy. This helped us identify some key DFS approaches and issues that have yet to be explored, which we expect to become topics of future research. The paper also provides a comprehensive bibliography.

The structure of the paper is as follows. Section 2 covers background information about distributed file systems. Section 3 reviews the taxonomy of distributed file systems in detail. Section 4 gives an overview of different distributed file systems and compares them. Section 5 presents our findings related to DFSs, and Section 6 outlines the discussion and conclusion of the paper.

2. Background

This review started with the basic abstraction of a DFS and developed a taxonomy of issues in DFS design. A major step in the evolution of DFSs was the recognition that access to remote files could be made to resemble access to local files. This property, called network transparency, implies that any operation that can be performed on a local file may also be performed on a remote file. The extent to which an actual implementation meets this ideal is an important measure of quality. The Newcastle Connection and Cocanet [9] are two early examples of systems that provided network transparency. In both cases, the name of the remote site was a prefix of a remote file name.

A DFS provides location transparency and redundancy to improve data availability in the face of failure or heavy load, by allowing shares in multiple locations to be logically grouped under one folder, the DFS root. The performance of many DFSs is much lower than that of local file systems because they perform synchronous I/O operations for cache coherence and data safety [10]. File systems such as AFS [11] and NFS [12] present users with the abstraction of a single, coherent namespace shared across multiple clients. Although caching data on local clients improves performance, many file operations still use synchronous message exchanges between client and server to maintain cache consistency and to protect against client or server failure.

Structuring a distributed system is a demanding task even when the system is small, and the work becomes much more difficult when the system is very large. Any viable distributed system architecture must support the notion of autonomy if it is to scale at all in the real world [13].

Location transparency can be viewed as a binding issue. The binding of location to name is static and permanent when pathnames with embedded machine names are used. The binding is less permanent in a system like Sun NFS, and it is most dynamic and flexible in GFS and Hadoop. Usage experience has confirmed the benefits of a fully dynamic location mechanism in a large distributed environment.

Another major issue in a DFS is the failure of a machine (server or client), which cannot be distinguished from the failure of a communication link or from slow responses due to extreme overloading. Therefore, when a site does not respond, one cannot determine whether the site has failed and stopped processing, or whether a communication link has failed while the site is still operational. One must then assume that the inaccessible site is still capable of processing file requests. The file system protocol should handle this case in such a way that the consistency and semantic guarantees of the system are not violated.

Based on these background issues, we have developed a taxonomy of the issues that need to be considered when designing a DFS for grids, search engines, and similar applications. The following section covers the taxonomy in detail.

3. Taxonomy of Distributed File Systems

The motivation behind developing this taxonomy is to analyze the features that constitute a DFS and to help in choosing the most appropriate file system: one that performs well, is fault tolerant, and is secure.

— Architecture
The first issue considered during the study was the types of DFS architecture that are available. One type is the client-server architecture (e.g., Sun Microsystems' Network File System), in which a server provides a standardized view of its local file system. This older style of DFS comes with a communication protocol that allows clients to access the files stored on a server, so that a heterogeneous collection of processes running on different operating systems and machines can share a common file system. The advantage of this scheme is that it is largely independent of local file systems. One limitation is that the local file system must fit the NFS file model; for example, an MS-DOS file system cannot back an NFS server fully transparently because of its short file names. Another type of architecture is the cluster-based distributed file system, such as the Google File System. It consists of a single master together with multiple chunk servers, and files are divided into chunks of 64 MB each. The advantage
is its simplicity: a single master can control a few hundred chunk servers. In cluster-based DFSs, three architectural features are usually considered during design: decoupled metadata and data, Reliable Autonomic Distributed Object Storage, and Dynamic Distributed Metadata Management. The third type of architecture is the symmetric architecture, which is based on peer-to-peer technology. It uses a DHT-based system for distributing data, combined with a key-based lookup mechanism. In a symmetric file system, the clients also host the metadata manager code, so all nodes understand the disk structures. In contrast, an asymmetric file system is one in which one or more dedicated metadata managers maintain the file system and its associated disk structures. Examples include Panasas ActiveScale, Lustre, and traditional NFS file systems. Finally, a parallel file system is one in which data blocks are striped, in parallel, across multiple storage devices on multiple storage servers. It supports parallel applications by allowing all nodes to access the same files at the same time, thus providing concurrent read and write capabilities. Most current DFSs support this important feature. Note that all of the above definitions overlap: a DFS can be symmetric or asymmetric, its servers may be clustered or single servers, and it may or may not support parallel applications. The survey also shows that a multi-layer architecture provides flexibility, because protocol or functional layers can easily be added.

— Processes
Although DFS processes have no unusual properties, an important question is whether they should be stateless. The primary advantage of the stateless approach is simplicity. However, it can be difficult to follow during implementation, because a stateless server cannot easily lock a file. We studied the processes in some of the most commonly used DFSs and analyzed their flaws. Except for PVFS2, almost all of the surveyed DFSs use stateful processes. Another advantage of a stateless architecture is that clients can fail and resume without disturbing the system as a whole. This feature allows PVFS2 to scale to hundreds of servers and thousands of clients without incurring the overhead and complexity of tracking the file state or locking information associated with these clients.

— Communication
Most DFSs use Remote Procedure Calls (RPC) to communicate, because RPC makes the system independent of the underlying operating systems, networks, and transport protocols. In the RPC approach, there are two transport protocols to consider: TCP and UDP. TCP is used by nearly all DFSs; however, Hadoop also considers UDP to improve performance. Plan 9 takes a completely different approach to communication. It is mainly a file-based distributed system in which all resources are accessed in the same way, namely with file-like syntax and operations, including even resources such as processes and network interfaces. In this respect, Lustre adopts a more flexible architecture that provides network independence. Lustre can be used over a wide variety of networks because of its open Network Abstraction Layer, and it therefore provides unique support for heterogeneous networks.

— Naming
Naming plays an important role, as each object has an associated logical path name and physical address. The aggregation of all logical path names forms a distributed name space, which can be logically partitioned into domains. The addresses of the objects are used to access them in order to retrieve information from the distributed system. Closely related to naming are the application programming interface, the mapping of the file system abstraction onto physical storage media, and the integrity of the file system across power, hardware, media, and software failures. These concerns can be seen in systems such as the Network File System (NFS), whose fundamental idea is to give its clients completely transparent access to a remote file system. The currently common approach employs a central metadata server to manage the file name space; decoupling metadata from data in this way improves namespace throughput and relieves the synchronization problem. Another approach distributes metadata across all nodes, so that all nodes understand the disk structure. A serious implication of this approach is that users do not share name spaces, for security reasons, which makes file sharing harder. We studied and analyzed the different systems to determine the most appropriate naming structure and method.

— Synchronization
Synchronization is a vital issue to analyze in a DFS. In a distributed system, the semantics of file sharing become tricky when performance is at stake. When the same file is shared by two or more users, the semantics of reading and writing must be defined precisely to avoid problems. Although this looks conceptually simple, it is quite difficult to implement. Several approaches are available, such as UNIX semantics,
session semantics, immutable semantics, and transactions. Apart from semantics, we also analyze the file locking system of each DFS. Depending on the purpose of the applications deployed on it, a DFS is developed with a different locking mechanism. Most uses require a write-once-read-many access model. However, some applications, such as search engines, require a multiple-producer/single-consumer access model; GFS is the best-known example of this model. To support their access model, some systems give locks on objects to clients, while others perform all operations synchronously on the server. Giving locks on objects to clients lets clients cache data, which improves performance. Lustre applies a hybrid solution for its file locking system: the locking mode is chosen according to the level of resource contention. The last synchronization issue we study is leases, which are the most common method of controlling parallel access in a DFS.

— Consistency and Replication
To provide consistency, most DFSs use checksums to validate data after it has been sent over the communication network. In addition, caching and replication play an important role in a DFS, most notably when it is designed to operate over a wide-area network. This can be done in several ways, such as client-side caching and server-side replication. Two types of data need to be considered for replication: metadata and data objects. Metadata is the most important part of the whole DFS, so all DFSs provide a mechanism to ensure the availability and recoverability of this data, such as a backup metadata server or snapshots of metadata combined with transaction logs. For data objects, the approaches differ depending on the purpose of the applications. DFSs like Lustre and Panasas assume that a data object is available as long as the physical devices are available. Hence, they treat a physical failure as an exception, and object data can be lost. Lustre, for example, stripes data across servers in a RAID 0 layout to increase access performance; because RAID 0 striping adds no redundancy, it does not by itself reduce the probability of losing data. In other DFSs, like GFS and Hadoop, the applications treat data availability as a critical requirement, and failure is the norm rather than the exception. Thus, data objects are replicated on different servers. Because this replication consumes considerable bandwidth, GFS and Hadoop employ an asynchronous replication method named "replication in pipeline".

— Fault Tolerance
Fault tolerance is closely related to replication, because replication is used to provide availability and to make failures transparent to users. As mentioned in the Consistency and Replication section, there are two approaches to fault tolerance for object data: failure as exception and failure as norm. "Failure as exception" systems isolate the failed node or recover the system from its last normal running state. "Failure as norm" systems replicate all kinds of data and re-replicate whenever the replication ratio becomes unsafe.

— Security
Authentication and access control are among the important security issues in DFSs that need to be analyzed. The impact of decentralized authentication was also taken into consideration during the survey. Most DFSs provide authentication, authorization, and privacy by leveraging existing security systems. However, some special-purpose DFSs, such as GFS and Hadoop, rely on trust between all nodes and clients, so they do not employ any dedicated security mechanism in their architecture.

— Other Issues
One important issue for DFSs is the ability to use commodity devices to build the system. The advantage of this capability is that whenever commodity devices improve, the DFS improves automatically and naturally. It also makes the system very cost-effective when it needs to be scaled up.

4. Comparison of Distributed File Systems

The systems surveyed are Google File System (GFS) [15], Lustre [16], Kosmos File System [17], Hadoop Distributed File System (Hadoop) [14], Panasas [18], Parallel Virtual File System (PVFS2) [19], and Red Hat Global File System (RGFS) [20]. There are many more DFSs in the literature, such as NFS, AFS, QFS, and ZFS. However, due to space limitations, we restricted the survey to the systems above, chosen for their novelty and representativeness. We summarize the comparison in Table 1.

Table 1: Overall Comparison of Different Distributed File Systems

| File system | GFS | KFS | Hadoop | Lustre | Panasas | PVFS2 | RGFS |
|---|---|---|---|---|---|---|---|
| Architecture | Cluster-based, asymmetric, parallel, object-based | Cluster-based, asymmetric, parallel, object-based | Cluster-based, asymmetric, parallel, object-based | Cluster-based, asymmetric, parallel, object-based | Cluster-based, asymmetric, parallel, object-based | Cluster-based, symmetric, parallel, aggregation-based | Cluster-based, symmetric, parallel, block-based |
| Processes | Stateful | Stateful | Stateful | Stateful | Stateful | Stateless | Stateful |
| Communication | RPC/TCP | RPC/TCP | RPC/TCP & UDP | Network independence | RPC/TCP | RPC/TCP | RPC/TCP |
| Naming | Central metadata server | Central metadata server | Central metadata server | Central metadata server | Central metadata server | Metadata distributed in all nodes | Metadata distributed in all nodes |
| Synchronization | Write-once-read-many, multiple-producer/single-consumer, give locks on objects to clients, using leases | Write-once-read-many, give locks on objects to clients, using leases | Write-once-read-many, give locks on objects to clients, using leases | Hybrid locking mechanism, using leases | Give locks on objects to clients | No locking method, no leases | Give locks on objects to clients |
| Consistency and replication | Server-side replication, asynchronous replication, checksum, relaxed consistency among replicas of data objects | Server-side replication, asynchronous replication, checksum | Server-side replication, asynchronous replication, checksum | Server-side replication (metadata only), client-side caching, checksum | Server-side replication (metadata only) | No replication, relaxed semantics for consistency | No replication |
| Fault tolerance | Failure as norm | Failure as norm | Failure as norm | Failure as exception | Failure as exception | Failure as exception | Failure as exception |
| Security | No dedicated security mechanism | No dedicated security mechanism | No dedicated security mechanism | Authentication, authorization and privacy | Authentication, authorization and privacy | Authentication, authorization and privacy | Authentication, authorization and privacy |
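To illustrate how Table 1 might be used to shortlist candidate systems, the sketch below transcribes four of its rows (Processes, Naming, Fault tolerance, Security) into Python and filters systems by required attributes. The `DfsProfile` class, its boolean field names, and the `shortlist` helper are hypothetical names chosen for this example; only the values are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DfsProfile:
    """A few Table 1 attributes for one file system (lossy boolean encoding)."""
    name: str
    stateful: bool            # Processes row: Stateful (True) vs Stateless (False)
    failure_as_norm: bool     # Fault tolerance row: norm (True) vs exception (False)
    dedicated_security: bool  # Security row: authentication/authorization/privacy present
    central_metadata: bool    # Naming row: central metadata server vs distributed


# Values transcribed from Table 1, in column order.
TABLE_1 = [
    DfsProfile("GFS", True, True, False, True),
    DfsProfile("KFS", True, True, False, True),
    DfsProfile("Hadoop", True, True, False, True),
    DfsProfile("Lustre", True, False, True, True),
    DfsProfile("Panasas", True, False, True, True),
    DfsProfile("PVFS2", False, False, True, False),
    DfsProfile("RGFS", True, False, True, False),
]


def shortlist(profiles, **required):
    """Return names of systems whose attributes equal every required value."""
    return [
        p.name
        for p in profiles
        if all(getattr(p, field) == value for field, value in required.items())
    ]


if __name__ == "__main__":
    print(shortlist(TABLE_1, failure_as_norm=True))  # → ['GFS', 'KFS', 'Hadoop']
    print(shortlist(TABLE_1, stateful=False))        # → ['PVFS2']
    print(shortlist(TABLE_1, failure_as_norm=False, dedicated_security=True,
                    central_metadata=True))          # → ['Lustre', 'Panasas']
```

Booleans are a deliberately lossy encoding: rows such as Synchronization or Consistency and replication carry more nuance than a single flag, so a real selection would still need to read the full table entries.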

5. Findings

Based on the survey and taxonomy, the following findings on different DFSs can help in selecting an appropriate DFS according to the application and its requirements.

— Lustre:
Lustre is a shared-disk file system commonly used for large-scale cluster computing. It is an open-standards-based system with great modularity and compatibility with interconnects, networking components, and storage hardware. It is suitable as a general-purpose file system. Currently, it is only available for Linux.

— Kosmos File System:
The Kosmos Distributed File System (KFS) is a high-performance DFS that supports applications whose workloads can be characterized as: primarily write-once/read-many; a few million large files, each on the order of a few tens of MB to a few tens of GB in size; and mostly sequential access. It provides high performance combined with availability and reliability. It is intended to be used as the backend storage infrastructure for data-intensive applications such as search engines, data mining, and grid computing.

— Hadoop:
Hadoop is a distributed, parallel, fault-tolerant file system. It is designed to reliably store very large files across machines in a large cluster, and it is inspired by the Google File System. The Hadoop DFS stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. Blocks belonging to a file are replicated for fault tolerance. The block size and replication factor are configurable per file. Files are "write once" and have strictly one writer at any time.

— Google File System:
The Google File System is a proprietary DFS developed by Google for its own use. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware. In GFS, files are huge by traditional standards and are divided into chunks of 64 MB. Most files are mutated by appending new data rather than overwriting existing data: once written, the files are only read, and often only sequentially. GFS is also optimized to run on computing clusters whose nodes are cheap "commodity" computers, which means precautions must be taken against the high failure rate of individual nodes and against data loss.
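The fixed-size chunking described for GFS and Hadoop reduces to simple integer arithmetic. The sketch below assumes a fixed 64 MB chunk size, as stated for GFS, and the Hadoop rule that every block except the last is full-size; it maps a byte offset to a chunk index and splits a file into block sizes. The function names `locate` and `block_sizes` are hypothetical, and this is arithmetic only, not either system's client API.

```python
MB = 1024 * 1024
GFS_CHUNK_SIZE = 64 * MB  # fixed chunk size described for GFS


def locate(offset: int, chunk_size: int = GFS_CHUNK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def block_sizes(file_size: int, block_size: int) -> list[int]:
    """Split a file into blocks: all full-size except possibly the last one."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the shorter trailing block
    return sizes


if __name__ == "__main__":
    # Byte 200 MB falls 8 MB into the fourth chunk (index 3).
    print(locate(200 * MB))                                   # → (3, 8388608)
    # A 200 MB file with 64 MB blocks: three full blocks plus an 8 MB tail.
    print([s // MB for s in block_sizes(200 * MB, 64 * MB)])  # → [64, 64, 64, 8]
```

Because the chunk size is fixed, a client can compute the chunk index locally from a file offset; in the GFS design it then only needs to ask the master where that chunk's replicas live. Since Hadoop makes the block size configurable per file, `block_size` is left as a parameter rather than a constant.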

— Panasas:
Panasas implements its file system entirely in hardware. It is suitable as a general-purpose file system. To improve the overall utilization of storage systems and network performance, and to increase access to vital data, Panasas has developed the ActiveScale Storage Cluster. By combining a DFS with smart hardware, the Panasas Storage Cluster scales dramatically in both capacity and performance and extends appliance-like ease of manageability to a virtually boundless storage system.

— PVFS2:
In PVFS2, data access is achieved without file or metadata locking. PVFS2 is best suited for I/O-intensive (i.e., scientific) applications. It could be used as high-performance scratch storage to which data is copied and simulation results are written from thousands of cycles simultaneously.

— RGFS:
RGFS is an open-standards-based system with great modularity and compatibility with interconnects, networking components, and storage hardware. In addition, it is a relatively low-cost, SAN-based technology. It is suitable as a general-purpose file system. However, it is only available on Red Hat Enterprise Linux.

6. Conclusion

The DFS is one of the most important and widely used forms of shared permanent storage. The continuing interest in DFSs bears testimony to the robustness of this model of data sharing. As elaborated in the preceding sections, architecture, naming, synchronization, availability, heterogeneity, and support for databases will be key issues to take into consideration when designing a DFS. Security will continue to be a serious concern and may, in fact, turn out to be the biggest concern for large distributed systems. In this paper, we developed a taxonomy for DFSs and, based on it, reviewed and surveyed some of the most popular and common distributed file systems. We outlined the features, advantages, and disadvantages of each DFS in detail, and presented findings that enable readers to select an appropriate DFS according to their needs.

7. References

[1] Thekkath, C.A., et al., "Frangipani: A Scalable Distributed File System", Systems Research Center, Digital Equipment Corporation, Palo Alto, CA, 1997.
[2] Liskov, B., et al., "Replication in the Harp File System", Laboratory for Computer Science, MIT, Cambridge, MA, 1991.
[3] Douceur, J., Wattenhofer, R., "Optimizing File Availability in a Server-less Distributed File System", Proceedings of the 20th Symposium on Reliable Distributed Systems, 2001.
[4] Levy, E., Silberschatz, A., "Distributed File Systems: Concepts and Examples", ACM Computing Surveys, Vol. 22, No. 4, December 1990.
[5] Saito, Y., Shapiro, M., "Optimistic Replication", ACM Computing Surveys, Vol. 37, No. 1, March 2005, pp. 42-81.
[6] Satyanarayanan, M., "A Survey of Distributed File Systems", Technical Report CMU-CS-89-116, Department of Computer Science, Carnegie Mellon University, 1989.
[7] Howard, J.H., et al., "Scale and Performance in a Distributed File System", ACM Transactions on Computer Systems, Vol. 6, Issue 1, February 1988.
[8] Nelson, M.N., et al., "Caching in the Sprite Network File System", ACM Transactions on Computer Systems, February 1988.
[9] Rowe, L.A., Birman, K.P., "A Local Network Based on the Unix Operating System", IEEE Transactions on Software Engineering, SE-8(2), March 1982.
[10] Nightingale, E.B., Chen, P.M., Flinn, J., "Speculative Execution in a Distributed File System", ACM SOSP'05, Brighton, United Kingdom, October 23–26, 2005.
[11] Howard, J.H., Kazar, M.L., Menees, S.G., Nichols, D.A., Satyanarayanan, M., Sidebotham, R.N., West, M.J., "Scale and Performance in a Distributed File System", ACM Transactions on Computer Systems, Vol. 6, Issue 1, February 1988.
[12] Callaghan, B., Pawlowski, B., Staubach, P., "NFS Version 3 Protocol Specification", Technical Report RFC 1813, IETF, June 1995.
[13] Alonso, R., Cova, L.L., "Resource Sharing in a Distributed Environment", Proceedings of the ACM SIGOPS European Workshop, Cambridge, England, September 1988.
[14] "The Hadoop Distributed File System", http://hadoop.apache.org/core/docs/current/hdfs_design.html
[15] Ghemawat, S., Gobioff, H., Leung, S.T., "The Google File System", ACM SIGOPS Operating Systems Review, Vol. 37, Issue 5, pp. 29-43, December 2003.
[16] Braam, P.J., "The Lustre Storage Architecture", White Paper, Cluster File Systems, Inc., October 2003.
[17] "Kosmos Distributed File System", http://kosmosfs.sourceforge.net/
[18] Nagle, D., Serenyi, D., Matthews, A., "The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage", Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, pp. 53-, 2004.
[19] Yu, W., Liang, S., Panda, D.K., "High Performance Support of Parallel Virtual File System (PVFS2) over Quadrics", Proceedings of the 19th Annual International Conference on Supercomputing, pp. 323-331, 2005.
[20] "Red Hat Global File System", White Paper, www.redhat.com/whitepapers/rha/gfs/GFS_INS0032US.pdf.
