The Google File System: Firas Abuzaid
Why build GFS?
● Two kinds of reads: large streaming reads & small random reads
○ Large streaming reads usually read 1MB or more
○ Oftentimes, applications read through contiguous regions in the file
○ Small random reads are usually only a few KBs at some arbitrary offset
● Also many large, sequential writes that append data to files
○ Similar operation sizes to reads
○ Once written, files are seldom modified again
○ Small writes at arbitrary offsets do not have to be efficient
● Multiple clients (e.g. ~100) concurrently appending to a single file
○ e.g. producer-consumer queues, many-way merging
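The many-producers-one-file append pattern above can be sketched as follows. This is a toy single-machine stand-in (the client class and its record_append method are invented for illustration, not the real GFS API); a lock simulates the atomic, system-chosen-offset append that GFS's record append provides:

```python
import threading

# Hypothetical stand-in for a GFS client. record_append atomically appends a
# record at an offset the system chooses, so producers need no coordination.
class FakeGFSClient:
    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}

    def record_append(self, path, record):
        # GFS guarantees the record lands atomically at least once;
        # here a lock simulates that atomicity on a single machine.
        with self._lock:
            data = self._files.setdefault(path, [])
            offset = len(data)
            data.append(record)
            return offset  # offset chosen by the system, not the caller

def producer(client, path, producer_id, n):
    for i in range(n):
        client.record_append(path, f"producer-{producer_id}:record-{i}")

# ~100 concurrent producers appending to a single shared file
client = FakeGFSClient()
threads = [threading.Thread(target=producer, args=(client, "/logs/merged", p, 10))
           for p in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(len(client._files["/logs/merged"]))  # 1000: no record lost or clobbered
```

Because the offset is picked by the system rather than the caller, producers never race on a write position; this is what makes many-way merging and producer-consumer queues cheap in GFS.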
Interface
● For the namespace metadata, master does not use any per-directory data
structures – no inodes! (No symlinks or hard links, either.)
○ Every file and directory is represented as a node in a lookup table, mapping pathnames to
metadata. Stored efficiently using prefix compression (< 64 bytes per namespace entry)
Why a Single Master?
● The master has global knowledge of the whole system, which
drastically simplifies the design
● But the master is (hopefully) never the bottleneck
○ Clients never read or write file data through the master; the client only asks the master
which chunkservers to talk to
○ The master can also provide information about subsequent chunks to further reduce
latency
○ Subsequent reads of the same chunk don’t involve the master, either
Files and Chunks
● Files are divided into fixed-size chunks, each of which has an immutable, globally
unique 64-bit chunk handle
○ By default, each chunk is replicated three times across multiple chunkservers (users can
specify a different replication level)
● Chunkservers store the chunks on local disks as Linux files
○ Metadata per chunk is < 64 bytes (stored in master)
■ Current replica locations
■ Reference count (useful for copy-on-write)
■ Version number (for detecting stale replicas)
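The per-chunk state listed above can be sketched as a small in-memory table on the master. Field and variable names here are illustrative, not from the paper:

```python
import dataclasses

@dataclasses.dataclass
class ChunkInfo:
    # Per-chunk metadata the master keeps in memory (< 64 bytes in real GFS)
    replicas: list   # chunkserver addresses currently holding a replica
    version: int     # bumped on each new mutation epoch; detects stale replicas
    refcount: int = 1  # > 1 after a snapshot -> copy-on-write on the next write

# chunk handle (immutable, globally unique 64-bit id) -> metadata
chunk_table: dict[int, ChunkInfo] = {
    0x1A2B3C4D5E6F7081: ChunkInfo(
        replicas=["cs1:9000", "cs2:9000", "cs3:9000"],  # 3x replication
        version=7,
    ),
}

def is_stale(handle: int, reported_version: int) -> bool:
    # A replica reporting an older version missed a mutation while its
    # chunkserver was down, and should be garbage-collected.
    return reported_version < chunk_table[handle].version

print(is_stale(0x1A2B3C4D5E6F7081, 6))  # True
print(is_stale(0x1A2B3C4D5E6F7081, 7))  # False
```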
Chunk Size
● 64 MB, a key design parameter (much larger than the block size of most file systems)
● Disadvantages:
○ Wasted space due to internal fragmentation
○ A small file consists of only a few chunks (perhaps just one), which can become hot
spots when many clients access the same file concurrently
■ This can be mitigated by increasing the replication factor for such files
● Advantages:
○ Reduces clients’ need to interact with master (reads/writes on the same chunk only require one
request)
○ Since client is likely to perform many operations on a given chunk, keeping a persistent TCP
connection to the chunkserver reduces network overhead
○ Reduces the size of the metadata stored in master → metadata can be entirely kept in memory
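Because chunks are a fixed 64 MB, a client can translate a byte offset into a chunk index with pure local arithmetic and then ask the master only for that chunk's handle and replica locations. A minimal sketch of that translation:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, fixed

def chunk_index(offset: int) -> int:
    # Pure client-side arithmetic -- no master round trip for this step
    return offset // CHUNK_SIZE

def chunk_range(index: int) -> tuple[int, int]:
    # Byte range [start, end) covered by a given chunk
    return index * CHUNK_SIZE, (index + 1) * CHUNK_SIZE

# A 1 MB streaming read starting at byte 200,000,000:
start = 200_000_000
end = start + 1_048_576 - 1
print(chunk_index(start))   # 2
print(chunk_index(end))     # 2 -> the whole read fits in one chunk,
                            #    so a single master request suffices
print(chunk_range(2))       # (134217728, 201326592)
```

With a small block size, the same 1 MB read would span many blocks and require many metadata lookups; the 64 MB chunk collapses it to one.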
GFS’s Relaxed Consistency Model
● Terminology:
○ consistent: all clients will always see the same data, regardless of which replicas they read
from
○ defined: consistent and, furthermore, clients will see what the modification has written
in its entirety
● Guarantees:
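The consistent-versus-defined terminology above can be illustrated with a toy model (not real GFS code): every replica applies the same mutations in the same order, so all replicas end up byte-identical (consistent); but when concurrent writes get split into fragments, the resulting bytes may match no single writer's intent (consistent yet undefined):

```python
# Toy model: a "file region" is 8 bytes; a mutation is (offset, payload).
def apply_mutations(mutations):
    data = bytearray(8)
    for offset, payload in mutations:
        data[offset:offset + len(payload)] = payload
    return bytes(data)

# Client A writes "AAAA" at offsets 0 and 4; client B concurrently writes
# "BBBB" at offset 2. The primary picks one interleaving, and every replica
# applies that same order.
order = [(0, b"AAAA"), (2, b"BBBB"), (4, b"AAAA")]
replica1 = apply_mutations(order)
replica2 = apply_mutations(order)

print(replica1 == replica2)  # True -> consistent: every client sees the same bytes
print(replica1)              # b'AABBAAAA' -> fragments of both writes are mingled,
                             # so no client sees its write intact: undefined
```

Serial (non-concurrent) writes applied this way would instead leave each region defined, which is why GFS distinguishes the two cases in its guarantees.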
Data Modifications in GFS