Google File System
Suman Karumuri, Andy Bartholomew, Justin Palmer
GFS: A high-performance, scalable, distributed file system. Designed for batch-oriented, data-intensive applications. Fault-tolerant. Runs on inexpensive commodity hardware.
Design assumptions: Inexpensive commodity hardware. A modest number of large files. Large streaming reads and small random reads (MapReduce). Mostly appends. Well-defined semantics for concurrent appends to the same file are important. High sustained throughput matters more than low latency.
API: Open and close. Create and delete. Read and write. Record append. Snapshot.
Architecture
Architecture [Figure: a GFS client application communicating with the GFS master and with multiple GFS chunk servers …]
Files and chunks: Files are divided into fixed-size 64 MB chunks. Each chunk has a globally unique 64-bit handle. Design trade-off: large chunks optimize for large files and high throughput. The system expects very few small files; small files that are highly contended get a higher replication factor.
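Because the chunk size is fixed, a client can turn an application's byte offset into a chunk index on its own before contacting the master. A minimal sketch, assuming Python; the function names are hypothetical and not part of any real GFS client.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size


def chunk_index_and_offset(file_offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if file_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(file_offset, CHUNK_SIZE)


def chunks_spanned(file_offset: int, length: int) -> list[int]:
    """Chunk indexes touched by an access of `length` bytes starting at `file_offset`."""
    if length <= 0:
        return []
    first, _ = chunk_index_and_offset(file_offset)
    last, _ = chunk_index_and_offset(file_offset + length - 1)
    return list(range(first, last + 1))


MB = 1024 * 1024
print(chunk_index_and_offset(200 * MB))  # (3, 8388608): 8 MB into chunk 3
print(chunks_spanned(60 * MB, 8 * MB))   # [0, 1]: the access crosses a chunk boundary
```

Large chunks keep this mapping coarse, so a client streaming a big file asks the master about few chunks.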
GFS chunk servers: Manage chunks and store them as ordinary files. Tell the master which chunks they hold. Run on commodity Linux machines. Maintain the data consistency of their chunks. Design trade-off: the chunk server knows which of its chunks are good, so there is no need to keep the master and the chunk servers in sync.
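Since the chunk servers are the authority on what they store, the master can treat chunk locations as soft state and rebuild them from the servers' reports (GFS does this at master startup and keeps it current through HeartBeat messages). A minimal sketch, assuming each report is a set of chunk handles keyed by a hypothetical server name:

```python
from collections import defaultdict


def build_location_map(reports: dict[str, set[int]]) -> dict[int, set[str]]:
    """Invert per-server chunk reports into chunk handle -> servers holding a replica."""
    locations = defaultdict(set)
    for server, handles in reports.items():
        for handle in handles:
            locations[handle].add(server)
    return dict(locations)


reports = {"cs1": {1, 2}, "cs2": {2, 3}}
print(build_location_map(reports))
```

If a server disappears, the master simply rebuilds the map from the remaining reports instead of reconciling a persistent copy.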
GFS master: Manages file namespace operations and file metadata. Manages chunks on the chunk servers: creation and deletion, placement, load balancing, and replication. Records metadata changes in a checkpointed operation log, which is itself replicated.
Create Operation
Create
1. The client application sends Create /home/user/filename to the GFS master.
2. The master updates the operation log and the metadata.
3. The master chooses locations for the chunks: across multiple racks, across multiple networks, on machines with low contention, and on machines with low disk use.
4. The master returns the chunk handle and chunk locations to the client.
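The placement criteria above can be approximated by a greedy ranking. This is a minimal sketch, not GFS's actual policy: the `ChunkServer` fields, the ranking order, and the two-pass rack spreading are assumptions, and the "multiple networks" criterion is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_used: float     # fraction of disk in use, 0.0-1.0
    recent_creates: int  # recent chunk creations, a proxy for contention


def choose_replicas(servers: list[ChunkServer], count: int = 3) -> list[str]:
    """Prefer lightly loaded servers and spread replicas across racks."""
    ranked = sorted(servers, key=lambda s: (s.disk_used, s.recent_creates, s.name))
    chosen: list[str] = []
    racks: set[str] = set()
    for s in ranked:  # first pass: at most one replica per rack
        if len(chosen) < count and s.rack not in racks:
            chosen.append(s.name)
            racks.add(s.rack)
    for s in ranked:  # second pass: fewer racks than replicas, so reuse racks
        if len(chosen) < count and s.name not in chosen:
            chosen.append(s.name)
    return chosen
```

Spreading across racks means a single rack failure (for example, a switch) cannot take out every replica of a chunk.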
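Namespaces: The syntax for file access is the same as in a regular file system (/home/user/foo), but the semantics differ. There are no per-directory data structures; the namespace is a lookup table from full pathnames to metadata. Paths exist to support fine-grained locking. Paths are stored using prefix compression. There are no symbolic or hard links.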
Locking example: Write /home/user/foo acquires read locks on /home and /home/user, and a write lock on /home/user/foo. Delete /home/user/foo acquires read locks on /home and /home/user, and a write lock on /home/user/foo, so it must wait for the write to finish.
Locking design trade-off: Simple design. Supports concurrent mutations in the same directory. A canonical lock order (by level in the namespace tree, then lexicographically) prevents deadlocks.
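The locking rules above can be expressed as a lock plan per operation plus a conflict check. A minimal sketch: lock modes are plain strings, the function names are hypothetical, and actual blocking lock acquisition is omitted.

```python
def lock_plan(path: str) -> list[tuple[str, str]]:
    """Read locks on every ancestor path, write lock on the full path."""
    parts = [p for p in path.split("/") if p]
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    return [(a, "read") for a in ancestors] + [(path, "write")]


def conflicts(a: list[tuple[str, str]], b: list[tuple[str, str]]) -> bool:
    """Two plans conflict if they share a path and at least one wants a write lock."""
    modes_a = dict(a)
    return any(p in modes_a and "write" in (m, modes_a[p]) for p, m in b)


def canonical_order(paths: list[str]) -> list[str]:
    """Acquire locks by depth in the namespace tree, then lexicographically."""
    return sorted(set(paths), key=lambda p: (p.count("/"), p))


print(lock_plan("/home/user/foo"))
print(conflicts(lock_plan("/home/user/a"), lock_plan("/home/user/b")))  # False
```

Two files in the same directory only share read locks on their ancestors, which is why mutations in one directory can proceed concurrently.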
Read Operation
Read operation
1. The client sends the filename and chunk index to the GFS master.
2. The master replies with the chunk handle and the chunk server locations.
3. The client sends the chunk handle and a byte range to a chunk server.
4. The chunk server returns the data.
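The read path can be sketched with plain dictionaries standing in for the master and the chunk servers. This is a toy under stated assumptions: the class and its data layout are hypothetical, reads must stay within one chunk, and "first listed replica" stands in for choosing the closest one. The client caches the master's reply keyed by (filename, chunk index), so repeated reads of the same chunk skip the master.

```python
CHUNK_SIZE = 64 * 1024 * 1024


class ReadClient:
    """`master` maps (filename, chunk_index) -> (handle, [servers]);
    `servers` maps server name -> {handle: chunk bytes}."""

    def __init__(self, master, servers):
        self.master = master
        self.servers = servers
        self.cache = {}          # (filename, chunk_index) -> (handle, locations)
        self.master_calls = 0

    def _lookup(self, filename, index):
        key = (filename, index)
        if key not in self.cache:            # steps 1-2: ask the master once, then cache
            self.master_calls += 1
            self.cache[key] = self.master[key]
        return self.cache[key]

    def read(self, filename, offset, length):
        index, chunk_offset = divmod(offset, CHUNK_SIZE)
        handle, locations = self._lookup(filename, index)
        chunk = self.servers[locations[0]][handle]   # steps 3-4: fetch from a replica
        return chunk[chunk_offset:chunk_offset + length]
```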
Write: the walkthrough first shows writes without a primary, using the master to serialize them. It explains the data flow, the control flow, and the data-pushing optimization.
Write Operation
Write (without a primary)
1. The client sends the chunk id and chunk offset to the GFS master.
2. The master returns the chunk server locations; the client caches them.
3. The client pushes the data to the nearest replica, which passes it along to the next replica.
4. The client sends the write operation to the master, which serializes all concurrent writes.
5. The master sends the serialized order of writes to the chunk servers.
6. Each chunk server acknowledges to the master.
7. The master returns an ack and the chunk index to the client.
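The data-pushing step (send the data to the nearest replica and let it pass the data along) can be sketched as a greedy chain plus the ideal pipelined transfer time B/T + R·L from the GFS paper, where B is bytes, T link throughput, R the number of replicas, and L the per-hop latency. The function names and the `distance` callback are hypothetical.

```python
def chain_order(client, replicas, distance):
    """Order in which data is pushed: each hop forwards to the closest replica
    that has not received the data yet (ties broken by name)."""
    order, current, remaining = [], client, set(replicas)
    while remaining:
        nxt = min(remaining, key=lambda r: (distance(current, r), r))
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return order


def pipelined_push_seconds(num_bytes, num_replicas, link_bits_per_s, hop_latency_s):
    """Ideal time to pipeline B bytes through R replicas: B/T + R*L."""
    return num_bytes * 8 / link_bits_per_s + num_replicas * hop_latency_s


# 1 MB over 100 Mbps links, ignoring per-hop latency: about 0.08 s (80 ms).
print(pipelined_push_seconds(1_000_000, 3, 100_000_000, 0.0))
```

Because each replica forwards bytes as soon as it receives them, adding replicas costs only extra hop latency, not another full transfer.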
Write under failure: if not every replica acknowledges (only two acks arrive), the write is retried.
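Leases: Routing every mutation through the master makes it a bottleneck. Instead, the master grants a lease to one replica, the primary chunk server, which handles mutations and chooses their serial order.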
Write with primary
1. The client sends the chunk id and chunk offset to the GFS master.
2. The master returns the chunk server locations, including which one is the primary; the client caches them.
3. The client pushes the data to the nearest replica, which passes it along to the next replica.
4. The client sends the write operation to the primary, which serializes all concurrent writes.
5. The primary forwards the serialized operations to the secondary replicas.
6. The secondaries acknowledge to the primary.
7. The primary returns an ack and the chunk index to the client.
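The primary's role in the sequence above can be sketched as assigning consecutive serial numbers and making every replica apply mutations in that order. A minimal in-memory sketch: the classes are hypothetical, data pushing and failures are not modeled, and forwarding is a direct method call.

```python
import itertools


class Replica:
    def __init__(self) -> None:
        self.data = bytearray()
        self.applied: list[int] = []   # serial numbers, in the order applied

    def apply(self, serial: int, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        if len(self.data) < end:                       # grow with zero bytes if needed
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload
        self.applied.append(serial)


class Primary(Replica):
    """Lease holder: numbers mutations serially, applies them, forwards the same order."""

    def __init__(self, secondaries: list[Replica]) -> None:
        super().__init__()
        self.secondaries = secondaries
        self._serials = itertools.count(1)

    def handle(self, requests: list[tuple[int, bytes]]) -> list[int]:
        serials = []
        for offset, payload in requests:               # arrival order at the primary
            serial = next(self._serials)
            self.apply(serial, offset, payload)
            for replica in self.secondaries:
                replica.apply(serial, offset, payload)
            serials.append(serial)
        return serials
```

With two overlapping writes, (0, b"World") then (3, b"12345"), all three replicas end up holding b"Wor12345": identical bytes, because the order is identical, even though no single write survives intact.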
Failures during writes: chunk boundary overflow and replicas going down. In both cases the write is retried.
Write with primary Leases etc
Record Append
Record append
1. The client sends only the chunk id, with no offset.
2. GFS returns an ack and the chunk index, counted from the end of the file, at which the data was appended.
Record Append Operation
Record append: The most common mutation. GFS, not the client, chooses the write location. Data is appended atomically at least once. A single append can be at most ¼ of the chunk size, which bounds the padding wasted at the end of a chunk.
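On the primary, a record append either fits at the current end of the chunk or the chunk is padded to its full size and the client is told to retry on the next chunk (this padding behavior follows the GFS paper). A minimal sketch; `ChunkFull` and `record_append` are hypothetical names, and the secondaries, which would apply the same padding and offset, are not modeled.

```python
CHUNK_SIZE = 64 * 1024 * 1024


class ChunkFull(Exception):
    """Tells the client to retry the append on the next chunk."""


def record_append(chunk: bytearray, record: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Append `record` at an offset chosen by the primary and return that offset."""
    if len(record) > chunk_size // 4:
        raise ValueError("record larger than 1/4 of a chunk")
    offset = len(chunk)
    if offset + len(record) > chunk_size:
        chunk.extend(b"\0" * (chunk_size - offset))   # pad the rest of the chunk
        raise ChunkFull()
    chunk.extend(record)
    return offset
```

The ¼ limit means the padding left behind by a failed fit is at most a quarter of a chunk.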
Consistency Model
Write, single process: Chunk 1 and its replica Chunk 1' both hold record 9: Hello.
Write, single process: Write("World", 10) succeeds on Chunk 1 (9: Hello, 10: World) but not on Chunk 1' (9: Hello). Inconsistent state.
The same happens with any failed mutation: after Write("World", 10), Chunk 1 holds 9: Hello, 10: World while Chunk 1' holds only 9: Hello. Inconsistent state.
Multiple writers: Write("World", 10:0) and Write("12345", 10:3), at offsets 0 and 3 of record 10, run concurrently. Both Chunk 1 and Chunk 1' end up with 9: Hello, 10: Wor12345. Consistent and undefined.
Append: Append("World") lands in Chunk 1 (10: World) but not in Chunk 1' (9: Hello). Inconsistent and undefined, so the client retries.
Append after the retry: Chunk 1 holds 9: Hello, 10: World, 11: World; Chunk 1' holds 9: Hello, 11: World. Both replicas return offset 11: a defined record interspersed with inconsistent regions.
The same holds for appends from multiple writers: defined records interspersed with inconsistent regions.
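The three outcomes in these examples can be captured by a small classifier. This is a simplification for illustration: it treats a region as defined only when every replica holds exactly one mutation's payload, and the function name is hypothetical.

```python
def classify_region(replica_contents: list[bytes], mutations: list[bytes]) -> str:
    """Label a file region using the consistency terms from the slides."""
    if len(set(replica_contents)) > 1:
        return "inconsistent"                 # replicas disagree
    if replica_contents[0] in mutations:
        return "defined"                      # consistent, one mutation intact
    return "consistent but undefined"         # same bytes everywhere, but mingled


print(classify_region([b"World", b""], [b"World"]))                 # failed write
print(classify_region([b"Wor12345"] * 2, [b"World", b"12345"]))     # concurrent writers
print(classify_region([b"World"] * 2, [b"World"]))                  # successful append
```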
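Consistency model: Replicas of a chunk are not guaranteed to be bitwise identical. Consistent: all clients see the same data, whichever replica they read. Defined: consistent, and clients see what a single mutation wrote in its entirety. This is fine for MapReduce: applications can distinguish defined regions from undefined ones, for example with checksums in their records.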
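Applications cope with padding, fragments, and duplicate appends on the reader side. The GFS paper describes writers embedding checksums and readers filtering duplicates by unique record identifiers; the framing below (length, record id, CRC32) is a hypothetical format chosen for this sketch, and resynchronizing byte by byte is the simplest, not the fastest, strategy.

```python
import struct
import zlib

HEADER = struct.Struct(">IQI")  # payload length, record id, CRC32 of id + payload


def encode(record_id: int, payload: bytes) -> bytes:
    body = struct.pack(">Q", record_id) + payload
    return HEADER.pack(len(payload), record_id, zlib.crc32(body)) + payload


def read_records(region: bytes) -> list[bytes]:
    """Return valid payloads, skipping padding, fragments, and duplicate ids."""
    seen, out, pos = set(), [], 0
    while pos + HEADER.size <= len(region):
        length, record_id, crc = HEADER.unpack_from(region, pos)
        start = pos + HEADER.size
        end = start + length
        payload = region[start:end]
        valid = (length > 0 and end <= len(region)
                 and zlib.crc32(struct.pack(">Q", record_id) + payload) == crc)
        if valid:
            if record_id not in seen:        # a retried append leaves a duplicate
                seen.add(record_id)
                out.append(payload)
            pos = end
        else:
            pos += 1                         # padding or garbage: resync byte by byte
    return out
```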
Snapshot: Makes a copy of a file or directory tree. It should be fast, with minimal data overhead. On a snapshot call, the master revokes outstanding leases, logs the operation, and duplicates the metadata so that the new files point to the same chunks. Copy-on-write creates the actual chunk copies when a shared chunk is first written.
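Copy-on-write can be sketched with per-chunk reference counts: a snapshot only copies metadata and bumps counts, and a write to a shared chunk first clones it under a new handle. The reference counting follows the GFS paper's description; the class, its methods, and the dict standing in for chunk server storage are hypothetical, and lease revocation and logging are omitted.

```python
import itertools


class SnapshotMaster:
    def __init__(self) -> None:
        self.files: dict[str, list[int]] = {}   # file -> chunk handles
        self.refcount: dict[int, int] = {}       # chunk handle -> number of files using it
        self.data: dict[int, bytes] = {}         # stands in for chunk server storage
        self._handles = itertools.count(1)

    def create(self, name: str, chunks: list[bytes]) -> None:
        handles = []
        for payload in chunks:
            h = next(self._handles)
            self.data[h], self.refcount[h] = payload, 1
            handles.append(h)
        self.files[name] = handles

    def snapshot(self, src: str, dst: str) -> None:
        """Metadata-only copy: the new file points at the same chunks."""
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def write_chunk(self, name: str, index: int, payload: bytes) -> None:
        """Clone a shared chunk before its first write, then write the private copy."""
        h = self.files[name][index]
        if self.refcount[h] > 1:
            self.refcount[h] -= 1
            new = next(self._handles)
            self.data[new], self.refcount[new] = self.data[h], 1
            self.files[name][index] = new
            h = new
        self.data[h] = payload
```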
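Delete operation: A metadata operation. The file is renamed to a special hidden name, and the actual chunks are deleted only after a certain time, so undelete is supported for a limited time. The actual garbage collection is lazy: the master deletes the metadata; chunk lists are piggybacked on HeartBeat messages, so the master can tell each chunk server which of its chunks are no longer referenced; the chunk servers then delete those files.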
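Lazy deletion can be sketched as a rename, a periodic namespace scan, and a set difference answered in reply to a HeartBeat. A minimal sketch: the ".deleted.<timestamp>" naming scheme and all function names are hypothetical, and the three-day retention is the GFS paper's configurable default.

```python
RETENTION_SECONDS = 3 * 24 * 3600  # default grace period from the GFS paper; configurable


def delete(namespace: dict[str, list[int]], path: str, now: int) -> None:
    """Logical delete: rename to a hidden name that records the deletion time."""
    namespace[f"{path}.deleted.{now}"] = namespace.pop(path)


def undelete(namespace: dict[str, list[int]], hidden_name: str) -> None:
    original = hidden_name.rsplit(".deleted.", 1)[0]
    namespace[original] = namespace.pop(hidden_name)


def namespace_scan(namespace: dict[str, list[int]], now: int) -> None:
    """Drop metadata for hidden files older than the retention period."""
    for name in list(namespace):
        if ".deleted." in name:
            deleted_at = int(name.rsplit(".deleted.", 1)[1])
            if now - deleted_at > RETENTION_SECONDS:
                del namespace[name]


def orphaned_chunks(reported: set[int], namespace: dict[str, list[int]]) -> set[int]:
    """HeartBeat reply: chunks the server holds that no file references any more."""
    referenced = {h for handles in namespace.values() for h in handles}
    return reported - referenced
```

Until the scan runs, the hidden file still references its chunks, so nothing is reclaimed and undelete remains possible.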
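Delete design trade-off: Simple design. Garbage collection can run when the master is relatively free. Logical deletes are quick. Works well when failures are common. Hard to tune when storage is tight, but there are workarounds (for example, deleting an already deleted file again speeds up reclamation).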
Fault tolerance for chunks: Re-replication maintains the replication factor. Rebalancing handles load balancing and disk space usage. Data integrity: each chunk is divided into 64 KB blocks, each with its own checksum, and the checksums are verified every time an application reads the data.
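Block-level checksumming lets a chunk server verify only the blocks a read touches. A minimal sketch: GFS keeps a 32-bit checksum per 64 KB block, and using `zlib.crc32` as that checksum here is an assumption; the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024   # each chunk is checksummed in 64 KB blocks


def block_checksums(chunk: bytes) -> list[int]:
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, checksums: list[int], offset: int, length: int) -> bytes:
    """Check every block overlapping [offset, offset + length) before returning data."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # A real chunk server would also report the mismatch to the master.
            raise OSError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```

A corrupted block fails only the reads that overlap it; the rest of the chunk stays readable until the replica is re-replicated.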
Fault tolerance for the master: The operation log is replicated and checkpointed. Shadow masters replay a replica of the operation log but never make metadata changes themselves. They reduce load on the master by serving reads, but may return stale data.
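Checkpoint-plus-replay recovery and shadow-master staleness can both be shown with one toy log. A minimal sketch: only create and delete are modeled, the names are hypothetical, and durability and remote replication of the log are indicated only by a comment.

```python
import copy


def apply_op(state: dict, op: str, path: str) -> None:
    """Apply one logged metadata mutation (only create/delete are modeled)."""
    if op == "create":
        state[path] = []          # new file with no chunks yet
    elif op == "delete":
        state.pop(path, None)


class OperationLog:
    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []
        self.checkpoint: tuple[dict, int] = ({}, 0)   # (metadata image, records covered)

    def append(self, op: str, path: str) -> None:
        # In GFS the record is flushed locally and remotely before the client sees the result.
        self.entries.append((op, path))

    def take_checkpoint(self, state: dict) -> None:
        self.checkpoint = (copy.deepcopy(state), len(self.entries))


def recover(log: OperationLog) -> dict:
    """Master restart: load the latest checkpoint, then replay only later records."""
    image, covered = log.checkpoint
    state = copy.deepcopy(image)
    for op, path in log.entries[covered:]:
        apply_op(state, op, path)
    return state


def shadow_view(log: OperationLog, applied: int) -> dict:
    """A shadow master that has replayed only the first `applied` records."""
    state: dict = {}
    for op, path in log.entries[:applied]:
        apply_op(state, op, path)
    return state
```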
Fault tolerance for chunk servers: All chunks are versioned. The version number is incremented whenever the master grants a new lease. Replicas with old versions (stale replicas) are not served and are deleted.
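Version numbers make stale replicas easy to detect: a replica that was down when a lease was granted misses the version bump. A minimal sketch with plain dictionaries; the function names and data layout are hypothetical.

```python
def grant_lease(master_versions, replica_versions, handle, reachable):
    """New lease: bump the chunk version and record it on every reachable replica."""
    master_versions[handle] += 1
    for server in reachable:
        replica_versions[server][handle] = master_versions[handle]


def stale_replicas(master_versions, replica_versions, handle):
    """Replicas whose version lags the master's: not served, later deleted."""
    current = master_versions[handle]
    return sorted(server for server, chunks in replica_versions.items()
                  if handle in chunks and chunks[handle] < current)


master_versions = {7: 1}
replica_versions = {"cs1": {7: 1}, "cs2": {7: 1}, "cs3": {7: 1}}
grant_lease(master_versions, replica_versions, 7, reachable=["cs1", "cs2"])  # cs3 is down
print(stale_replicas(master_versions, replica_versions, 7))  # ['cs3']
```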
High availability: Fast recovery of masters and chunk servers. HeartBeat messages are used for checking the liveness of chunk servers, piggybacking garbage-collection commands, and lease renewal. Diagnostic tools.
Performance metrics.
Conclusions
Conclusions: Runs on extremely cheap hardware with a high failure rate. Handles highly concurrent reads and writes. Highly scalable. Supports undelete (for a configurable time).
Conclusions … Built for map-reduce Mostly appends and scanning reads Mostly large files Requires high throughput Developers understand the limitations and tune apps to suit  GFS.
Thank you. Questions?
Design goals: Component failures are the norm. Files are huge (2 GB files are common). Files are mostly appended to. The application (MapReduce) and the file system are designed together.
Ad

More Related Content

What's hot (20)

google file system
google file systemgoogle file system
google file system
diptipan
 
Clustering and High Availability
Clustering and High Availability Clustering and High Availability
Clustering and High Availability
Information Technology
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
AbDul ThaYyal
 
Google File System - GFS Presentation Slides PPT
Google File System - GFS Presentation Slides PPTGoogle File System - GFS Presentation Slides PPT
Google File System - GFS Presentation Slides PPT
DiwasPandey3
 
6 understanding DHCP
6 understanding DHCP6 understanding DHCP
6 understanding DHCP
Hameda Hurmat
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
JYoTHiSH o.s
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical Clocks
Dilum Bandara
 
NFS(Network File System)
NFS(Network File System)NFS(Network File System)
NFS(Network File System)
udamale
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
HostedbyConfluent
 
Introduction to distributed file systems
Introduction to distributed file systemsIntroduction to distributed file systems
Introduction to distributed file systems
Viet-Trung TRAN
 
Distributed System-Multicast & Indirect communication
Distributed System-Multicast & Indirect communicationDistributed System-Multicast & Indirect communication
Distributed System-Multicast & Indirect communication
MNM Jain Engineering College
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed Systems
Dr Sandeep Kumar Poonia
 
PBFT
PBFTPBFT
PBFT
Anna Yudina
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed system
Sunita Sahu
 
Google File System
Google File SystemGoogle File System
Google File System
Junyoung Jung
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
Dr Neelesh Jain
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
HostedbyConfluent
 
Synchronization in distributed computing
Synchronization in distributed computingSynchronization in distributed computing
Synchronization in distributed computing
SVijaylakshmi
 
11. dfs
11. dfs11. dfs
11. dfs
Dr Sandeep Kumar Poonia
 
Chapter 29 Domain Name System.ppt
Chapter 29 Domain Name System.pptChapter 29 Domain Name System.ppt
Chapter 29 Domain Name System.ppt
webhostingguy
 
google file system
google file systemgoogle file system
google file system
diptipan
 
Google File System - GFS Presentation Slides PPT
Google File System - GFS Presentation Slides PPTGoogle File System - GFS Presentation Slides PPT
Google File System - GFS Presentation Slides PPT
DiwasPandey3
 
6 understanding DHCP
6 understanding DHCP6 understanding DHCP
6 understanding DHCP
Hameda Hurmat
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
JYoTHiSH o.s
 
Physical and Logical Clocks
Physical and Logical ClocksPhysical and Logical Clocks
Physical and Logical Clocks
Dilum Bandara
 
NFS(Network File System)
NFS(Network File System)NFS(Network File System)
NFS(Network File System)
udamale
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
HostedbyConfluent
 
Introduction to distributed file systems
Introduction to distributed file systemsIntroduction to distributed file systems
Introduction to distributed file systems
Viet-Trung TRAN
 
Distributed System-Multicast & Indirect communication
Distributed System-Multicast & Indirect communicationDistributed System-Multicast & Indirect communication
Distributed System-Multicast & Indirect communication
MNM Jain Engineering College
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed Systems
Dr Sandeep Kumar Poonia
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed system
Sunita Sahu
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
Dr Neelesh Jain
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
HostedbyConfluent
 
Synchronization in distributed computing
Synchronization in distributed computingSynchronization in distributed computing
Synchronization in distributed computing
SVijaylakshmi
 
Chapter 29 Domain Name System.ppt
Chapter 29 Domain Name System.pptChapter 29 Domain Name System.ppt
Chapter 29 Domain Name System.ppt
webhostingguy
 

Similar to GFS (20)

Google file system
Google file systemGoogle file system
Google file system
Anurag Gautam
 
Gfs
GfsGfs
Gfs
ravi kiran
 
Demo 0.9.4
Demo 0.9.4Demo 0.9.4
Demo 0.9.4
eTimeline, LLC
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
tutchiio
 
Google File System
Google File SystemGoogle File System
Google File System
DreamJobs1
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Виталий Стародубцев
 
advanced Google file System
advanced Google file Systemadvanced Google file System
advanced Google file System
diptipan
 
Gfs介绍
Gfs介绍Gfs介绍
Gfs介绍
yiditushe
 
Advance google file system
Advance google file systemAdvance google file system
Advance google file system
Lalit Rastogi
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
phanleson
 
Adventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and SeastarAdventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and Seastar
ScyllaDB
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Dsm (Distributed computing)
Dsm (Distributed computing)Dsm (Distributed computing)
Dsm (Distributed computing)
Sri Prasanna
 
Google File System
Google File SystemGoogle File System
Google File System
guest2cb4689
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Yahoo Developer Network
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
vSphere vStorage: Troubleshooting Performance
vSphere vStorage: Troubleshooting PerformancevSphere vStorage: Troubleshooting Performance
vSphere vStorage: Troubleshooting Performance
ProfessionalVMware
 
Cassandra from tarball to production
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to production
Ron Kuris
 
Measuring Firebird Disk I/O
Measuring Firebird Disk I/OMeasuring Firebird Disk I/O
Measuring Firebird Disk I/O
Mind The Firebird
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
tutchiio
 
Google File System
Google File SystemGoogle File System
Google File System
DreamJobs1
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Виталий Стародубцев
 
advanced Google file System
advanced Google file Systemadvanced Google file System
advanced Google file System
diptipan
 
Advance google file system
Advance google file systemAdvance google file system
Advance google file system
Lalit Rastogi
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
phanleson
 
Adventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and SeastarAdventures in Thread-per-Core Async with Redpanda and Seastar
Adventures in Thread-per-Core Async with Redpanda and Seastar
ScyllaDB
 
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck -  Pravega: Storage Rei...
Flink Forward SF 2017: Srikanth Satya & Tom Kaitchuck - Pravega: Storage Rei...
Flink Forward
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Dsm (Distributed computing)
Dsm (Distributed computing)Dsm (Distributed computing)
Dsm (Distributed computing)
Sri Prasanna
 
Google File System
Google File SystemGoogle File System
Google File System
guest2cb4689
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Yahoo Developer Network
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
vSphere vStorage: Troubleshooting Performance
vSphere vStorage: Troubleshooting PerformancevSphere vStorage: Troubleshooting Performance
vSphere vStorage: Troubleshooting Performance
ProfessionalVMware
 
Cassandra from tarball to production
Cassandra   from tarball to productionCassandra   from tarball to production
Cassandra from tarball to production
Ron Kuris
 
Ad

More from Suman Karumuri (10)

Monorepo at Pinterest
Monorepo at PinterestMonorepo at Pinterest
Monorepo at Pinterest
Suman Karumuri
 
Pintrace: Distributed tracing @Pinterest
Pintrace: Distributed tracing @PinterestPintrace: Distributed tracing @Pinterest
Pintrace: Distributed tracing @Pinterest
Suman Karumuri
 
Pintrace: Distributed tracing@Pinterest
Pintrace: Distributed tracing@PinterestPintrace: Distributed tracing@Pinterest
Pintrace: Distributed tracing@Pinterest
Suman Karumuri
 
PinTrace Advanced AWS meetup
PinTrace Advanced AWS meetup PinTrace Advanced AWS meetup
PinTrace Advanced AWS meetup
Suman Karumuri
 
Phobos
PhobosPhobos
Phobos
Suman Karumuri
 
Gpu Join Presentation
Gpu Join PresentationGpu Join Presentation
Gpu Join Presentation
Suman Karumuri
 
Dream Language!
Dream Language!Dream Language!
Dream Language!
Suman Karumuri
 
Bittorrent
BittorrentBittorrent
Bittorrent
Suman Karumuri
 
Practical Byzantine Fault Tolerance
Practical Byzantine Fault TolerancePractical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance
Suman Karumuri
 
bluespec talk
bluespec talkbluespec talk
bluespec talk
Suman Karumuri
 
Pintrace: Distributed tracing @Pinterest
Pintrace: Distributed tracing @PinterestPintrace: Distributed tracing @Pinterest
Pintrace: Distributed tracing @Pinterest
Suman Karumuri
 
Pintrace: Distributed tracing@Pinterest
Pintrace: Distributed tracing@PinterestPintrace: Distributed tracing@Pinterest
Pintrace: Distributed tracing@Pinterest
Suman Karumuri
 
PinTrace Advanced AWS meetup
PinTrace Advanced AWS meetup PinTrace Advanced AWS meetup
PinTrace Advanced AWS meetup
Suman Karumuri
 
Practical Byzantine Fault Tolerance
Practical Byzantine Fault TolerancePractical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance
Suman Karumuri
 
Ad

Recently uploaded (20)

DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights

GFS

  • 1. Google File System. Suman Karumuri, Andy Bartholomew, Justin Palmer
  • 2. GFS: a high-performance, scalable, distributed file system for batch-oriented, data-intensive applications. Fault-tolerant, and built on inexpensive commodity hardware.
  • 3. Design Assumptions: Inexpensive commodity hardware. A modest number of large files. Large streaming reads and small random reads (map-reduce). Writes are mostly appends. Well-defined semantics for concurrent appends are important. High sustained throughput matters more than low latency.
  • 4. API: open and close, create and delete, read and write, record append, snapshot.
  • 6. Architecture (diagram): a GFS Master, a GFS Client Application, and multiple GFS Chunk Servers.
  • 7. Files and Chunks: Files are divided into 64 MB chunks. Each chunk has a globally unique 64-bit handle. Design trade-off: optimized for large files and high throughput, so there are very few small files; heavily contended small files are given a higher replication factor.
  • 8. GFS Chunk Servers: Manage chunks and tell the master which chunks they hold. Store chunks as ordinary files. Run on commodity Linux machines. Maintain the integrity of their chunks. Design trade-off: the chunk server knows which of its chunks are good, so there is no need to keep the master and the chunk servers in sync.
  • 9. GFS Master: Manages file namespace operations and file metadata. Manages chunks on chunk servers: creation and deletion, placement, load balancing, and replication. Records metadata changes in a checkpointed operation log, which is itself replicated.
  • 11. Create (diagram): the GFS Client Application sends Create /home/user/filename to the GFS Master.
  • 12. Create (diagram): the GFS Master updates the operation log and the metadata. Chunk servers are shown in rack 1 and rack 2.
  • 13. Create (diagram): the GFS Master then chooses locations for the chunks: across multiple racks, across multiple networks, on machines with low contention, and on machines with low disk use.
  • 14. Create (diagram): the GFS Master returns the chunk handle and the chunk locations to the GFS Client Application.
  • 15. Namespaces: The syntax for file access is the same as in a regular file system (/home/user/foo), but the semantics are different. There are no per-directory data structures; full paths exist for fine-grained locking. Paths are stored using prefix compression. There are no symbolic or hard links.
  • 16. Locking example: Write /home/user/foo acquires read locks on /home and /home/user and a write lock on /home/user/foo. Delete /home/user/foo acquires read locks on /home and /home/user and a write lock on /home/user/foo, so it must wait for the write to finish.
  • 17. Locking design trade-off: Simple design. Supports concurrent mutations in the same directory. A canonical lock order prevents deadlocks.
  • 19. Read Operation (diagram): the GFS Client Application sends the filename and chunk index to the GFS Master.
  • 20. Read Operation (diagram): the GFS Master replies with the chunk handle and the chunk server locations.
  • 21. Read Operation (diagram): the client sends the chunk handle and a byte range to a GFS Chunk Server.
  • 22. Read Operation (diagram): the GFS Chunk Server returns the data.
  • 23. Write (speaker notes): first present the write without a primary, using the master to serialize; explain data flow and control flow, and the data-push optimization.
  • 25. Write (diagram): the client sends the chunk id and chunk offset to the GFS Master.
  • 26. Write (diagram): the GFS Master returns the chunk server locations, which the client caches.
  • 27. Write (diagram): the client pushes the data to the nearest replica, which passes it along to the next one.
  • 28. Write (diagram): the client sends the write operation, and the GFS Master serializes all concurrent writes.
  • 29. Write (diagram): the serialized order of writes is sent to the chunk servers.
  • 30. Write (diagram): each of the three chunk servers acks.
  • 31. Write (diagram): the client receives an ack and the chunk index.
  • 32. Write under failure (diagram): only two of the three chunk servers ack.
  • 33. Write under failure (diagram): the client retries.
  • 34. Leases: Serializing every write through the master makes it a bottleneck, so the master designates a primary chunk server (by granting it a lease) to handle mutations and their serialization.
  • 35. Write with primary (diagram): the client sends the chunk id and chunk offset to the GFS Master.
  • 36. Write with primary (diagram): the GFS Master returns the chunk server locations, which the client caches.
  • 37. Write with primary (diagram): the client pushes the data to the nearest replica, which passes it along.
  • 38. Write with primary (diagram): the client sends the write operation to the primary chunk server, which serializes all concurrent writes.
  • 39. Write with primary (diagram): the primary forwards the serialized operations to the other replicas.
  • 40. Write with primary (diagram): the other replicas ack to the primary.
  • 41. Write with primary (diagram): the primary returns an ack and the chunk index to the client.
  • 42. Failures during writes: chunk boundary overflow and replicas going down; in both cases the client retries.
  • 43. Write with primary: leases, etc.
  • 45. Record append (diagram): the client sends the chunk id to the GFS Master.
  • 46. Record append (diagram): the client receives an ack and the chunk index at the end of the file, chosen by GFS.
  • 48. Record append: The most common mutation. The write location is determined by GFS, not by the client. Data is appended atomically at least once. An append can't be larger than ¼ of the chunk size, which keeps chunk occupancy high.
  • 50. Write, single process: Chunk 1 and its replica Chunk 1' both hold 9: Hello.
  • 51. Write, single process: Write("World", 10) succeeds on Chunk 1 (10: World) but not on Chunk 1', leaving an inconsistent state.
  • 52. The same happens with any failed mutation: Chunk 1 holds 10: World, Chunk 1' does not, and the region is inconsistent.
  • 53. Multiple writers: Write("World", 10:0) and Write("12345", 10:3) leave both replicas holding 10: Wor12345. The region is consistent but undefined.
  • 54. Append: Append("World") lands at 10 on Chunk 1 but fails on Chunk 1', so the region is inconsistent and undefined; the client retries.
  • 55. Append: the retry succeeds at 11 on both replicas (Chunk 1: 10: World, 11: World; Chunk 1': 11: World), and both report offset 11. The defined record at 11 is interspersed with the inconsistent region at 10.
  • 56. The same holds for appends with multiple writers: defined regions interspersed with inconsistent ones.
  • 57. Consistency model: Replicas of a chunk are not guaranteed to be bitwise identical. Consistent: all clients see the same data, whichever replica they read. Defined: consistent, and the data is exactly what a single mutation wrote. This is fine for map-reduce, because applications can tell defined regions from undefined ones.
  • 58. Snapshot: A snapshot of a file or directory should be fast, with minimal data overhead. On a snapshot call, the master revokes outstanding leases, logs the operation, and copies the metadata so that the new files point to the same chunks. Copy-on-write creates actual chunk copies only when a shared chunk is later modified.
  • 59. Delete Operation: A metadata operation that renames the file to a special hidden name. After a certain time the actual chunks are deleted, so undelete is supported for a limited time. The real cleanup is lazy garbage collection: the master deletes the metadata, chunk lists are piggybacked on HeartBeat messages, and chunk servers delete the chunk files.
  • 60. Delete API design trade-off: Simple design. Cleanup can run when the master is free. Logical deletes are quick. Works well when failures are common. Difficult to tune when storage is tight, but there are workarounds.
  • 61. Fault tolerance for chunks: Re-replication maintains the replication factor. Rebalancing improves load balancing and disk space usage. Data integrity: each chunk is divided into 64 KB blocks, each with its own checksum, and the checksums are verified every time an application reads the data.
  • 62. Fault tolerance for the master: The operation log is replicated and checkpointed. Shadow masters read the checkpointed operation log, don't make metadata changes themselves, reduce load on the master, and might serve stale data.
  • 63. Fault tolerance for chunk servers: All chunks are versioned, and the version number is updated whenever a new lease is granted. Chunks with old versions are not served and are deleted.
  • 64. High availability: Fast recovery of masters and chunk servers. HeartBeat messages for checking chunk server liveness, piggybacking garbage-collection commands, and lease renewal. Diagnostic tools.
  • 67. Conclusions: Extremely cheap hardware with a high failure rate. Highly concurrent reads and writes. Highly scalable. Supports undelete (for a configurable time).
  • 68. Conclusions (continued): Built for map-reduce: mostly appends and scanning reads, mostly large files, and a need for high throughput. Developers understand the limitations and tune their applications to suit GFS.
  • 70. Design goals: Component failures are the norm. Files are huge (2 GB is common). Files are mostly appended to. The application (map-reduce) and the file system are designed together.
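Slides 7 and 19 imply that the client turns a byte offset into a chunk index before it asks the master anything, which works because chunks have a fixed size. A minimal sketch of that arithmetic, assuming the 64 MB chunk size from slide 7; `chunk_index` and `split_read` are hypothetical names, not a GFS API:

```python
from __future__ import annotations

# Assumption: a fixed 64 MB chunk size, as stated on slide 7.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_index(offset: int) -> int:
    """Index of the chunk that holds byte `offset` of a file."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def split_read(offset: int, length: int) -> list[tuple[int, int, int]]:
    """Split a file-level read into per-chunk requests.

    Returns (chunk_index, offset_within_chunk, length) tuples. A read that
    crosses a chunk boundary becomes one request per chunk it touches.
    """
    requests = []
    end = offset + length
    while offset < end:
        idx = chunk_index(offset)
        within = offset - idx * CHUNK_SIZE
        n = min(CHUNK_SIZE - within, end - offset)
        requests.append((idx, within, n))
        offset += n
    return requests


if __name__ == "__main__":
    # A 20-byte read straddling the first chunk boundary touches chunks 0 and 1.
    print(split_read(CHUNK_SIZE - 10, 20))  # [(0, 67108854, 10), (1, 0, 10)]
```

Each tuple corresponds to one "filename and chunk index" request to the master (slide 19) followed by one "chunk handle, byte range" request to a chunk server (slide 21).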
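Slide 13 lists the master's placement criteria for a new chunk. A minimal greedy sketch of those criteria, assuming three replicas (the slides don't give a replication factor), racks as the unit of failure, and disk usage plus recent chunk creations as stand-ins for "low disk use" and "low contention". Network diversity is not modeled, and `ChunkServer` and `choose_replicas` are hypothetical:

```python
from __future__ import annotations

from typing import NamedTuple


class ChunkServer(NamedTuple):
    name: str
    rack: str
    disk_used: float        # fraction of disk in use, 0.0-1.0
    recent_creations: int   # proxy for contention: chunks created here recently


def choose_replicas(servers: list[ChunkServer], n: int = 3) -> list[str]:
    """Pick `n` chunk servers for a new chunk.

    Prefers servers with low disk use and low recent activity, and places at
    most one replica per rack until every rack has been tried, so a single
    rack failure is less likely to take out all copies.
    """
    ranked = sorted(servers, key=lambda s: (s.disk_used, s.recent_creations))
    chosen: list[ChunkServer] = []
    racks: set[str] = set()
    for s in ranked:  # first pass: spread replicas across racks
        if len(chosen) == n:
            break
        if s.rack not in racks:
            chosen.append(s)
            racks.add(s.rack)
    for s in ranked:  # second pass: fill remaining slots with the best leftovers
        if len(chosen) == n:
            break
        if s not in chosen:
            chosen.append(s)
    return [s.name for s in chosen]
```

With two racks and three replicas, the first pass places one copy per rack and the second pass adds the least-loaded remaining server, wherever it sits.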
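Slides 16 and 17 describe how a namespace operation locks paths: read locks on every ancestor, a write lock on the full path, all acquired in one canonical order. A minimal sketch that computes those lock sets and checks two operations for conflicts. It assumes the canonical order is "by depth, then lexicographically", omits actual blocking and release, and its function names are hypothetical:

```python
from __future__ import annotations


def lock_set(path: str) -> list[tuple[str, str]]:
    """Return the (path, mode) pairs an operation on `path` must hold.

    Every proper ancestor gets a read lock and the full path gets a write
    lock. Sorting by depth, then lexicographically, gives every operation
    the same acquisition order, which rules out deadlock cycles.
    """
    parts = [p for p in path.split("/") if p]
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    locks = [(a, "read") for a in ancestors] + [(path, "write")]
    return sorted(locks, key=lambda pm: (pm[0].count("/"), pm[0]))


def conflicts(a: list[tuple[str, str]], b: list[tuple[str, str]]) -> bool:
    """Two lock sets conflict if they lock the same path and either lock is a write."""
    modes_a = dict(a)
    for path, mode in b:
        if path in modes_a and "write" in (mode, modes_a[path]):
            return True
    return False
```

For example, `lock_set("/home/user/foo")` yields read locks on `/home` and `/home/user` plus a write lock on `/home/user/foo`, so a write and a delete of that file conflict (slide 16), while creating `/home/user/a` and `/home/user/b` only share read locks and can proceed concurrently (slide 17).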
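Slides 27 and 37 show the data being passed along to the nearest replica rather than sent by the client to every replica. A minimal sketch that builds such a forwarding chain greedily, assuming a caller-supplied network-distance function; `push_chain`, `rack_distance`, and the "rack/host" naming scheme are hypothetical:

```python
from __future__ import annotations

from typing import Callable


def push_chain(client: str, replicas: list[str],
               distance: Callable[[str, str], int]) -> list[str]:
    """Order replicas for a chained data push.

    Starting at the client, each hop forwards the data to the closest replica
    that has not received it yet, so every machine sends the bytes only once.
    """
    remaining = list(replicas)
    chain: list[str] = []
    current = client
    while remaining:
        nxt = min(remaining, key=lambda r: distance(current, r))
        chain.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return chain


def rack_distance(a: str, b: str) -> int:
    """Hypothetical "rack/host" naming: same rack is closer than cross-rack."""
    return 1 if a.split("/")[0] == b.split("/")[0] else 2
```

For a client in rack 1 and replicas `rack2/cs1`, `rack1/cs2`, `rack2/cs3`, the chain is `rack1/cs2`, then `rack2/cs1`, then `rack2/cs3`: only one hop crosses racks. Separating this data flow from the control flow (slides 28 and 38) lets the data path follow the network topology regardless of which server is primary.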
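Slides 34-41 move serialization from the master to a primary that holds a lease. A minimal in-memory sketch of the primary's role, assuming the data has already been pushed to every replica (slide 37) and ignoring lease expiry. The primary stamps each mutation with a serial number, applies it, forwards it in the same order, and reports success only if every replica acks. `Replica` and `Primary` are hypothetical classes:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict[int, bytes] = field(default_factory=dict)  # offset -> bytes (toy model)
    applied: list[int] = field(default_factory=list)      # serial numbers applied, in order

    def apply(self, serial: int, offset: int, payload: bytes) -> bool:
        """Apply one serialized mutation; a down replica never acks."""
        if not self.up:
            return False
        self.data[offset] = payload
        self.applied.append(serial)
        return True


@dataclass
class Primary(Replica):
    secondaries: list[Replica] = field(default_factory=list)
    next_serial: int = 0

    def write(self, offset: int, payload: bytes) -> bool:
        """Serialize one mutation and apply it in the same order on every replica."""
        serial = self.next_serial
        self.next_serial += 1
        ok = self.apply(serial, offset, payload)
        acks = [s.apply(serial, offset, payload) for s in self.secondaries]
        return ok and all(acks)  # any missing ack -> error reported; client retries
```

A failed `write` leaves some replicas updated and others not, which is the inconsistent state shown on slides 51-52; the client's retry (slide 33) applies the mutation again under a new serial number.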
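Slides 45-48 say that GFS, not the client, picks the append offset, and that an append is capped at ¼ of a chunk. A minimal sketch of the primary's decision, assuming the 64 MB chunk size from slide 7 and modeling only lengths, not data; `record_append` and `RecordTooLarge` are hypothetical names:

```python
from __future__ import annotations

# Assumption: 64 MB chunks (slide 7) and the 1/4-chunk append limit (slide 48).
CHUNK_SIZE = 64 * 1024 * 1024


class RecordTooLarge(ValueError):
    """Raised when a record exceeds a quarter of the chunk size."""


def record_append(chunk_len: int, record_len: int,
                  chunk_size: int = CHUNK_SIZE) -> tuple[int | None, int]:
    """Decide, on the primary, where a record append goes.

    Returns (offset, new_chunk_len) when the record fits in the current chunk.
    Returns (None, chunk_size) when it does not: the primary pads the chunk to
    its full size and the client must retry the append on the next chunk.
    """
    if record_len > chunk_size // 4:
        raise RecordTooLarge(f"{record_len} bytes > 1/4 of a chunk")
    if chunk_len + record_len > chunk_size:
        return None, chunk_size
    return chunk_len, chunk_len + record_len
```

The cap bounds the waste: padding only happens when `chunk_len + record_len > chunk_size`, and since `record_len` is at most a quarter of a chunk, the padded tail is always less than a quarter of a chunk.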
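Slides 54-57 show that a retried append can leave a duplicate on one replica and a hole on another, and that applications must tell defined regions apart. One way a reader can do this, following the approach described in the original GFS paper, is for writers to tag each record with a unique id and a checksum. A minimal sketch; the record layout (id, payload, CRC-32) and the function names are hypothetical, and `None` stands in for padding or a failed write:

```python
from __future__ import annotations

import zlib
from typing import Optional, Tuple

Record = Tuple[int, bytes, int]  # (record id, payload, CRC-32 of id + payload)


def _crc(rec_id: int, payload: bytes) -> int:
    return zlib.crc32(rec_id.to_bytes(8, "big") + payload)


def make_record(rec_id: int, payload: bytes) -> Record:
    """Writer side: tag each record with a unique id and a checksum."""
    return rec_id, payload, _crc(rec_id, payload)


def read_defined(slots: list[Optional[Record]]) -> list[bytes]:
    """Reader side: keep each valid record exactly once, in file order."""
    seen: set[int] = set()
    out: list[bytes] = []
    for slot in slots:
        if slot is None:
            continue  # padding, or a region a failed append left unwritten
        rec_id, payload, crc = slot
        if _crc(rec_id, payload) != crc:
            continue  # garbage or a partial record fragment
        if rec_id in seen:
            continue  # duplicate left by an at-least-once retry
        seen.add(rec_id)
        out.append(payload)
    return out
```

Modeling slide 55 with Chunk 1 as `[hello, world, world]` and Chunk 1' as `[hello, None, world]`, both replicas read back as `[b"Hello", b"World"]`, so the reader sees the same data whichever replica it uses.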
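Slide 58 describes a snapshot as a metadata copy followed by copy-on-write. A minimal sketch of the master-side bookkeeping, assuming per-chunk reference counts are used to detect shared chunks. Lease revocation, logging, and the chunk servers' local data copy are omitted, and `MasterMetadata` and its methods are hypothetical:

```python
from __future__ import annotations

import itertools


class MasterMetadata:
    """Toy master metadata: file -> chunk handles, plus per-chunk reference counts."""

    def __init__(self) -> None:
        self.files: dict[str, list[int]] = {}
        self.refcount: dict[int, int] = {}
        self._handles = itertools.count(1)  # hypothetical handle allocator

    def create(self, path: str, num_chunks: int) -> None:
        handles = [next(self._handles) for _ in range(num_chunks)]
        self.files[path] = handles
        for h in handles:
            self.refcount[h] = 1

    def snapshot(self, src: str, dst: str) -> None:
        """Metadata-only copy: the snapshot points at the same chunks."""
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def handle_for_write(self, path: str, index: int) -> int:
        """Return the chunk handle to write, copying on write if the chunk is shared."""
        h = self.files[path][index]
        if self.refcount[h] > 1:
            new = next(self._handles)  # chunk servers would copy h -> new locally
            self.refcount[h] -= 1
            self.refcount[new] = 1
            self.files[path][index] = new
            h = new
        return h
```

A snapshot therefore costs only a metadata copy; a chunk is duplicated the first time either the original or the snapshot writes to it, and later writes to the now-private chunk go straight through.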
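Slides 59-60 describe deletion as a rename followed by lazy garbage collection. A minimal sketch of the master side, assuming a three-day grace period (the default the GFS paper mentions; the slides say only "after certain time") and a hypothetical `.deleted@<timestamp>` naming scheme. `Namespace` and its methods are hypothetical:

```python
from __future__ import annotations

# Assumption: three-day grace period before hidden files are reclaimed.
GRACE_SECONDS = 3 * 24 * 3600
MARK = ".deleted@"  # hypothetical hidden-name marker


class Namespace:
    def __init__(self) -> None:
        self.files: dict[str, list[int]] = {}  # path -> chunk handles

    def delete(self, path: str, now: float) -> str:
        """Logical delete: rename to a hidden name that records the deletion time."""
        hidden = f"{path}{MARK}{int(now)}"
        self.files[hidden] = self.files.pop(path)
        return hidden

    def undelete(self, hidden: str) -> str:
        """Restore a hidden file under its original name."""
        original = hidden.rsplit(MARK, 1)[0]
        self.files[original] = self.files.pop(hidden)
        return original

    def gc_scan(self, now: float) -> None:
        """Periodic scan: drop metadata for hidden files older than the grace period."""
        for name in list(self.files):
            if MARK in name:
                deleted_at = int(name.rsplit(MARK, 1)[1])
                if now - deleted_at >= GRACE_SECONDS:
                    del self.files[name]

    def orphaned(self, reported: set[int]) -> set[int]:
        """HeartBeat reply: chunks a server reports that no file references any more."""
        live = {h for handles in self.files.values() for h in handles}
        return reported - live
```

Until `gc_scan` drops the hidden entry, its chunks still count as live and `undelete` can restore the file; afterwards, the next HeartBeat exchange tells each chunk server which of its chunk files it may delete.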
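Slide 61 says each chunk is checksummed in 64 KB blocks and that the checksums are verified on every read. A minimal sketch, assuming CRC-32 from Python's `zlib` as the per-block checksum (the slide does not name an algorithm); `block_checksums` and `verified_read` are hypothetical names:

```python
from __future__ import annotations

import zlib

BLOCK = 64 * 1024  # 64 KB checksum blocks (slide 61)


def block_checksums(chunk: bytes) -> list[int]:
    """One 32-bit checksum per 64 KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]


def verified_read(chunk: bytes, sums: list[int], offset: int, length: int) -> bytes:
    """Verify every block overlapping [offset, offset + length) before returning data."""
    if length <= 0:
        return b""
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    for b in range(first, last + 1):
        if zlib.crc32(chunk[b * BLOCK:(b + 1) * BLOCK]) != sums[b]:
            # A real chunk server would report this and the client would read another replica.
            raise OSError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```

Only the blocks a read touches are checked, so corruption in one block does not block reads of the rest of the chunk; a read of bytes 0-9 succeeds even if block 3 is damaged.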
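Slide 63 ties chunk version numbers to lease grants. A minimal sketch, assuming the master bumps the version and every reachable replica records it before the lease is used, so a replica that was down during the grant is left behind with an old version. `ChunkServer` and `VersionTracker` are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChunkServer:
    name: str
    versions: dict[int, int] = field(default_factory=dict)  # chunk handle -> version


class VersionTracker:
    """Master-side chunk version numbers (toy model)."""

    def __init__(self) -> None:
        self.current: dict[int, int] = {}

    def grant_lease(self, handle: int, reachable: list[ChunkServer]) -> int:
        """Bump the chunk's version; only replicas reachable now record it."""
        version = self.current.get(handle, 0) + 1
        self.current[handle] = version
        for cs in reachable:
            cs.versions[handle] = version
        return version

    def split(self, handle: int,
              servers: list[ChunkServer]) -> tuple[list[str], list[str]]:
        """Return (up-to-date replicas that may serve, stale replicas to delete)."""
        latest = self.current.get(handle, 0)
        fresh = [cs.name for cs in servers if cs.versions.get(handle) == latest]
        stale = [cs.name for cs in servers if cs.versions.get(handle, 0) < latest]
        return fresh, stale
```

If server `c` is down when the second lease is granted, it still reports version 1 when it comes back, so the master stops directing clients to it and schedules its copy for deletion.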