GOOGLE FILE SYSTEM 
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 
Presented by Ankit Thiranh
OVERVIEW 
• Introduction 
• Architecture 
• Characteristics 
• System Interaction 
• Master Operation, Fault Tolerance and Diagnosis 
• Measurements 
• Real-World Clusters and Their Performance
INTRODUCTION 
• Google generates and processes very large amounts of data 
• It needs a scalable distributed file system to store and process that data 
• Solution: Google File System 
• GFS is: 
• Large 
• Distributed 
• Highly fault-tolerant
ASSUMPTIONS 
• The system is built from many inexpensive commodity components that often fail. 
• The system stores a modest number of large files. 
• Primarily two kinds of reads: large streaming reads and small random reads. 
• Many large sequential writes append data to files. 
• The system must efficiently implement well-defined semantics for multiple clients that 
concurrently append to the same file. 
• High sustained bandwidth is more important than low latency.
ARCHITECTURE
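The architecture diagram is not reproduced in this text. Going by the presenter's notes, a GFS cluster has a single master, multiple chunkservers, and multiple clients; files are divided into fixed-size chunks, each identified by an immutable, globally unique 64-bit chunk handle and stored on several chunkservers, while the master holds the metadata. The Python sketch below models only that master-side metadata. The class and method names (`Master`, `create_file`, `add_chunk`, `lookup`) are hypothetical, and allocating handles from a simple counter is an assumption made for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy model of the GFS master's in-memory metadata (hypothetical API)."""

    # Namespace: full path name -> ordered list of 64-bit chunk handles.
    files: dict[str, list[int]] = field(default_factory=dict)
    # Chunk handle -> chunkservers currently holding a replica of that chunk.
    locations: dict[int, set[str]] = field(default_factory=dict)
    # Assumption: handles come from a counter; the real scheme is not on the slide.
    _handles: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create_file(self, path: str) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_chunk(self, path: str, servers: set[str]) -> int:
        """Allocate a new chunk at the end of `path` and record its replicas."""
        chunks = self.files[path]  # KeyError if the file does not exist
        handle = next(self._handles)
        if handle >= 2**64:
            raise OverflowError("chunk handles are 64-bit")
        chunks.append(handle)
        self.locations[handle] = set(servers)
        return handle

    def lookup(self, path: str, chunk_index: int) -> tuple[int, set[str]]:
        """Return (chunk handle, replica locations) for one chunk of a file."""
        handle = self.files[path][chunk_index]
        return handle, self.locations[handle]
```

Handing clients only handles and locations, while file data flows directly between clients and chunkservers, is what lets one master coordinate many chunkservers; the sketch omits leases, access control, and persistence.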
CHARACTERISTICS 
• Single master 
• Chunk size 
• Metadata 
• In-Memory Data structures 
• Chunk Locations 
• Operation Log 
• Consistency Model (figure) 
• Guarantees by GFS 
• Implications for Applications 
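Because chunks have a fixed size (64 MB, according to the presenter's notes), a client can translate a byte offset into a chunk index on its own and ask the master only for that chunk's handle and replica locations; this is one reason a large chunk size reduces client-master interaction. A minimal sketch; the function names are hypothetical, and "64 MB" is read as 64 × 2^20 bytes.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, interpreted as 64 * 2**20 bytes


def to_chunk_coords(offset: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset within that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def chunks_spanned(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk indices touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)
```

For example, byte offset 200 MiB falls in chunk 3 at an in-chunk offset of 8 MiB, and a 10 MiB read starting at 60 MiB touches chunks 0 and 1, so the client needs locations for just those two chunks.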
Outcome | Write | Record Append 
Serial success | defined | defined interspersed with inconsistent 
Concurrent successes | consistent but undefined | defined interspersed with inconsistent 
Failure | inconsistent | inconsistent 
File Region State After Mutation
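Per the presenter's notes, a file region is consistent if all clients see the same data regardless of which replica they read, and defined if it is consistent and clients see what the mutation wrote in its entirety. The file-region-state table can be encoded as a small lookup; the string labels for operations and outcomes in this sketch are hypothetical.

```python
from enum import Enum


class RegionState(Enum):
    DEFINED = "defined"
    CONSISTENT_UNDEFINED = "consistent but undefined"
    DEFINED_INTERSPERSED = "defined interspersed with inconsistent"
    INCONSISTENT = "inconsistent"


VALID_OPS = {"write", "record_append"}
VALID_OUTCOMES = {"serial_success", "concurrent_success", "failure"}


def region_state(op: str, outcome: str) -> RegionState:
    """Return the file region state after a mutation, following the table."""
    if op not in VALID_OPS:
        raise ValueError(f"unknown operation: {op}")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    if outcome == "failure":
        # A failed mutation of either kind leaves the region inconsistent.
        return RegionState.INCONSISTENT
    if op == "record_append":
        # Successful appends, serial or concurrent, give defined records
        # possibly separated by inconsistent (padding or duplicate) regions.
        return RegionState.DEFINED_INTERSPERSED
    # Plain writes: defined if serial, consistent but undefined if concurrent.
    if outcome == "serial_success":
        return RegionState.DEFINED
    return RegionState.CONSISTENT_UNDEFINED
```

This is why applications that read record-appended files must skip padding and duplicate fragments between defined records, for example by having writers include checksums and unique record identifiers.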
SYSTEM INTERACTION 
• Leases and Mutation Order 
• Data flow 
• Atomic Record Appends 
• Snapshot 
Figure 2: Write Control and Data Flow
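The slide only names these interactions. As described in the GFS paper, for a record append the client supplies only the data; the primary replica, which holds the chunk lease and therefore decides the order of mutations, chooses the offset. If the record would not fit in the current 64 MB chunk, the primary pads the chunk and tells the client to retry on the next one, and records are limited to one-fourth of the chunk size to keep that padding bounded. The sketch below models just the primary's decision on one in-memory chunk; `PrimaryReplica`, `RetryOnNextChunk`, and zero-byte padding are illustrative assumptions, and forwarding to secondaries is omitted.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB maximum chunk size


class RetryOnNextChunk(Exception):
    """Tells the client to retry the record append on the file's next chunk."""


@dataclass
class PrimaryReplica:
    """Primary's view of the last chunk of a file (hypothetical model)."""

    capacity: int = CHUNK_SIZE
    data: bytearray = field(default_factory=bytearray)

    def record_append(self, record: bytes) -> int:
        """Append `record` as one unit and return the offset chosen for it."""
        if len(record) > self.capacity // 4:
            # Records are limited to 1/4 of the chunk size to bound padding waste.
            raise ValueError("record larger than 1/4 of the chunk size")
        if len(self.data) + len(record) > self.capacity:
            # Pad the chunk to full size (zero bytes are an assumption); in GFS the
            # primary also tells the secondaries to pad, then the client retries.
            self.data.extend(bytes(self.capacity - len(self.data)))
            raise RetryOnNextChunk()
        # The primary, not the client, chooses the offset.
        offset = len(self.data)
        self.data.extend(record)
        return offset
```

With a tiny 8-byte capacity for illustration, appends of `b"ab"`, `b"cd"`, `b"ef"`, and `b"g"` land at offsets 0, 2, 4, and 6; a further 2-byte record no longer fits, so the last byte is padded and the client is told to retry.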
MASTER OPERATION 
• Namespace Management and Locking 
• Replica Placement 
• Creation, Re-replication, Rebalancing 
• Garbage Collection 
• Mechanism 
• Discussion 
• Stale Replica Detection
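The presenter's notes mention chunk version numbers for detecting stale replicas. A minimal sketch of the idea, assuming (as the GFS paper describes) that the master increments a chunk's version whenever it grants a new lease and records the new version on the replicas it can reach; a replica whose server was down misses the increment and later reports an older version. If a chunkserver reports a higher version than the master knows, the master takes the higher one as current. All names are hypothetical, and garbage collection of stale replicas is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """Master-side state for one chunk (hypothetical model)."""

    version: int = 0
    # Chunkserver -> version number of the replica it holds.
    replicas: dict[str, int] = field(default_factory=dict)


def grant_lease(chunk: ChunkRecord, reachable: set[str]) -> None:
    """New lease: bump the version; only reachable replicas record it."""
    chunk.version += 1
    for server in reachable & set(chunk.replicas):
        chunk.replicas[server] = chunk.version


def on_chunk_report(chunk: ChunkRecord, server: str, reported: int) -> bool:
    """Handle a chunkserver's report of its replica; return True if stale."""
    if reported > chunk.version:
        # The master assumes it failed while granting a lease and adopts the
        # higher version as up to date.
        chunk.version = reported
    chunk.replicas[server] = reported
    return reported < chunk.version


def stale_replicas(chunk: ChunkRecord) -> set[str]:
    """Servers whose replica missed at least one version increment."""
    return {s for s, v in chunk.replicas.items() if v < chunk.version}
```

For example, if `cs3` is down when a new lease is granted, the chunk moves to a new version on `cs1` and `cs2` only, and `cs3` is reported as stale when it comes back.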
FAULT TOLERANCE AND DIAGNOSIS 
• High Availability 
• Fast Recovery 
• Chunk Replication 
• Master Replication 
• Data Integrity 
• Diagnostic Tools
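Per the GFS paper, each chunkserver checks data integrity itself: a chunk is divided into 64 KB blocks, each with a 32-bit checksum, and a read is verified against every block it overlaps before any data is returned. The checksum function is not specified on the slide, so this sketch assumes CRC-32 via Python's `zlib.crc32`; `block_checksums`, `verified_read`, and `ChecksumMismatch` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks


class ChecksumMismatch(Exception):
    """Raised when a block's data no longer matches its stored checksum."""


def block_checksums(chunk: bytes) -> list[int]:
    """Compute one 32-bit checksum per 64 KB block (CRC-32 is an assumption)."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, checksums: list[int], offset: int, length: int) -> bytes:
    """Verify every block overlapping the requested range, then return the data."""
    if offset < 0 or length < 0 or offset + length > len(chunk):
        raise ValueError("read range outside the chunk")
    if length == 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise ChecksumMismatch(f"block {b} is corrupt")
    return chunk[offset:offset + length]
```

On a mismatch, a real chunkserver returns an error so the client can read from another replica and reports the corruption to the master, which re-replicates the chunk from a good copy; the sketch only raises.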
MEASUREMENTS 
Aggregate Throughputs. Top curves show theoretical limits imposed by the network topology. Bottom curves 
show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in 
some cases because of low variance in measurements.
REAL WORLD CLUSTERS 
• Two clusters were examined: 
• Cluster A is used for research and development by over a hundred users. 
• Cluster B is used for production data processing with occasional human 
intervention 
• Storage 
• Metadata 
Cluster | A | B 
Chunkservers | 342 | 227 
Available disk space | 72 TB | 180 TB 
Used disk space | 55 TB | 155 TB 
Number of files | 735 k | 737 k 
Number of dead files | 22 k | 232 k 
Number of chunks | 992 k | 1550 k 
Metadata at chunkservers | 13 GB | 21 GB 
Metadata at master | 48 MB | 60 MB 
Characteristics of two GFS clusters
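Two rough ratios can be read off the cluster table. This back-of-the-envelope sketch assumes 1 MB = 10^6 bytes and spreads master metadata evenly over the listed files, which ignores per-chunk metadata and dead files; the numbers it prints are derived from the table, not reported by the study.

```python
def summarize(used_tb: float, available_tb: float, files: int, master_mb: float) -> tuple[float, float]:
    """Return (fraction of disk used, master metadata bytes per file)."""
    return used_tb / available_tb, master_mb * 1e6 / files


# Figures copied from the cluster table: used TB, available TB, files, master MB.
clusters = {
    "A": (55, 72, 735_000, 48),
    "B": (155, 180, 737_000, 60),
}

for name, row in clusters.items():
    used, per_file = summarize(*row)
    print(f"Cluster {name}: {used:.0%} of disk used, "
          f"~{per_file:.0f} bytes of master metadata per file")
```

This prints 76% and about 65 bytes per file for Cluster A, and 86% and about 81 bytes per file for Cluster B: tens of megabytes of master metadata for hundreds of thousands of files, which is why keeping all metadata in the master's memory is practical.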
PERFORMANCE EVALUATION OF TWO CLUSTERS 
• Read and write rates and Master load 
Cluster | A | B 
Read rate (last minute) | 583 MB/s | 380 MB/s 
Read rate (last hour) | 562 MB/s | 384 MB/s 
Read rate (since start) | 589 MB/s | 49 MB/s 
Write rate (last minute) | 1 MB/s | 101 MB/s 
Write rate (last hour) | 2 MB/s | 117 MB/s 
Write rate (since start) | 25 MB/s | 13 MB/s 
Master ops (last minute) | 325 Ops/s | 533 Ops/s 
Master ops (last hour) | 381 Ops/s | 518 Ops/s 
Master ops (since start) | 202 Ops/s | 347 Ops/s 
Performance Metrics for Two GFS Clusters
WORKLOAD BREAKDOWN 
• Chunkserver Workload 
Operation size | Read X | Read Y | Write X | Write Y | Record Append X | Record Append Y 
0K | 0.4 | 2.6 | 0 | 0 | 0 | 0 
1B..1K | 0.1 | 4.1 | 6.6 | 4.9 | 0.2 | 9.2 
1K..8K | 65.2 | 38.5 | 0.4 | 1.0 | 18.9 | 15.2 
8K..64K | 29.9 | 45.1 | 17.8 | 43.0 | 78.0 | 2.8 
64K..128K | 0.1 | 0.7 | 2.3 | 1.9 | <0.1 | 4.3 
128K..256K | 0.2 | 0.3 | 31.6 | 0.4 | <0.1 | 10.6 
256K..512K | 0.1 | 0.1 | 4.2 | 7.7 | <0.1 | 31.2 
512K..1M | 3.9 | 6.9 | 35.5 | 28.7 | 2.2 | 25.5 
1M..inf | 0.1 | 1.8 | 1.5 | 12.3 | 0.7 | 2.2 
Operations Breakdown by Size (%) 

Operation size | Read X | Read Y | Write X | Write Y | Record Append X | Record Append Y 
1B..1K | <0.1 | <0.1 | <0.1 | <0.1 | <0.1 | <0.1 
1K..8K | 13.8 | 3.9 | <0.1 | <0.1 | <0.1 | 0.1 
8K..64K | 11.4 | 9.3 | 2.4 | 5.9 | 78.0 | 0.3 
64K..128K | 0.3 | 0.7 | 0.3 | 0.3 | <0.1 | 1.2 
128K..256K | 0.8 | 0.6 | 16.5 | 0.2 | <0.1 | 5.8 
256K..512K | 1.4 | 0.3 | 3.4 | 7.7 | <0.1 | 38.4 
512K..1M | 65.9 | 55.1 | 74.1 | 58.0 | 0.1 | 46.8 
1M..inf | 6.4 | 28.0 | 3.3 | 28.0 | 53.9 | 7.4 
Bytes Transferred Breakdown by Operation Size (%)
WORKLOAD BREAKDOWN 
• Master Workload 
Cluster | X | Y 
Open | 26.1 | 16.3 
Delete | 0.7 | 1.5 
FindLocation | 64.3 | 65.8 
FindLeaseHolder | 7.8 | 13.4 
FindMatchingFiles | 0.6 | 2.2 
All other combined | 0.5 | 0.8 
Master Requests Breakdown by Type (%)
Google file system

  • 1. GOOGLE FILE SYSTEM Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Presented By – Ankit Thiranh
  • 2. OVERVIEW • Introduction • Architecture • Characteristics • System Interaction • Master Operation and Fault tolerance and diagnosis • Measurements • Some Real world clusters and their performance
  • 3. INTRODUCTION • Google – large amount of data • Need a good file distribution system to process its data • Solution: Google File System • GFS is : • Large • Distributed • Highly fault tolerant system
  • 4. ASSUMPTIONS • The system is built from many inexpensive commodity components that often fail. • The system stores a modest number of large files. • Primarily two kind of reads: large streaming reads and small random needs. • Many large sequential writes append data to files. • The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. • High sustained bandwidth is more important than low latency.
  • 6. CHARACTERISTICS • Single master • Chunk size • Metadata • In-Memory Data structures • Chunk Locations • Operational Log • Consistency Model (figure) • Guarantees by GFS • Implications for Applications Write Record Append Serial Success defined Defined interspersed with inconsistent Concurrent successes Consistent but undefined Failure inconsistent File Region State After Mutation
  • 7. SYSTEM INTERACTION • Leases and Mutation Order • Data flow • Atomic Record appends • Snapshot Figure 2: Write Control and Data Flow
  • 8. MASTER OPERATION • Namespace Management and Locking • Replica Placement • Creation, Re-replication, Rebalancing • Garbage Collection • Mechanism • Discussion • State Replica Detection
  • 9. FAULT TOLERANCE AND DIAGNOSIS • High Availability • Fast Recovery • Chunk Replication • Master Replication • Data Integrity • Diagnostics tools
  • 10. MEASUREMENTS Aggregate Throughputs. Top curves show theoretical limits imposed by the network topology. Bottom curves show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in some cases because of low variance in measurements.
  • 11. REAL WORLD CLUSTERS • Two clusters were examined: • Cluster A used for Research and development by over a hundred users. • Cluster B is used for production data processing with occasional human intervention • Storage • Metadata Cluster A B Chunkservers 342 227 Available disk Size 72 TB Used Disk Space 55 TB Characteristics of two GFS clusters 180 TB 155 TB Number of Files Number of Dead Files Number of chunks 735 k 22 k 992 k 737 k 232 k 1550 k Metadata at chunkservers Metadata at master 13 GB 48 MB 21 GB 60 MB
  • 12. PERFORMANCE EVALUATION OF TWO CLUSTERS • Read and write rates and master load
    Performance Metrics for Two GFS Clusters
    Cluster                      A            B
    Read rate (last minute)      583 MB/s     380 MB/s
    Read rate (last hour)        562 MB/s     384 MB/s
    Read rate (since start)      589 MB/s     49 MB/s
    Write rate (last minute)     1 MB/s       101 MB/s
    Write rate (last hour)       2 MB/s       117 MB/s
    Write rate (since start)     25 MB/s      13 MB/s
    Master ops (last minute)     325 ops/s    533 ops/s
    Master ops (last hour)       381 ops/s    518 ops/s
    Master ops (since start)     202 ops/s    347 ops/s
  • 13. WORKLOAD BREAKDOWN • Chunkserver workload
    Operations Breakdown by Size (%)
    Operation    Read          Write         Record Append
    Cluster      X      Y      X      Y      X      Y
    0K           0.4    2.6    0      0      0      0
    1B..1K       0.1    4.1    6.6    4.9    0.2    9.2
    1K..8K       65.2   38.5   0.4    1.0    18.9   15.2
    8K..64K      29.9   45.1   17.8   43.0   78.0   2.8
    64K..128K    0.1    0.7    2.3    1.9    <0.1   4.3
    128K..256K   0.2    0.3    31.6   0.4    <0.1   10.6
    256K..512K   0.1    0.1    4.2    7.7    <0.1   31.2
    512K..1M     3.9    6.9    35.5   28.7   2.2    25.5
    1M..inf      0.1    1.8    1.5    12.3   0.7    2.2

    Bytes Transferred Breakdown by Operation Size (%)
    Operation    Read          Write         Record Append
    Cluster      X      Y      X      Y      X      Y
    1B..1K       <0.1   <0.1   <0.1   <0.1   <0.1   <0.1
    1K..8K       13.8   3.9    <0.1   <0.1   <0.1   0.1
    8K..64K      11.4   9.3    2.4    5.9    78.0   0.3
    64K..128K    0.3    0.7    0.3    0.3    <0.1   1.2
    128K..256K   0.8    0.6    16.5   0.2    <0.1   5.8
    256K..512K   1.4    0.3    3.4    7.7    <0.1   38.4
    512K..1M     65.9   55.1   74.1   58.0   0.1    46.8
    1M..inf      6.4    28.0   3.3    28.0   53.9   7.4
  • 14. WORKLOAD BREAKDOWN • Master workload
    Master Requests Breakdown by Type (%)
    Cluster               X       Y
    Open                  26.1    16.3
    Delete                0.7     1.5
    FindLocation          64.3    65.8
    FindLeaseHolder       7.8     13.4
    FindMatchingFiles     0.6     2.2
    All other combined    0.5     0.8

Editor's Notes

  • #6: GFS has a single master, multiple chunkservers, and multiple clients. Files are divided into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64-bit chunk handle, which the master assigns when the chunk is created. Each chunk is replicated on multiple chunkservers. The master holds all metadata: the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
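To make the master's role concrete, here is a minimal in-memory sketch of two of its metadata tables. The first maps each file path to an ordered list of chunk handles. The second maps each handle to the chunkservers holding a replica. The class and method names are hypothetical, and handles come from a simple counter rather than a real globally unique 64-bit scheme. Access control information is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # File namespace: full path name -> ordered list of chunk handles.
    files: dict[str, list[int]] = field(default_factory=dict)
    # Chunk handle -> addresses of chunkservers holding a replica.
    # GFS does not persist this table; it is rebuilt by polling chunkservers.
    locations: dict[int, set[str]] = field(default_factory=dict)
    # Simplification: real GFS handles are globally unique 64-bit ids.
    next_handle: int = 1

    def create_chunk(self, path: str, servers: list[str]) -> int:
        """Append a new chunk to `path` and record where its replicas live."""
        handle = self.next_handle
        self.next_handle += 1
        self.files.setdefault(path, []).append(handle)
        self.locations[handle] = set(servers)
        return handle

    def lookup(self, path: str, chunk_index: int) -> tuple[int, set[str]]:
        """Answer a client's (file name, chunk index) request."""
        handle = self.files[path][chunk_index]
        return handle, self.locations.get(handle, set())
```

Because the master returns only a handle and replica locations, clients then fetch the data directly from chunkservers. File data never flows through the master.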
  • #7: Single master: using global knowledge, the master can make sophisticated chunk placement and replication decisions. (Read example.) Chunk size: 64 MB. Advantages: it reduces client-master interaction, a client is more likely to perform many operations on a given chunk, and it reduces metadata size. Metadata: the master stores the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas; metadata is kept in memory so master operations are fast. Chunk locations: the master does not keep a persistent record; it polls chunkservers at startup and then monitors them with HeartBeat messages. Operation log: contains a historical record of critical metadata changes. Guarantees: mutations to a chunk are applied in the same order on all its replicas, and chunk version numbers are used to detect any replica that has become stale by missing mutations. Consistent: all replicas hold the same data, so clients see the same data regardless of which replica they read. Defined: consistent, and clients see what the mutation wrote in its entirety.
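In the read path, a client turns a byte offset within a file into a chunk index. It then asks the master for that chunk's handle and replica locations, and caches the reply. The sketch below shows only the index arithmetic. The function name and its `chunk_size` parameter are hypothetical. The comparison in the usage note shows why a large chunk size cuts down on client-master interaction.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS chunk size


def chunk_span(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk indices touched by a read of `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)
```

Reading 256 MB sequentially from the start of a file touches `len(chunk_span(0, 256 * 2**20)) == 4` chunks, so at most 4 location lookups are needed. With a hypothetical 1 MB chunk size, the same read would need 256 lookups.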
  • #8: Mutation: an operation that changes the contents or metadata of a chunk, such as a write or a record append. Data flow: to fully use each machine's bandwidth, data is pushed linearly along a chain of chunkservers; to avoid network bottlenecks and high-latency links, each machine forwards the data to the closest machine that has not yet received it; to minimize latency, the data transfer is pipelined over TCP connections. Record append: the client specifies only the data, and GFS appends it atomically, at least once, at an offset of GFS's choosing; it follows the same control flow as a write. Snapshot: makes a copy of a file or a directory tree while minimizing any interruption of ongoing mutations.
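The data-flow rule above can be sketched as a greedy chain: from the current machine, always forward to the closest replica not yet reached. The GFS paper also gives an idealized, congestion-free estimate for pipelined pushes: B/T + R·L. Here B is the number of bytes, T is the network throughput, R is the number of replicas, and L is the per-hop latency. The distance function and function names below are hypothetical stand-ins. Expressing T in bits per second is a choice made for this sketch.

```python
from typing import Callable, Sequence


def forwarding_chain(source: str, replicas: Sequence[str],
                     distance: Callable[[str, str], float]) -> list[str]:
    """Order replicas so each hop forwards to the closest machine not yet reached."""
    chain: list[str] = []
    remaining = list(replicas)
    current = source
    while remaining:
        nxt = min(remaining, key=lambda r: distance(current, r))
        remaining.remove(nxt)
        chain.append(nxt)
        current = nxt
    return chain


def ideal_push_seconds(num_bytes: float, num_replicas: int,
                       throughput_bps: float, hop_latency_s: float) -> float:
    """Idealized pipelined transfer time B/T + R*L (T in bits per second)."""
    return num_bytes * 8 / throughput_bps + num_replicas * hop_latency_s
```

With 100 Mbps links and negligible latency, pushing 10^6 bytes takes about `ideal_push_seconds(1_000_000, 3, 100e6, 0.0)`, roughly 0.08 s. Thanks to pipelining, the replica count adds only R·L rather than multiplying the transfer time.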
  • #9: The master executes all namespace operations and manages chunk replicas. Namespace: GFS logically represents its namespace as a lookup table mapping full path names to metadata. Replica placement has two goals: 1) maximize data reliability and availability, and 2) maximize network bandwidth utilization. Creation and re-replication: place new replicas on chunkservers with below-average disk utilization, limit the number of recent creations on each chunkserver, and spread replicas of a chunk across racks. Garbage collection: a deleted file is first renamed to a hidden name and is removed only if it has stayed hidden for more than 3 days; orphaned chunks (chunks no file refers to) are also erased. Stale replica detection: a chunkserver can miss mutations while it is down, so the master assigns chunk version numbers to distinguish up-to-date replicas from stale ones.
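Stale replica detection reduces to comparing version numbers. Any replica whose version is below the current one missed a mutation. The GFS paper adds one twist: if a chunkserver reports a version higher than the master's record, the master assumes it failed while granting a lease and adopts the higher version. The function below is a hypothetical sketch of that rule, not the master's actual code.

```python
def classify_replicas(master_version: int,
                      reported: dict[str, int]) -> tuple[int, list[str], list[str]]:
    """Split replicas into up-to-date and stale using chunk version numbers.

    reported maps chunkserver -> version it reports for this chunk.
    Returns (current version, up-to-date servers, stale servers).
    """
    # A higher reported version means the master's own record is behind.
    current = max([master_version, *reported.values()])
    up_to_date = sorted(s for s, v in reported.items() if v == current)
    stale = sorted(s for s, v in reported.items() if v < current)
    return current, up_to_date, stale
```

In GFS, stale replicas are not served to clients. The master removes them in its regular garbage collection.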
  • #10: Fast recovery: the master and the chunkservers are designed to restore their state and start in seconds, no matter how they terminated. Chunk replication: discussed earlier. Master replication: the operation log and checkpoints are replicated on multiple machines, and shadow masters provide read-only access. Data integrity: each chunkserver uses checksumming to detect corruption of its stored data; corrupted data can be recovered from other replicas, but detecting corruption by comparing replicas across chunkservers would be impractical. Diagnostic tools: GFS servers generate diagnostic logs that record many significant events. The RPC logs include the exact requests and responses sent on the wire, except for the file data being read or written.
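In GFS, a chunk is broken into 64 KB blocks, and each block has its own 32-bit checksum. A chunkserver verifies every block that overlaps a read before returning any data. This sketch uses CRC-32 from Python's `zlib` as the 32-bit checksum, which is an assumption, since the slides do not name the function. The function names are also hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # checksum granularity: 64 KB blocks


def block_checksums(chunk: bytes) -> list[int]:
    """One 32-bit checksum per 64 KB block (CRC-32 is an assumption here)."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, sums: list[int], offset: int, length: int) -> bytes:
    """Verify every block overlapping [offset, offset + length) before returning data."""
    if offset < 0 or length <= 0 or offset + length > len(chunk):
        raise ValueError("read range outside chunk")
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != sums[b]:
            # GFS would return an error so the client can read another replica.
            raise OSError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]
```

Only the blocks a read touches are verified. Corruption in one block therefore does not block reads elsewhere in the chunk, and the master can re-replicate the chunk from a healthy copy.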
  • #12: The two clusters have similar numbers of files, though B has a larger proportion of dead files, namely files that were deleted or replaced by a new version but whose storage has not yet been reclaimed. B also has more chunks because its files tend to be larger.
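A quick calculation from the "Characteristics of two GFS clusters" table backs up both observations. The counts are the table's rounded thousands. Chunks per file is only a rough indicator, because chunk counts also include chunks of dead files.

```python
# Figures copied from the "Characteristics of two GFS clusters" slide (rounded to thousands).
clusters = {
    "A": {"files": 735_000, "dead_files": 22_000, "chunks": 992_000},
    "B": {"files": 737_000, "dead_files": 232_000, "chunks": 1_550_000},
}

for name, c in clusters.items():
    dead_share = c["dead_files"] / c["files"]
    chunks_per_file = c["chunks"] / c["files"]
    print(f"Cluster {name}: {dead_share:.1%} dead files, {chunks_per_file:.2f} chunks per file")
```

This prints "Cluster A: 3.0% dead files, 1.35 chunks per file" and "Cluster B: 31.5% dead files, 2.10 chunks per file". Roughly a tenfold difference in dead-file share and about 1.6 times as many chunks per file.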
  • #14: Reads that return no data (0K) are more common in cluster Y because applications in the production system use files as producer-consumer queues: a consumer reading to the end of a file gets no data when it outpaces the producers. Cluster Y also sees a much higher percentage of large record appends than cluster X, because the production systems that use cluster Y are more aggressively tuned for GFS.