Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh,
Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes,
Robert E. Gruber
Google, Inc.
1
Index
• Introduction
• Data Model
• API
• Building Blocks
• Implementation
• Refinements
• Real Applications
• Conclusions
2
Introduction
1. Motivation
2. What is a Bigtable?
3. Why not a DBMS?
3
Introduction : Motivation
• Lots of structured data at Google
◦ Web pages, geographic information, user data, mail
• Millions of machines
• Many different projects and applications
4
Introduction : Why not a DBMS?
• A DBMS provides more than Google needs
• Google required a database with wide scalability, wide applicability, high performance, and high availability
• Low-level storage optimizations help performance significantly
• Cost would be very high
◦ Most DBMSs require very expensive infrastructure
5
Introduction : What is a Bigtable?
• Bigtable is a distributed storage system for managing structured data
• Achieves several goals
◦ Wide applicability, scalability, high performance
• Scalable
◦ Terabytes of in-memory data
◦ Petabytes of disk-based data
◦ Millions of reads/writes per second, efficient scans
• Self-managing
◦ Servers can be added/removed dynamically
6
Data Model
1. Row
2. Column families
3. Timestamps
7
Data Model : Row
• The row keys in a table are arbitrary strings
• Data is maintained in lexicographic order by row key
• Each row range is called a “tablet”, which is the unit of distribution and load balancing
• Rows within a tablet are sorted by row key
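The row ordering and tablet partitioning described above can be sketched with an ordered map. This is a toy illustration, not Bigtable's implementation: the names `Table`, `ScanRange`, and `SplitIntoTablets` are hypothetical, and splitting tablets by row count is an assumption made only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy table: row key -> value. std::map keeps keys in lexicographic
// (byte-wise) order, which mirrors the ordering described above.
using Table = std::map<std::string, std::string>;

// Returns the rows whose keys fall in the half-open range [start, end).
std::vector<std::pair<std::string, std::string>> ScanRange(
    const Table& table, const std::string& start, const std::string& end) {
  std::vector<std::pair<std::string, std::string>> rows;
  for (auto it = table.lower_bound(start);
       it != table.end() && it->first < end; ++it) {
    rows.emplace_back(it->first, it->second);
  }
  return rows;
}

// Partitions the sorted row keys into consecutive row ranges ("tablets")
// holding at most max_rows keys each. Splitting by row count is an
// assumption made for illustration only.
std::vector<std::vector<std::string>> SplitIntoTablets(const Table& table,
                                                       std::size_t max_rows) {
  assert(max_rows > 0);
  std::vector<std::vector<std::string>> tablets;
  for (const auto& entry : table) {
    if (tablets.empty() || tablets.back().size() >= max_rows) {
      tablets.emplace_back();
    }
    tablets.back().push_back(entry.first);
  }
  return tablets;
}
```

Because the keys are sorted, rows that share a key prefix end up next to each other, so a short range scan usually touches only one or two tablets.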
8
Data Model : Column Families
• Column keys are grouped into sets called “column families”
• A column family is the basic unit of access control
• A column key is named using the syntax “family:qualifier”
• Access control and disk/memory accounting are performed at the column-family level
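As a rough sketch of the “family:qualifier” naming and of family-level access control, the snippet below splits a column key at its first colon and checks the family against an allow-list. `ColumnKey`, `ParseColumnKey`, and `CanAccess` are hypothetical names, and the allow-list is an assumed stand-in for whatever access-control mechanism a real deployment uses.

```cpp
#include <optional>
#include <set>
#include <string>

// A column key split into its two parts.
struct ColumnKey {
  std::string family;
  std::string qualifier;
};

// Splits "family:qualifier" at the first ':'. Returns std::nullopt when the
// colon is missing or the family is empty. An empty qualifier is accepted.
std::optional<ColumnKey> ParseColumnKey(const std::string& key) {
  const std::string::size_type colon = key.find(':');
  if (colon == std::string::npos || colon == 0) return std::nullopt;
  return ColumnKey{key.substr(0, colon), key.substr(colon + 1)};
}

// Family-level access check: the decision depends only on the family part,
// never on the qualifier.
bool CanAccess(const std::set<std::string>& allowed_families,
               const std::string& column_key) {
  const std::optional<ColumnKey> parsed = ParseColumnKey(column_key);
  return parsed.has_value() && allowed_families.count(parsed->family) > 0;
}
```

Granting access to the family `anchor` would, in this sketch, cover every column whose key starts with `anchor:`, whatever its qualifier.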
9
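Data Model : Timestamps
• Each cell in a Bigtable can contain multiple versions of the same data
• Versions are stored in decreasing timestamp order, so the most recent version is read first
• Timestamps are 64-bit integers
• They represent real time in microseconds when assigned by Bigtable, or can be assigned explicitly by the client application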
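One way to picture a versioned cell is a map keyed by a 64-bit timestamp and ordered newest first. The sketch below is illustrative only; `Cell`, `ReadLatest`, and `ReadAsOf` are hypothetical names rather than Bigtable API calls.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>

// One cell: timestamp -> value. std::greater orders the versions by
// decreasing timestamp, so begin() is always the most recent version.
using Cell = std::map<std::int64_t, std::string, std::greater<std::int64_t>>;

// Returns the most recent version, if the cell has any.
std::optional<std::string> ReadLatest(const Cell& cell) {
  if (cell.empty()) return std::nullopt;
  return cell.begin()->second;
}

// Returns the newest version whose timestamp is <= as_of.
std::optional<std::string> ReadAsOf(const Cell& cell, std::int64_t as_of) {
  // With a descending comparator, lower_bound yields the first key <= as_of.
  const auto it = cell.lower_bound(as_of);
  if (it == cell.end()) return std::nullopt;
  return it->second;
}
```

Keeping the newest version first means the common case, reading the current value, never has to walk past older versions.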
10
Data Model : Example
[Figure: example table, with its row, columns, column family, and timestamps labeled]
11
API
• The Bigtable API provides functions to
◦ Create/delete tables and column families
◦ Change table and column family metadata
◦ Look up values from individual rows
◦ Iterate over a subset of the data
• Supports single-row transactions
• Can be used with MapReduce, as both an input source and an output target (Apache HBase, the open-source system modeled on Bigtable, offers similar integration)
12
API : Example
• Uses a Scanner to iterate over all anchors in a particular row
Table *T = OpenOrDie("/bigtable/web/webtable");
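The slide shows only the first line of the example. As a stand-in, here is a self-contained toy that mimics the idea of scanning every `anchor:` column of one row. It does not use or reproduce the real Bigtable client API: `ToyTable`, `Row`, and `ScanFamily` are hypothetical, and the table lives entirely in memory.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Row = std::map<std::string, std::string>;  // column key -> value
using ToyTable = std::map<std::string, Row>;     // row key -> row

// Returns every (column key, value) pair in the given row whose column key
// belongs to the given family, in sorted column-key order.
std::vector<std::pair<std::string, std::string>> ScanFamily(
    const ToyTable& table, const std::string& row_key,
    const std::string& family) {
  std::vector<std::pair<std::string, std::string>> out;
  const auto row = table.find(row_key);
  if (row == table.end()) return out;
  const std::string prefix = family + ":";
  // Column keys are sorted, so all keys of one family are contiguous.
  for (auto it = row->second.lower_bound(prefix);
       it != row->second.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    out.emplace_back(it->first, it->second);
  }
  return out;
}
```

A caller would populate a `ToyTable` and then call `ScanFamily(table, "com.cnn.www", "anchor")` to walk that row's anchors.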
13
Building Blocks
• Uses the distributed Google File System (GFS) to store log and data files
• A Bigtable cluster typically operates in a shared pool of machines
• Depends on a cluster management system
• The Google SSTable file format is used internally to store Bigtable data
• Relies on a highly available and persistent distributed lock service called Chubby
14
Building Blocks :
GFS & SSTable & Chubby
• Google File System:
◦ Grew out of an earlier Google effort, “BigFiles”
◦ Chosen for its high data throughput
15
Building Blocks :
GFS & SSTable & Chubby
• SSTable:
◦ Provides a persistent, ordered, immutable map from keys to values
◦ Contains a sequence of blocks, plus a block index used to locate them
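A lookup in a block-structured sorted file can be sketched as two binary searches: one over the block index to pick a block, then one inside that block. The layout below is an assumption for illustration (the index stores each block's last key, and everything is held in memory); `SSTable`, `BuildSSTable`, and `Lookup` are hypothetical names, not the real file format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;  // key, value
using Block = std::vector<Entry>;                   // entries sorted by key

struct SSTable {
  std::vector<Block> blocks;       // consecutive blocks of sorted entries
  std::vector<std::string> index;  // index[i] = last key stored in blocks[i]
};

// Packs already-sorted entries into fixed-size blocks and builds the index.
SSTable BuildSSTable(const std::vector<Entry>& sorted_entries,
                     std::size_t entries_per_block) {
  assert(entries_per_block > 0);
  SSTable table;
  for (const Entry& e : sorted_entries) {
    if (table.blocks.empty() ||
        table.blocks.back().size() == entries_per_block) {
      table.blocks.emplace_back();
      table.index.emplace_back();
    }
    table.blocks.back().push_back(e);
    table.index.back() = e.first;  // last key seen in this block so far
  }
  return table;
}

// Finds the one block that could hold the key, then searches only it.
std::optional<std::string> Lookup(const SSTable& table,
                                  const std::string& key) {
  const auto idx = std::lower_bound(table.index.begin(), table.index.end(), key);
  if (idx == table.index.end()) return std::nullopt;
  const Block& block =
      table.blocks[static_cast<std::size_t>(idx - table.index.begin())];
  const auto it = std::lower_bound(
      block.begin(), block.end(), key,
      [](const Entry& e, const std::string& k) { return e.first < k; });
  if (it == block.end() || it->first != key) return std::nullopt;
  return it->second;
}
```

The point of the index is that only one block needs to be read per lookup, however large the file is.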
16
Building Blocks :
GFS & SSTable & Chubby
• Chubby is used to:
◦ Ensure that there is at most one active master at any time
◦ Store the bootstrap location of Bigtable data
◦ Discover tablet servers and finalize tablet server deaths
◦ Store Bigtable schema information (the column family information for each table)
17
Implementation
1. Tablet Location
2. Tablet Assignment
3. Tablet Serving
18
Implementation
• Three major components
◦ A library that is linked into every client
◦ One master server
◦ Many tablet servers
19
Implementation : Tablet Location
• Uses a three-level hierarchy, analogous to that of a B+ tree, to store tablet location information (at most three levels)
• The first level is a file stored in Chubby that contains the location of the root tablet
20
Implementation : Tablet Location
• Root tablet
◦ First tablet in the METADATA table
◦ Never split, to ensure that the tablet location hierarchy has no more than three levels
• METADATA tablet
◦ Stores the location of a tablet under a row key that is an encoding of the tablet’s table identifier and its end row
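Keying each tablet by its table identifier and end row means that a sorted lookup finds the tablet responsible for any row: it is the first entry whose end row is not smaller than the row. The sketch below models a single level of that index. `Metadata`, `LocateTablet`, and the `kEndOfTable` sentinel used for the last tablet are assumptions for illustration, not the real encoding.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// (table identifier, tablet end row) -> location of the tablet.
using MetadataKey = std::pair<std::string, std::string>;
using Metadata = std::map<MetadataKey, std::string>;

// Hypothetical sentinel meaning "this tablet runs to the end of the table".
// It sorts after any ASCII row key.
const std::string kEndOfTable = "\xff";

// Returns the location of the tablet that covers the given row, if any.
std::optional<std::string> LocateTablet(const Metadata& meta,
                                        const std::string& table,
                                        const std::string& row) {
  // First entry with (table, end_row) >= (table, row).
  const auto it = meta.lower_bound(MetadataKey{table, row});
  if (it == meta.end() || it->first.first != table) return std::nullopt;
  return it->second;
}
```

Using the end row rather than the start row is what lets a single `lower_bound` call land directly on the covering tablet.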
21
Implementation : Tablet Assignment
• Master server
◦ Assigns tablets to tablet servers
◦ Detects the addition and expiration of tablet servers
◦ Balances tablet-server load
◦ Handles schema changes such as table and column family creations
• Tablet server
◦ Manages a set of tablets (ten to a thousand tablets per tablet server)
◦ Handles read/write requests to the tablets
◦ Splits tablets that have grown too large
Implementation : Tablet Serving
• Updates are committed to a commit log that stores redo records
• Recently committed updates are stored in memory in a sorted buffer called a memtable
• Older updates are stored in a sequence of SSTables
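The write and read paths above can be sketched as a small class: a write appends a redo record and updates the memtable, and a read checks the memtable first and then the SSTables from newest to oldest. This is a toy under stated assumptions: `ToyTablet` and its methods are hypothetical, everything is in memory, `Flush` stands in for whatever process turns a memtable into an SSTable, and deletions are not modeled.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class ToyTablet {
 public:
  // Logs a redo record first, then applies the update to the memtable.
  void Write(const std::string& key, const std::string& value) {
    commit_log_.emplace_back(key, value);
    memtable_[key] = value;
  }

  // Freezes the memtable into a new immutable sorted table. The logged
  // records are then no longer needed for recovery (toy assumption).
  void Flush() {
    if (memtable_.empty()) return;
    sstables_.push_back(std::move(memtable_));
    memtable_.clear();
    commit_log_.clear();
  }

  // Simulates losing the in-memory state and replaying the redo records.
  void CrashAndRecover() {
    memtable_.clear();
    for (const auto& record : commit_log_) {
      memtable_[record.first] = record.second;
    }
  }

  // Newest data wins: memtable first, then SSTables from newest to oldest.
  std::optional<std::string> Read(const std::string& key) const {
    if (auto it = memtable_.find(key); it != memtable_.end()) return it->second;
    for (auto t = sstables_.rbegin(); t != sstables_.rend(); ++t) {
      if (auto it = t->find(key); it != t->end()) return it->second;
    }
    return std::nullopt;
  }

  std::size_t SSTableCount() const { return sstables_.size(); }

 private:
  using SortedMap = std::map<std::string, std::string>;
  std::vector<std::pair<std::string, std::string>> commit_log_;
  SortedMap memtable_;
  std::vector<SortedMap> sstables_;
};
```

The commit log is what makes the in-memory memtable safe: after a crash, replaying the redo records rebuilds exactly the updates that had not yet reached an SSTable.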
23
Refinements
1. Locality groups
2. Compression
3. Caching for read performance
4. Bloom filters
5. Commit-log implementation
24
Refinements
• Locality groups
◦ Clients can group multiple column families together into a locality group
• Compression
◦ Small portions of an SSTable can be read without decompressing the entire file
◦ Encodes at 100-200 MB/s
◦ Decodes at 400-1000 MB/s
◦ 10-to-1 reduction in space
25
Refinements
• Caching for read performance
◦ Tablet servers use two levels of caching
▪ Scan Cache and Block Cache
• Bloom filters
◦ Clients can request that Bloom filters be created for the SSTables in a particular locality group
• Commit-log implementation
◦ Mutations for different tablets are co-mingled in the same physical log file
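A Bloom filter answers "might this SSTable contain the key?" without reading the SSTable: a negative answer is always correct, while a positive answer can occasionally be wrong. The sketch below is a minimal version with assumed parameters (1024 bits, 3 probes, `std::hash` with double hashing). The `BloomFilter` class and its sizes are illustrative choices, not values from the deck.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

class BloomFilter {
 public:
  // Sets the bit for each probe position of the key.
  void Add(const std::string& key) {
    for (std::size_t i = 0; i < kNumHashes; ++i) bits_.set(Index(key, i));
  }

  // false: the key was definitely never added.
  // true: the key was probably added (false positives are possible).
  bool MightContain(const std::string& key) const {
    for (std::size_t i = 0; i < kNumHashes; ++i) {
      if (!bits_.test(Index(key, i))) return false;
    }
    return true;
  }

 private:
  static constexpr std::size_t kNumBits = 1024;  // assumed filter size
  static constexpr std::size_t kNumHashes = 3;   // assumed probe count

  // Double hashing: derives the i-th probe position from two base hashes.
  static std::size_t Index(const std::string& key, std::size_t i) {
    const std::size_t h1 = std::hash<std::string>{}(key);
    const std::size_t h2 = std::hash<std::string>{}(key + '#') | 1;
    return (h1 + i * h2) % kNumBits;
  }

  std::bitset<kNumBits> bits_;
};
```

A read path would consult the filter before touching an SSTable and skip the SSTable entirely whenever `MightContain` returns false.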
26
Real Applications
1. Google Analytics
2. Personalized Search
27
Real Applications
• Google Analytics
◦ Uses two tables
▪ The raw click table (~200 TB)
▪ The summary table (~20 TB)
◦ The summary table is generated from the raw click table by MapReduce jobs
• Personalized Search
◦ Stores each user's history
◦ User profiles are generated with MapReduce
28
Conclusions
• Bigtable clusters have been in production use at Google since April 2005
• Provides high performance and high availability
• Google found significant advantages in building its own storage solution
• Apache HBase is based on Bigtable
29
Thank you!
30