Introduction to HBase
Byeongweon Moon / REDDUCK
byeongweon.moon@reddduck.com
HBase Key Point
 Clustered, commodity(-ish) hardware
 Mostly schema-less
 Dynamic distribution
 Spread writes out over the cluster
HBase
 Distributed database modeled on Bigtable
   Bigtable : A Distributed Storage System for
    Structured Data by Chang et al.
 Runs on top of Hadoop Core
 Layers on HDFS for storage
 Native connections to MapReduce
 Distributed, High Availability, High
  Performance, Strong Consistency
HBase (cont.)
 Column-oriented store
    Wide table costs only the data stored
    NULLs in row are ‘free’
    Good compression: columns of similar type
    Column name is arbitrary
 Rows stored in sorted order
 Can random read and write
 Goal of billions of rows X millions of columns
    Petabytes of data across thousands of servers
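
To make the "NULLs are free" point concrete, here is a minimal Python sketch. It is not HBase code: the `SparseTable` class and its method names are hypothetical and exist only for illustration. Each row keeps only the cells that were actually written, so a column that is absent from a row takes no space at all.

```python
class SparseTable:
    """Toy model of a sparse, wide table: only written cells are stored."""

    def __init__(self):
        # row key -> {column name -> value}; missing columns are simply absent
        self._rows = {}

    def put(self, row, column, value):
        self._rows.setdefault(row, {})[column] = value

    def get(self, row, column):
        # A column that was never written costs nothing and reads as None
        return self._rows.get(row, {}).get(column)

    def stored_cells(self):
        # Total number of cells physically held, across all rows
        return sum(len(cols) for cols in self._rows.values())


table = SparseTable()
table.put(b"user1", b"info:name", b"Ann")
table.put(b"user2", b"info:email", b"bob@example.com")
table.put(b"user2", b"stats:logins", b"7")
```

Three distinct column names are used across two rows, yet only three cells are stored. A fixed-schema row layout would have reserved six slots.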
Column Oriented Storage
!HBase
 “NoSQL” Database
    No joins
    No sophisticated query engine
    No transactions (sort of)
    No column typing
    No SQL, no ODBC/JDBC, etc.
 Not a replacement for RDBMS
 Matching Impedance
Why HBase?
 Datasets are reaching Petabytes
 Traditional databases are expensive to scale
  and difficult to distribute
 Commodity hardware is cheap and powerful
 Need for random access as well as batch
  processing (Hadoop alone offers only the latter)
Tables
 Table is split into roughly equal sized
  “regions”
 Each region is a contiguous range of keys
 Regions split as they grow, thus dynamically
  adjusting to your data set
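
The region idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HBase's implementation: `RegionedTable` and `max_region_size` are hypothetical names, and the split threshold counts rows, whereas HBase decides on stored data size. What the sketch keeps is that each region covers a contiguous key range, and that a region splits in two once it grows past the threshold.

```python
import bisect


class RegionedTable:
    """Toy model: a table as contiguous, sorted key ranges ("regions")."""

    def __init__(self, max_region_size=4):
        self.max_region_size = max_region_size  # hypothetical split threshold (rows)
        self.start_keys = [b""]   # the first region starts at the empty key
        self.regions = [[]]       # each region holds its row keys in sorted order

    def locate(self, row_key):
        # The owning region is the last one whose start key <= row_key
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        idx = self.locate(row_key)
        region = self.regions[idx]
        if row_key not in region:
            bisect.insort(region, row_key)
        if len(region) > self.max_region_size:
            self._split(idx)

    def _split(self, idx):
        # Cut the region at its middle key; the right half becomes a new region
        region = self.regions[idx]
        mid = len(region) // 2
        left, right = region[:mid], region[mid:]
        self.regions[idx] = left
        self.regions.insert(idx + 1, right)
        self.start_keys.insert(idx + 1, right[0])


table = RegionedTable(max_region_size=4)
for i in range(10):
    table.put(b"row%02d" % i)
```

After the ten inserts the table has split itself into four regions, and `table.locate(key)` finds the owning region with a binary search over the start keys.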
Table (cont.)
 Tables are sorted by Row
 Table schema defines column families
    Families consist of any number of columns
    Columns consist of any number of versions
    Everything except table name is byte[]


(Table, Row, Family:Column, Timestamp) -> Value
Table (cont.)
 As a data structure

  SortedMap(
        RowKey, List(
              SortedMap(
                    Column, List(
                          Value, Timestamp
                    )
              )
        )
  )
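
A rough Python rendering of this nested map, for illustration only. `ToyTable` and its methods are hypothetical names, not the HBase client API. One simplification is an assumption of the sketch: `get` with a timestamp returns the newest version written at or before that timestamp.

```python
import time


class ToyTable:
    """Toy model of (row, family:column, timestamp) -> value."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), newest first
        self._rows = {}

    def put(self, row, column, value, timestamp=None):
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # default to "now" in milliseconds
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        # Newest version overall, or newest version at or before `timestamp`
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, stop_row):
        # Rows come back in sorted key order; stop_row is exclusive
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {c: v[0][1] for c, v in sorted(self._rows[row].items())}


t = ToyTable()
t.put(b"row1", b"cf:a", b"v1", timestamp=1)
t.put(b"row1", b"cf:a", b"v2", timestamp=2)
t.put(b"row0", b"cf:b", b"x", timestamp=1)
```

Row keys, column names and values are all `bytes` here, mirroring the "everything except table name is byte[]" rule.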
HBase Open Source Stack

 ZooKeeper : Small Data Coordination Service
 HBase : Database Storage Engine
 HDFS : Distributed File system
 Hadoop : Asynchronous Map-Reduce Jobs
Server Architecture
 Similar to HDFS
    Master == Namenode
    Regionserver == Datanode
 Often run these alongside each other!
 Difference: HBase stores state in HDFS
 HDFS provides robust data storage across
  machines, insulating against failure
 Master and Regionserver fairly stateless and
  machine independent
Region Assignment
 Each region from every table is assigned to a
  Regionserver
 Master Duties:
   Responsible for assignment and handling
      regionserver problems (if any!)
     When machines fail, move regions
     When regions split, move regions to balance
     Could move regions to respond to load
     Can run multiple backup masters
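
The "when machines fail, move regions" duty might look roughly like the Python sketch below. The function name, the server names and the least-loaded-first policy are all assumptions made for illustration; this is not the balancing algorithm HBase actually uses.

```python
def reassign_regions(assignments, failed_server):
    """Return a new assignment with the failed server's regions moved
    to the surviving servers, least-loaded server first.

    assignments: dict of server name -> list of region names.
    """
    survivors = {s: list(r) for s, r in assignments.items() if s != failed_server}
    if not survivors:
        raise RuntimeError("no region servers left to take over regions")
    for region in assignments.get(failed_server, []):
        # Pick the server holding the fewest regions; break ties by name
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors


# Hypothetical cluster state: region names are "table,startkey"
cluster = {
    "rs1": ["t1,a", "t1,m"],
    "rs2": ["t2,a"],
    "rs3": ["t1,t", "t2,k", "t2,x"],
}
after = reassign_regions(cluster, "rs3")
```

Because region data lives in HDFS rather than on the failed machine, moving a region is a matter of changing who serves it, not of copying its data.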
Master
 The master does NOT
    Handle any write request (not a DB master!)
    Handle location finding requests
     Take part in the read/write path
    Generally does very little most of the time
Distributed Coordination
 ZooKeeper is used to manage master
  election and server availability
 Set up as a cluster, provides distributed
  coordination primitives
 An excellent tool for building cluster
  management systems
HBase Architecture




http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
How data is actually stored
Write-ahead-Log




http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
HLog
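
As a rough illustration of the write-ahead-log idea: an edit is appended to a log before the in-memory store is updated, so the log can be replayed after a crash. In the Python sketch below a plain list stands in for the durable log file. `ToyRegionServer` and its methods are hypothetical names, and flushing, log rolling and log splitting are all left out.

```python
class ToyRegionServer:
    """Toy model of write-ahead logging: log first, then update memory."""

    def __init__(self, log=None):
        # `log` stands in for a durable file; here it is a plain list
        self.log = log if log is not None else []
        self.memstore = {}

    def put(self, row, column, value):
        self.log.append((row, column, value))   # 1. record the edit in the log
        self.memstore[(row, column)] = value    # 2. apply it in memory

    @classmethod
    def recover(cls, log):
        """Rebuild the in-memory state by replaying the surviving log in order."""
        server = cls(log=list(log))
        for row, column, value in log:
            server.memstore[(row, column)] = value
        return server


server = ToyRegionServer()
server.put(b"row1", b"cf:a", b"1")
server.put(b"row1", b"cf:a", b"2")

# Simulate a crash: the memstore is gone, but the log survives
surviving_log = list(server.log)
recovered = ToyRegionServer.recover(surviving_log)
```

Replaying the edits in their original order restores the same in-memory state, which is why the log entry has to be written before the in-memory update.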
Demo
HBase - Roadmap
 HBase 0.92.0
   Coprocessors
   Distributed Log Splitting
   Running Tasks in UI
   Performance Improvements
 HBase 0.94.0
   Security
   Secondary Indexes
   Search Integration
   HFile v2
Reference
 http://ofps.oreilly.com/titles/9781449396107/index.html
 http://hbase.apache.org/book.html#quickstart
 http://www.larsgeorge.com/2010/02/fosdem-2010-nosql-talk.html
