Large-Scale Data Management: HBase

HBase is a distributed, column-oriented database built on top of HDFS that provides Bigtable-like capabilities for the Hadoop ecosystem. It stores data in tables made up of rows, columns, and versions, with columns grouped into column families. The data is partitioned into regions, which are distributed across HBase region servers. The region servers handle read and write requests under the management and coordination of the HBase master server. The result is a distributed, scalable big data store with real-time random read/write access.

Uploaded by

raj

Large-Scale Data Management

HBase

HBase: Overview
• HBase is a distributed, column-oriented data store built on top of HDFS
• HBase is an Apache open source project whose goal is to provide storage for Hadoop distributed computing
• Data is logically organized into tables, rows, and columns

Difference: Hive vs. HBase
• Hive and HBase are two different Hadoop-based technologies:
  • Hive is an SQL-like engine that runs MapReduce jobs
  • HBase is a NoSQL key/value database on Hadoop
• Just as Google is used for search and Facebook for social networking, Hive is suited to analytical queries and HBase to real-time querying

HBase: Part of Hadoop’s Ecosystem
HBase is built on top of HDFS
HBase files are internally stored in HDFS

HBase vs. HDFS
• Both are distributed systems that scale to hundreds or thousands of nodes
• HDFS is good for batch processing (scans over big files)
  • Not good for record lookup
  • Not good for incremental addition of small batches
  • Not good for updates

HBase vs. HDFS (Cont’d)
• HBase is designed to efficiently address the above points
  • Fast record lookup
  • Support for record-level insertion
  • Support for updates (not in place)
• HBase updates are done by creating new versions of values
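The idea that an update adds a new version instead of overwriting the old value can be sketched in a few lines of plain Python. This is a toy model, not the HBase client API: the class name `VersionedCell`, its methods, and the integer counter standing in for the system timestamp are all assumptions made for illustration.

```python
import itertools


class VersionedCell:
    """Toy model of one cell: every put adds a version, nothing is overwritten."""

    def __init__(self):
        self.versions = {}                 # timestamp -> value
        self._clock = itertools.count(1)   # stand-in for the system timestamp (assumption)

    def put(self, value, ts=None):
        # Use the caller's timestamp if given, otherwise take the next clock tick.
        if ts is None:
            ts = next(self._clock)
        self.versions[ts] = value
        return ts

    def get(self, ts=None):
        # Return the newest version by default, or the exact version requested.
        if not self.versions:
            return None
        if ts is None:
            ts = max(self.versions)
        return self.versions.get(ts)


cell = VersionedCell()
first = cell.put(b"<html>v1</html>")
second = cell.put(b"<html>v2</html>")   # an "update": the first version is still kept
```

After the second `put`, `cell.get()` returns the newer value while `cell.get(first)` still returns the original one.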

HBase vs. HDFS (Cont’d)

If the application needs neither random reads nor random writes → stick to HDFS

HBase Data Model
• HBase is based on Google’s Bigtable model
• Key-value pairs: a row key, a column family, and a timestamp identify a value

HBase Logical View

HBase: Keys and Column Families
Each record is divided into column families
Each row has a key
Each column family consists of one or more columns

Column family named “contents”
Column family named “anchor”
Column named “apache.com”

• Key
  • Byte array
  • Serves as the primary key for the table
  • Indexed for fast lookup
• Column Family
  • Has a name (string)
  • Contains one or more related columns
• Column
  • Belongs to one column family
  • Included inside the row
  • familyName:columnName

Row key            Time stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com” = “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com” = “CNN”
                   t13                               “anchor:my.look.ca” = “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
Version number for each row

• Version Number
  • Unique within each key
  • By default, the system’s timestamp
  • Data type is Long
• Value (Cell)
  • Byte array

Row key            Time stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com” = “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com” = “CNN”
                   t13                               “anchor:my.look.ca” = “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”

Notes on Data Model
• HBase schema consists of several tables
• Each table consists of a set of column families
  • Columns are not part of the schema
• HBase has dynamic columns
  • Because column names are encoded inside the cells
  • Different cells can have different columns

The “Roles” column family has different columns in different cells

Notes on Data Model (Cont’d)
• The version number can be user-supplied
  • It does not even have to be inserted in increasing order
  • Version numbers are unique within each key
• Table can be very sparse
  • Many cells are empty
• Keys are indexed as the primary key

Has two columns [cnnsi.com & my.look.ca]
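Sparseness and dynamic columns are easy to see in a small plain-Python sketch where each row holds only the columns it actually has. The dictionary layout and variable names below are assumptions for illustration, not an HBase structure.

```python
# Toy model: every row stores only the columns it actually contains.
rows = {
    "com.apache.www": {"anchor:apache.com": "APACHE"},
    "com.cnn.www": {"anchor:cnnsi.com": "CNN", "anchor:my.look.ca": "CNN.com"},
}

# No schema lists the columns; the full set is just the union over all rows.
all_columns = sorted({column for cells in rows.values() for column in cells})

# Cells a dense row-by-column grid would need, compared with cells actually stored.
dense_cells = len(rows) * len(all_columns)
stored_cells = sum(len(cells) for cells in rows.values())
empty_cells = dense_cells - stored_cells
```

Here a dense grid would need six cells, while only three are stored; the other three are simply absent.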

HBase Physical Model
• Each column family is stored in its own files (HFiles)
• Key and version numbers are replicated with each column family
• Empty cells are not stored
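The three points above can be sketched as a function that splits logical cells into one list per column family. This is a toy model in plain Python: the function name `to_family_stores`, the tuple layout, and the use of `None` for an empty cell are assumptions, and the sort order is chosen for the example only.

```python
from collections import defaultdict

# Logical cells: (row key, "family:column", timestamp, value); None marks an empty cell.
logical_cells = [
    ("com.apache.www", "contents:", 12, "<html>…"),
    ("com.apache.www", "anchor:apache.com", 10, "APACHE"),
    ("com.cnn.www", "anchor:cnnsi.com", 15, "CNN"),
    ("com.cnn.www", "contents:", 6, None),
]


def to_family_stores(cells):
    """Group cells by column family; each stored entry repeats row key and timestamp."""
    stores = defaultdict(list)
    for row, column_name, ts, value in cells:
        if value is None:
            continue  # empty cells are not written at all
        family, _, column = column_name.partition(":")
        stores[family].append((row, column, ts, value))
    # Order each family's entries by row key, column, then newest timestamp first.
    for entries in stores.values():
        entries.sort(key=lambda e: (e[0], e[1], -e[2]))
    return dict(stores)


stores = to_family_stores(logical_cells)
```

Each family ends up with its own list, and every entry carries the row key and timestamp with it, which is the replication the slide mentions.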

Example

Column Families

HBase Regions
• Each table is partitioned horizontally into regions
  • Regions are the counterpart of HDFS blocks

Each will be one region
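Horizontal partitioning means that each region covers a contiguous range of row keys. A minimal sketch of finding the region for a key, assuming hypothetical split points and region names (none of them come from the slides), could look like this:

```python
import bisect

# Hypothetical split points: region i holds row keys in [start_keys[i], start_keys[i + 1]).
start_keys = ["", "com.cnn.www", "org.apache.www"]
region_names = ["region-0", "region-1", "region-2"]


def find_region(row_key):
    """Return the name of the region whose key range contains row_key."""
    index = bisect.bisect_right(start_keys, row_key) - 1
    return region_names[index]
```

Because the start keys are sorted, a binary search is enough to locate the region; `find_region("com.apache.www")` falls in the first range and `find_region("com.cnn.www")` in the second.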

HBase Architecture

Three Major Components
• The HBaseMaster
  • One master
• The HRegionServer
  • Many region servers
• The HBase client

HBase Architecture
• In HBase, tables are split into regions and are served by the region servers.
• Regions are vertically divided by column families into “Stores”.
• HBase has three major components: the client library, a master server, and region servers.
• Region servers can be added or removed as required.

HBase Architecture
MasterServer
• Assigns regions to the region servers, with the help of Apache ZooKeeper.
• Handles load balancing of the regions across region servers.
• Unloads busy servers and shifts regions to less occupied servers.
• Maintains the state of the cluster by negotiating the load balancing.
• Is responsible for schema changes and other metadata operations, such as creation of tables and column families.

Regions
• Regions are the pieces of a table, split up and spread across the region servers.
• Region servers communicate with the client and handle data-related operations.
• Each region server handles read and write requests for all the regions under it.
• A closer look at a region server shows that it contains regions and stores.
• Each store contains a memory store (memstore) and HFiles. The memstore works like a cache memory.
• Anything written to HBase is stored in the memstore first.
• Later, the data is transferred to HFiles and saved as blocks, and the memstore is flushed.
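The write path described above (memstore first, files later) can be illustrated with a small toy class. Everything in it is an assumption for illustration: the class name `Store`, the entry-count `flush_threshold`, and the lists standing in for HFiles are not how HBase is configured or implemented.

```python
class Store:
    """Toy store: writes land in an in-memory memstore, then flush to immutable files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical limit, counted in entries
        self.memstore = {}
        self.hfiles = []  # each flushed file is a sorted list of (key, value)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as one sorted file and start a fresh memstore.
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Check the memstore first, then the files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = Store(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")   # reaches the threshold and triggers a flush
store.put("row1", "c")   # the newer value stays in the memstore for now
```

Reads consult the memstore before the flushed files, so `store.get("row1")` sees the newer value even though an older one already sits in a file.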

ZooKeeper
• ZooKeeper is an open-source project that provides services such as maintaining configuration information, naming, and distributed synchronization.
• ZooKeeper has ephemeral nodes representing the different region servers. Master servers use these nodes to discover available servers.
• In addition to availability, the nodes are also used to track server failures and network partitions.
• Clients find the region servers they need to talk to via ZooKeeper.
• In pseudo-distributed and standalone modes, HBase itself takes care of ZooKeeper.

Big Picture

Get()
Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t3

Scan()
Select value from table where anchor=‘cnnsi.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t3
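The difference between the two operations is that a get addresses one row by its key, while a scan walks over rows and filters. A rough plain-Python sketch of both against the example data follows; the function names `get` and `scan`, the dictionary layout, and the integer timestamps are assumptions and do not reproduce the HBase client API.

```python
# Toy table: row key -> {column: {timestamp: value}}; t10, t9, t8 become integers.
table = {
    "com.apache.www": {"anchor:apache.com": {10: "APACHE"}},
    "com.cnn.www": {
        "anchor:cnnsi.com": {9: "CNN"},
        "anchor:my.look.ca": {8: "CNN.com"},
    },
}


def get(row_key, column):
    """Point lookup: newest value of one column in one row, or None."""
    versions = table.get(row_key, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


def scan(column):
    """Visit every row in key order; yield (row key, newest value) where the column exists."""
    for row_key in sorted(table):
        versions = table[row_key].get(column)
        if versions:
            yield row_key, versions[max(versions)]


get_result = get("com.apache.www", "anchor:apache.com")
scan_result = list(scan("anchor:cnnsi.com"))
```

The get touches a single row directly, whereas the scan has to look at every row to find those containing the requested column.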

Operations On Regions: Delete()
• Marking table cells as deleted
• Multiple levels
  • Can mark an entire column family as deleted
  • Can mark all column families of a given row as deleted
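Deleting by marking can be sketched as a row that records delete markers and hides the affected cells on read, without erasing them. The class `Row` and its methods are hypothetical names for this toy model; it ignores timestamps, so unlike a real store a later write would stay hidden behind an earlier marker.

```python
class Row:
    """Toy row where deletes record markers instead of erasing stored cells."""

    def __init__(self):
        self.cells = {}                 # "family:column" -> value
        self.deleted_families = set()   # markers for whole column families
        self.row_deleted = False        # marker for all column families of the row

    def put(self, column, value):
        self.cells[column] = value

    def delete_family(self, family):
        self.deleted_families.add(family)

    def delete_row(self):
        self.row_deleted = True

    def get(self, column):
        family = column.partition(":")[0]
        if self.row_deleted or family in self.deleted_families:
            return None                 # the marker hides the cell from reads
        return self.cells.get(column)


row = Row()
row.put("anchor:cnnsi.com", "CNN")
row.put("contents:", "<html>…")
row.delete_family("anchor")
```

After `delete_family("anchor")`, reads of the anchor column return nothing even though the value is still present in `row.cells`.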

HBase: Joins
• HBase does not support joins
• Can be done in the application layer
  • Using scan() and get() operations
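An application-layer join usually means scanning one table and issuing a get against the other for each row. The sketch below shows that pattern on two made-up in-memory tables; the table contents, column names, and the helpers `scan`, `get`, and `join_orders_with_users` are all hypothetical and stand in for real client calls.

```python
# Two toy tables keyed by row key; names and contents are made up for illustration.
users = {
    "u1": {"info:name": "Ada"},
    "u2": {"info:name": "Linus"},
}
orders = {
    "o1": {"info:user": "u1", "info:item": "book"},
    "o2": {"info:user": "u2", "info:item": "laptop"},
    "o3": {"info:user": "u1", "info:item": "pen"},
}


def scan(table):
    """Yield (row key, columns) in row-key order."""
    for key in sorted(table):
        yield key, table[key]


def get(table, row_key):
    """Point lookup by row key; returns None when the row is absent."""
    return table.get(row_key)


def join_orders_with_users():
    """Scan the orders and look up the matching user row for each one."""
    joined = []
    for order_id, order in scan(orders):
        user = get(users, order["info:user"])
        if user is not None:
            joined.append((order_id, user["info:name"], order["info:item"]))
    return joined


joined = join_orders_with_users()
```

The cost of this approach is one lookup per scanned row, which is why the join logic and its performance are the application's responsibility.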

Logging Operations

HBase Deployment
Master node
Slave nodes

HBase vs. RDBMS

When to use HBase
