HBase is a distributed, scalable, big data store modeled after Google's Bigtable. The document outlines the key aspects of HBase, including that it uses HDFS for storage, Zookeeper for coordination, and can optionally use MapReduce for batch processing. It describes HBase's architecture with a master server distributing regions across multiple region servers, which store and serve data from memory and disks.
This document provides a set of icons for visualizing Hadoop architectures and components. It includes icons for generic data center infrastructure, Hadoop projects and tools, and examples of how they can be used in diagrams. Download links are provided for Visio, Omnigraffle stencils, and high resolution PNG files of the icons. Styles, typography guidelines and additional example diagrams are also included.
Intro to HBase Internals & Schema Design (for HBase users) - alexbaranau
This document provides an introduction to HBase internals and schema design for HBase users. It discusses the logical and physical views of HBase, including how tables are split into regions and stored across region servers. It covers best practices for schema design, such as using row keys efficiently and avoiding redundancy. The document also briefly discusses advanced topics like coprocessors and compression. The overall goal is to help HBase users optimize performance and scalability based on its internal architecture.
This document provides an overview of Apache Hadoop and HBase. It begins with an introduction to why big data is important and how Hadoop addresses storing and processing large amounts of data across commodity servers. The core components of Hadoop, HDFS for storage and MapReduce for distributed processing, are described. An example MapReduce job is outlined. The document then introduces the Hadoop ecosystem, including Apache HBase for random read/write access to data stored in Hadoop. Real-world use cases of Hadoop at companies like Yahoo, Facebook and Twitter are briefly mentioned before addressing questions.
HBase Read High Availability Using Timeline Consistent Region Replicas - enissoz
This document summarizes a talk on implementing timeline consistency for HBase region replicas. It introduces the concept of region replicas, where each region has multiple copies hosted on different servers. The primary accepts writes, while secondary replicas are read-only. Reads from secondaries return possibly stale data. The talk outlines the implementation of region replicas in HBase, including updates to the master, region servers, and IPC. It discusses data replication approaches and next steps to implement write replication using the write-ahead log. The goal is to provide high availability for reads in HBase while tolerating single-server failures.
The document proposes using MapReduce jobs to perform scans over HBase snapshots. Snapshots provide immutable data from HBase tables. The MapReduce jobs would bypass region servers and scan snapshot files directly for improved performance. An initial implementation called TableSnapshotInputFormat is described which restores snapshot data and runs scans in parallel across map tasks. The implementation addresses security and performance aspects. An API for client-side scanning of snapshots is also proposed to allow snapshot scans outside of MapReduce.
Hw09 Practical HBase Getting The Most From Your HBase Install - Cloudera, Inc.
The document summarizes two presentations about using HBase as a database. It discusses the speakers' experiences using HBase at Stumbleupon and Streamy to replace MySQL and other relational databases. Some key points covered include how HBase provides scalability, flexibility, and cost benefits over SQL databases for large datasets.
Chicago Data Summit: Apache HBase: An Introduction - Cloudera, Inc.
Apache HBase is an open source distributed data-store capable of managing billions of rows of semi-structured data across large clusters of commodity hardware. HBase provides real-time random read-write access as well as integration with Hadoop MapReduce, Hive, and Pig for batch analysis. In this talk, Todd will provide an introduction to the capabilities and characteristics of HBase, comparing and contrasting it with traditional database systems. He will also introduce its architecture and data model, and present some example use cases.
The document summarizes the HBase 1.0 release which introduces major new features and interfaces including a new client API, region replicas for high availability, online configuration changes, and semantic versioning. It describes goals of laying a stable foundation, stabilizing clusters and clients, and making versioning explicit. Compatibility with earlier versions is discussed and the new interfaces like ConnectionFactory, Connection, Table and BufferedMutator are introduced along with examples of using them.
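A minimal sketch of what using those 1.0 interfaces can look like (not taken from the presentation itself); the table name "example_table", the column family "cf", and the qualifier "col" are invented placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBase10ClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // A Connection is heavyweight and thread-safe: create one and share it.
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      TableName name = TableName.valueOf("example_table");  // hypothetical table
      // Table is a lightweight, non-thread-safe handle for reads and writes.
      try (Table table = connection.getTable(name)) {
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        table.put(put);
        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
      }
      // BufferedMutator batches mutations client-side for higher write throughput.
      try (BufferedMutator mutator = connection.getBufferedMutator(name)) {
        mutator.mutate(new Put(Bytes.toBytes("row2"))
            .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value2")));
        mutator.flush();
      }
    }
  }
}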
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
With the public confession of Facebook, HBase is on everyone's lips when it comes to the discussion around the new "NoSQL" area of databases. In this talk, Lars will introduce and present a comprehensive overview of HBase. This includes the history of HBase, the underlying architecture, available interfaces, and integration with Hadoop.
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera - Cloudera, Inc.
"While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns. "
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster - mas4share
This document provides five tips for maximizing performance on an HBase/Phoenix SQL cluster with over 1.2 billion records distributed across 200 nodes. The top tips are: 1) Use SQL hints when querying indexes to reduce unnecessary metadata checks; 2) Aggressively use memory allocation, allocating at least 3x the data size; 3) Manually split tables into regions that are similarly sized, without over-splitting; 4) Favor scale-out over scale-up by adding more smaller nodes rather than fewer larger nodes; and 5) Optimize other configuration settings like compaction thresholds. The document includes details on the cluster configuration, query load testing and results.
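As a rough illustration of the first tip, the sketch below issues a hinted query through Phoenix's JDBC driver; the ZooKeeper quorum "zk-host:2181", the table "events", and the index "events_by_user_idx" are invented names, not details of the cluster described above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixHintSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
      // The /*+ INDEX(...) */ hint asks Phoenix to use a specific secondary index
      // instead of re-deriving the choice on every query.
      String sql = "SELECT /*+ INDEX(events events_by_user_idx) */ event_id, event_time "
          + "FROM events WHERE user_id = ?";
      try (PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setLong(1, 42L);
        try (ResultSet rs = stmt.executeQuery()) {
          while (rs.next()) {
            System.out.println(rs.getString("event_id") + " @ " + rs.getTimestamp("event_time"));
          }
        }
      }
    }
  }
}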
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase, such as -Xmn512m and -XX:+UseCMSInit… (a configuration sketch covering a few of these recommendations follows this list).
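A hedged sketch of how a few of the recommendations above could be expressed in code; the values simply echo the summary or common defaults rather than tuned advice, and in practice these settings normally live in hdfs-site.xml and hbase-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TuningSketch {
  public static Configuration tunedConf() {
    Configuration conf = HBaseConfiguration.create();
    // Correctness on power failure: ask DataNodes to sync on close / behind writes.
    conf.setBoolean("dfs.datanode.synconclose", true);
    conf.setBoolean("dfs.datanode.sync.behind.writes", true);
    // Compaction pressure: adjust depending on whether the workload is read-heavy or write-heavy.
    conf.setInt("hbase.hstore.blockingStoreFiles", 10);
    conf.setInt("hbase.hstore.compactionThreshold", 3);
    // Client-side write buffer of 2 MB to batch RPCs, as suggested above.
    conf.setLong("hbase.client.write.buffer", 2L * 1024 * 1024);
    return conf;
  }
}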
Speaker: Jesse Anderson (Cloudera)
As optional pre-conference prep for attendees who are new to HBase, this talk will offer a brief Cliff's Notes-level talk covering architecture, API, and schema design. The architecture section will cover the daemons and their functions, the API section will cover HBase's GET, PUT, and SCAN classes; and the schema design section will cover how HBase differs from an RDBMS and the amount of effort to place on schema and row-key design.
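For readers new to that API, here is a small hedged sketch of the SCAN side (GET and PUT look similar); the "users" table, the "info" family, and the row-key range are invented for the example.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {
      // Scans walk rows in sorted row-key order between a start (inclusive)
      // and stop (exclusive) key, which is why row-key design matters so much.
      Scan scan = new Scan(Bytes.toBytes("user_0001"), Bytes.toBytes("user_0100"));
      scan.addFamily(Bytes.toBytes("info"));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result row : scanner) {
          System.out.println(Bytes.toString(row.getRow()));
        }
      }
    }
  }
}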
HBase and HDFS: Understanding FileSystem Usage in HBase - enissoz
This document discusses file system usage in HBase. It provides an overview of the three main file types in HBase: write-ahead logs (WALs), data files, and reference files. It describes durability semantics, IO fencing techniques for region server recovery, and how HBase leverages data locality through short circuit reads, checksums, and block placement hints. The document is intended to help readers understand HBase's interactions with HDFS for tuning IO performance.
HBase Accelerated introduces an in-memory flush and compaction pipeline for HBase to improve performance of real-time workloads. By keeping data in memory longer and avoiding frequent disk flushes and compactions, it reduces I/O and improves read and scan latencies. Evaluation on workloads with high update rates and small working sets showed the new approach significantly outperformed the default HBase implementation by serving most data from memory. Work is ongoing to further optimize the in-memory representation and memory usage.
This document summarizes a presentation about optimizing HBase performance through caching. It discusses how baseline tests showed low cache hit rates and CPU/memory utilization. Reducing the table block size improved cache hits but increased overhead. Adding an off-heap bucket cache to store table data minimized JVM garbage collection latency spikes and improved memory utilization by caching frequently accessed data outside the Java heap. Configuration parameters for the bucket cache are also outlined.
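A hedged sketch of the kind of configuration involved, assuming an HBase 1.x-era cluster; the cache sizes are placeholders, and real deployments would set these in hbase-site.xml together with an appropriate -XX:MaxDirectMemorySize for the region server JVM.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BucketCacheSketch {
  public static Configuration withOffHeapCache() {
    Configuration conf = HBaseConfiguration.create();
    // L1 on-heap block cache as a fraction of the region server heap.
    conf.setFloat("hfile.block.cache.size", 0.2f);
    // L2 bucket cache kept off the Java heap so cached blocks do not add GC pressure.
    conf.set("hbase.bucketcache.ioengine", "offheap");
    // Bucket cache capacity; here 4096 MB, a placeholder value for illustration.
    conf.setInt("hbase.bucketcache.size", 4096);
    return conf;
  }
}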
Big data refers to large datasets that are difficult to process using traditional database management tools. Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It provides reliable data storage with the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using MapReduce. The Hadoop ecosystem includes components like HDFS, MapReduce, Hive, Pig, and HBase that provide distributed data storage, processing, querying and analysis capabilities at scale.
HBase Advanced Schema Design - Berlin Buzzwords - June 2012 - larsgeorge
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second. This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.
http://berlinbuzzwords.de/sessions/advanced-hbase-schema-design
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ... - HBaseCon
Phoenix has evolved to become a full-fledged relational database layer over HBase data. We'll discuss the fundamental principles of how Phoenix pushes the computation to the server and why this leads to performance enabling direct support of low-latency applications, along with some major new features. Next, we'll outline our approach for transaction support in Phoenix, a work in-progress, and discuss the pros and cons of the various approaches. Lastly, we'll examine the current means of integrating Phoenix with the rest of the Hadoop ecosystem.
This document summarizes a talk about Facebook's use of HBase for messaging data. It discusses how Facebook migrated data from MySQL to HBase to store metadata, search indexes, and small messages in HBase for improved scalability. It also outlines performance improvements made to HBase, such as for compactions and reads, and future plans such as cross-datacenter replication and running HBase in a multi-tenant environment.
HBase Data Modeling and Access Patterns with Kite SDK - HBaseCon
This document discusses the Kite SDK and how it provides a higher-level API for developing Hadoop data applications. It introduces the Kite Datasets module, which defines a unified storage interface for datasets. It describes how Kite implements partitioning strategies to map data entities to storage partitions, and column mappings to define how data fields are stored in HBase tables. The document provides examples of using Kite datasets to randomly access and update data stored in HBase.
This document summarizes key Hadoop configuration parameters that affect MapReduce job performance and provides suggestions for optimizing these parameters under different conditions. It describes the MapReduce workflow and phases, defines important parameters like dfs.block.size, mapred.compress.map.output, and mapred.tasktracker.map/reduce.tasks.maximum. It explains how to configure these parameters based on factors like cluster size, data and task complexity, and available resources. The document also discusses other performance aspects like temporary space, JVM tuning, and reducing reducer initialization overhead.
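A hedged sketch of setting the MRv1-era parameters named above on a Hadoop Configuration; the values are placeholders for illustration rather than recommendations.

import org.apache.hadoop.conf.Configuration;

public class MapRedTuningSketch {
  public static Configuration tunedJobConf() {
    Configuration conf = new Configuration();
    // HDFS block size (here 128 MB); larger blocks generally mean fewer, larger input splits.
    conf.setLong("dfs.block.size", 128L * 1024 * 1024);
    // Compress intermediate map output to cut shuffle I/O and network traffic.
    conf.setBoolean("mapred.compress.map.output", true);
    // Concurrent task slots per TaskTracker, sized to the node's cores and memory.
    conf.setInt("mapred.tasktracker.map.tasks.maximum", 8);
    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 4);
    return conf;
  }
}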
Jesse Anderson (Smoking Hand)
This early-morning session offers an overview of what HBase is, how it works, its API, and considerations for using HBase as part of a Big Data solution. It will be helpful for people who are new to HBase, and also serve as a refresher for those who may need one.
HBaseCon 2013: Compaction Improvements in Apache HBase - Cloudera, Inc.
This document discusses improvements to compaction in Apache HBase. It begins with an overview of what compactions are and how they improve read performance in HBase. It then describes the default compaction algorithm and improvements made, including exploring selection and off-peak compactions. The document also covers making compactions more pluggable and enabling tuning on a per-table/column family basis. Finally, it proposes algorithms for different scenarios, such as level and stripe compactions, to improve compaction performance.
Apache HBase is the Hadoop open source, distributed, versioned storage manager well suited for random, realtime read/write access. This talk will give an overview of how HBase achieves random I/O, focusing on the storage layer internals: starting from how the client interacts with Region Servers and the Master, it goes into WAL, MemStore, compaction, and on-disk format details, then looks at how the storage is used by features like snapshots, and at how it can be improved to gain flexibility, performance, and space efficiency.
The document discusses Facebook's use of HBase to store messaging data. It provides an overview of HBase, including its data model, performance characteristics, and how it was a good fit for Facebook's needs due to its ability to handle large volumes of data, high write throughput, and efficient random access. It also describes some enhancements Facebook made to HBase to improve availability, stability, and performance. Finally, it briefly mentions Facebook's migration of messaging data from MySQL to their HBase implementation.
The document provides an overview of the state of the Apache HBase database project. It discusses the project goals of availability, stability, and scalability. It also summarizes the mature codebase, active development areas like region replicas and ProcedureV2, and the growing ecosystem of SQL interfaces and other Hadoop components integrated with HBase. Recent releases include 1.1.2 which improved scanners and introduced quotas and throttling, and the 1.0 release which adopted semantic versioning and added region replicas.
This document provides an overview and best practices for operating HBase clusters. It discusses HBase and Hadoop architecture, how to set up an HBase cluster including Zookeeper and region servers, high availability considerations, scaling the cluster, backup and restore processes, and operational best practices around hardware, disks, OS, automation, load balancing, upgrades, monitoring and alerting. It also includes a case study of a 110 node HBase cluster.
This presentation describes how to efficiently load data into Hive. I cover partitioning, predicate pushdown, ORC file optimization and different loading schemes
Apache Cassandra is a free, distributed, open source, and highly scalable NoSQL database that is designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability, and tunable consistency. Cassandra's architecture allows it to spread data across a cluster of servers and replicate across multiple data centers for fault tolerance. It is used by many large companies for applications that require high performance, scalability, and availability.
This document introduces HBase, an open-source, non-relational, distributed database modeled after Google's BigTable. It describes what HBase is, how it can be used, and when it is applicable. Key points include that HBase stores data in columns and rows accessed by row keys, integrates with Hadoop for MapReduce jobs, and is well-suited for large datasets, fast random access, and write-heavy applications. Common use cases involve log analytics, real-time analytics, and messages-centered systems.
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce - Cloudera, Inc.
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems. The storage and data access architecture of HBase (row keys, column families, etc.) will be explained, along with the pros and cons of different schema decisions.
Performance and Fault Tolerance for the Netflix API - Ben Christensen
The document discusses Netflix's API architecture and how it achieves fault tolerance and high performance. It describes how the API is composed of dozens of dependencies that could each fail independently. It then outlines Netflix's approaches to avoid any single dependency taking down the entire application, including using fallbacks, failing silently, failing fast, shedding load, aggressive timeouts, semaphores, separate threads, and circuit breakers. It provides examples of how over 10 billion dependency commands can be executed per day with over 1 billion incoming requests.
Dimitris Bertsimas, Allison O'Hair, and Sanjay Sarma issued a verified certificate to Rafael Pires da Silva Mendes for successfully completing the online course 15.071x: The Analytics Edge through MITx and edX. The certificate can be verified online at the provided URL to confirm its authenticity.
Rafael Mendes completed a course in Web Intelligence and Big Data from December 02, 2013. The course taught machine learning and parallel programming techniques for analyzing large datasets from sources like social media and genomics. The certificate was issued by Dr. Gautam Shroff, Vice President and Chief Scientist of Tata Consultancy Services' Innovation Labs.
The Evolution of a Relational Database Layer over HBase - DataWorks Summit
Apache Phoenix is a SQL query layer over Apache HBase that allows users to interact with HBase through JDBC and SQL. It transforms SQL queries into native HBase API calls for efficient parallel execution on the cluster. Phoenix provides metadata storage, SQL support, and a JDBC driver. It is now a top-level Apache project after originally being developed at Salesforce. The speaker discussed Phoenix's capabilities like joins and subqueries, new features like HBase 1.0 support and functional indexes, and future plans like improved optimization through Calcite and transaction support.
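To make the "SQL over HBase through JDBC" idea concrete, here is a hedged sketch; the quorum address and the metrics table are invented, and the statements are ordinary Phoenix DDL/DML rather than anything specific to the talk.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
      conn.setAutoCommit(true);  // commit each UPSERT immediately for this small example
      try (Statement stmt = conn.createStatement()) {
        // Phoenix turns this DDL/DML into native HBase API calls under the hood.
        stmt.execute("CREATE TABLE IF NOT EXISTS metrics ("
            + "host VARCHAR NOT NULL, ts TIMESTAMP NOT NULL, value DOUBLE "
            + "CONSTRAINT pk PRIMARY KEY (host, ts))");
        stmt.executeUpdate("UPSERT INTO metrics VALUES ('web01', NOW(), 0.75)");
        try (ResultSet rs = stmt.executeQuery(
            "SELECT host, MAX(value) FROM metrics GROUP BY host")) {
          while (rs.next()) {
            System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
          }
        }
      }
    }
  }
}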
The document discusses Gluster FS, an open source distributed file system for big data. It provides an overview of Gluster FS's architecture, which distributes data across commodity servers and storage bricks. The summary describes how Gluster FS allows storage to be easily scaled out by adding additional servers and bricks, provides high availability through data replication, and supports various clients including native, NFS, HDFS and SMB/CIFS. Real-world examples are given of how the presenter's company uses Gluster FS to store and retrieve large Lucene indexes totaling over 200GB per month.
This document contains information about Apache HBase including links to documentation pages, JIRA issues, and discussions on using HBase. It provides configuration examples for viewing HFile contents, explains how Bloom filters are used in HBase, includes an overview of the HBase data model and comparisons with RDBMS. It also shows an example Git diff of modifying the HBase heap size configuration and provides links to guides on using HBase and documentation on region splitting and merging.
We start by looking at distributed database features that impact latency. Then we take a deeper look at the HBase read and write paths with a focus on request latency. We examine the sources of latency and how to minimize them.
If you've used a modern, interactive map such as Google or Bing Maps, you've consumed "map tiles". Map tiles are small images rendering a piece of the mosaic that is the whole map. Using conventional means, rendering tiles for the whole globe at multiple resolutions is a huge data processing effort. Even highly optimized, it spans a couple TBs and a few days of computation. Enter Hadoop. In this talk, I'll show you how to generate your own custom tiles using Hadoop. There will be pretty pictures.
Big Data – HBase, integrando hadoop, bi e dw; Montando o seu big data Cloude... - Flavio Fonte, PMP, ITIL
The document describes HBase, a column-oriented NoSQL database that stores data in Hadoop. It also discusses options for building Big Data environments, such as Cloudera, Hortonworks, and Pivotal, which offer supported Hadoop distributions.
HBase is a NoSQL database that stores data in HDFS in a distributed, scalable, reliable way for big data. It is column-oriented and optimized for random read/write access to big data in real-time. HBase is not a relational database and relies on HDFS. Common use cases include flexible schemas, high read/write rates, and real-time analytics. Apache Phoenix provides a SQL interface for HBase, allowing SQL queries, joins, and familiar constructs to manage data in HBase tables.
Based on "HBase, dances on the elephant back" presentation success I have prepared its update for JavaDay 2014 Kyiv. Again, it is about the product which revolutionary changes everything inside Hadoop infrastructure: Apache HBase. But here focus is shifted to integration and more advanced topics keeping presentation yet understandable for technology newcomers.
This document provides an overview and configuration instructions for Hadoop, Flume, Hive, and HBase. It begins with an introduction to each tool, including what problems they aim to solve and high-level descriptions of how they work. It then provides step-by-step instructions for downloading, configuring, and running each tool on a single node or small cluster. Specific configuration files and properties are outlined for core Hadoop components as well as integrating Flume, Hive, and HBase.
The document discusses the evolution of Hadoop from versions 1.0 to 2.0. Key limitations of Hadoop 1.0 included lack of horizontal scalability, single points of failure, and tight coupling between components. Hadoop 2.0 addressed these issues by introducing YARN for decoupling compute from storage and enabling multiple job types beyond MapReduce. Other improvements in 2.0 included high availability, sharing between jobs, and running non-Java frameworks.
The document discusses using HBase and R for fast time-series analytics of large datasets. It begins with an introduction to NoSQL and HBase, describing its key features. It then discusses a use case of analyzing sensor data from airport vehicles. The data model designs HBase to store the hierarchical time-series data. Hive and storing on HDFS directly are considered but rejected in favor of HBase. The presentation concludes with an overview of using the rhbase package to interface R with HBase.
The document provides information on various components of the Hadoop ecosystem including Pig, Zookeeper, HBase, Spark, and Hive. It discusses how HBase offers random access to data stored in HDFS, allowing for faster lookups than HDFS alone. It describes the architecture of HBase including its use of Zookeeper, storage of data in regions on region servers, and secondary indexing capabilities. Finally, it summarizes Hive and how it allows SQL-like queries on large datasets stored in HDFS or other distributed storage systems using MapReduce or Spark jobs.
Hive - rajsigh020
Hive stores its schema in a database (the metastore) and the processed data in HDFS.
It provides an SQL-like language for querying called HiveQL or HQL.
It is familiar, fast, scalable, and extensible. Hive is a data warehouse infrastructure tool to process structured data in Hadoop (used for structured and semi-structured data analysis and processing). It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
Initially Hive was developed by Facebook; later the Apache Software Foundation took it up and developed it further as an open source project under the name Apache Hive. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not: a relational database.
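As a hedged sketch of what that SQL-like querying looks like in practice, the example below submits HiveQL through the HiveServer2 JDBC driver; the host, port, and page_views table are invented placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlSketch {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default");
         Statement stmt = conn.createStatement()) {
      // The schema goes into the metastore; the table data itself lives in HDFS.
      stmt.execute("CREATE TABLE IF NOT EXISTS page_views "
          + "(user_id STRING, url STRING, view_time TIMESTAMP) STORED AS ORC");
      try (ResultSet rs = stmt.executeQuery(
          "SELECT url, COUNT(*) AS views FROM page_views GROUP BY url ORDER BY views DESC LIMIT 10")) {
        while (rs.next()) {
          System.out.println(rs.getString("url") + ": " + rs.getLong("views"));
        }
      }
    }
  }
}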
HBase 1.0 is the new stable major release, and the start of "semantic versioned" releases. We will cover new features, changes in behavior and requirements, source/binary and wire compatibility details, and upgrading. We'll also dive deep into the new standardized client API in 1.0, which establishes a separation of concerns, encapsulates what is needed from how it's delivered, and guarantees future compatibility while freeing the implementation to evolve.
This document provides an overview of the Hadoop/MapReduce/HBase framework and its applications in bioinformatics. It discusses Hadoop and its components, how MapReduce programs work, HBase which enables random access to Hadoop data, related projects like Pig and Hive, and examples of applications in bioinformatics and benchmarking of these systems.
The document discusses Facebook's use of HBase as the database storage engine for its messaging platform. It provides an overview of HBase, including its data model, architecture, and benefits like scalability, fault tolerance, and simpler consistency model compared to relational databases. The document also describes Facebook's contributions to HBase to improve performance, availability, and achieve its goal of zero data loss. It shares Facebook's operational experiences running large HBase clusters and discusses its migration of messaging data from MySQL to a de-normalized schema in HBase.
This document contains information about HBase concepts and configurations. It discusses different modes of HBase operation including standalone, pseudo-distributed, and distributed modes. It also covers basic prerequisites for running HBase like Java, SSH, DNS, NTP, ulimit settings, and Hadoop for distributed mode. The document explains important HBase configuration files like hbase-site.xml, hbase-default.xml, hbase-env.sh, log4j.properties, and regionservers. It provides details on column-oriented versus row-oriented databases and discusses optimizations that can be made through configuration settings.
The document provides an introduction to NoSQL and HBase. It discusses what NoSQL is, the different types of NoSQL databases, and compares NoSQL to SQL databases. It then focuses on HBase, describing its architecture and components like HMaster, regionservers, Zookeeper. It explains how HBase stores and retrieves data, the write process involving memstores and compaction. It also covers HBase shell commands for creating, inserting, querying and deleting data.
HBase is a distributed column-oriented database built on top of Hadoop that provides random real-time read/write access to big data stored in Hadoop. It uses a master server to assign regions to region servers and Zookeeper to track servers and coordinate tasks. HBase allows users to perform CRUD operations on tables through its shell interface using commands like create, put, get, and scan.
Apache hadoop, hdfs and map reduce Overview - Nisanth Simon
This document provides an overview of Apache Hadoop, HDFS, and MapReduce. It describes how Hadoop uses a distributed file system (HDFS) to store large amounts of data across commodity hardware. It also explains how MapReduce allows distributed processing of that data by allocating map and reduce tasks across nodes. Key components discussed include the HDFS architecture with NameNodes and DataNodes, data replication for fault tolerance, and how the MapReduce engine works with a JobTracker and TaskTrackers to parallelize jobs.
HBase is a distributed, scalable, big data store that is built on top of HDFS. It is a column-oriented NoSQL database that provides fast lookups and updates for large tables. Key features include scalability, automatic failover, consistent reads/writes, sharding of tables, and Java and REST APIs for client access. HBase is not a replacement for an RDBMS as it does not support SQL, joins, or relations between tables.
HBase is a distributed, scalable, big data NoSQL database that runs on top of Hadoop HDFS. It is a column-oriented database that allows for fast lookups and updates of large tables. Key components include the HMaster that manages metadata, RegionServers that store and serve data in regions or tables, and Zookeeper which provides coordination services.
This document discusses integrating Apache Hive with Apache HBase. It provides an overview of Hive and HBase, the motivation for integrating the two systems, and how the integration works. Specifically, it covers how the schema and data types are mapped between Hive and HBase, how filters can be pushed down from Hive to HBase to optimize queries, bulk loading data from Hive into HBase, and security aspects of the integrated system. The document is intended to provide background and technical details on using Hive and HBase together.
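A hedged sketch of the schema-mapping side of that integration, issued over the same HiveServer2 JDBC route as above; the table names and the column mapping are made up, and the exact options can vary by Hive and HBase version.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveHBaseMappingSketch {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default");
         Statement stmt = conn.createStatement()) {
      // Map a Hive table onto an existing HBase table: the HBase row key becomes the
      // Hive "key" column and the cf:count column becomes the Hive "views" column.
      stmt.execute("CREATE EXTERNAL TABLE hbase_page_views (key STRING, views BIGINT) "
          + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
          + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:count') "
          + "TBLPROPERTIES ('hbase.table.name' = 'page_views')");
    }
  }
}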
Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools or processing applications. A lot of challenges such as capture, curation, storage, search, sharing, analysis, and visualization can be encountered while handling Big Data. On the other hand the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Big Data certification is one of the most recognized credentials of today.
For more details Click http://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
The document provides an overview of Apache Hadoop and related big data technologies. It discusses Hadoop components like HDFS for storage, MapReduce for processing, and HBase for columnar storage. It also covers related projects like Hive for SQL queries, ZooKeeper for coordination, and Hortonworks and Cloudera distributions.
This document provides an overview of Apache Phoenix, including:
- What Phoenix is and how it provides a SQL interface for Apache HBase
- The current state of Phoenix including SQL support, secondary indexes, and optimizations
- New features in Phoenix 4.4 like functional indexes, user defined functions, and integration with Spark
The presentation covers the evolution and capabilities of Phoenix as a relational layer for HBase that transforms SQL queries into native HBase API calls.
The document discusses different types of block caches in HBase including LruBlockCache, SlabCache, and BucketCache. It explains that block caching improves performance by storing frequently accessed blocks in faster memory rather than slower disk storage. Each block cache has its own configuration options and memory usage characteristics. Benchmark results show that the off-heap BucketCache provides strong performance due to its use of off-heap memory for the L2 cache.
The document discusses the HBase client API for connecting to HBase clusters from applications like webapps. It describes the Java, Ruby, Python, and Thrift client interfaces as well as examples of using scans and puts with these interfaces. It also briefly mentions the REST client interface and some other alternative client libraries like asynchbase and Orderly.
This document introduces Pig, an open source platform for analyzing large datasets that sits on top of Hadoop. It provides an example of using Pig Latin to find the top 5 most visited websites by users aged 18-25 from user and website data. Key points covered include who uses Pig, how it works, performance advantages over MapReduce, and upcoming new features. The document encourages learning more about Pig through online documentation and tutorials.
Introduction to Hadoop, HBase, and NoSQL - Nick Dimiduk
The document is a presentation on NoSQL databases given by Nick Dimiduk. It begins with an introduction of the speaker and their background. The presentation then covers what NoSQL is not, the motivations for NoSQL databases, an overview of Hadoop and its components, and a description of HBase as a structured, distributed database built on Hadoop.
Apache HBase 1.0 Release
1. Apache HBase 1.0 Release
Nick Dimiduk, Hortonworks
@xefyr / n10k.com
February 20, 2015
2. Release 1.0
"The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features."
Enis Söztutar, HBase 1.0 Release Manager
3. Agenda
• A Brief History of HBase
• What is HBase
• Major Changes for 1.0
• Upgrade Path
8. WHAT IS HBASE
HBase architecture in 5 minutes or less
9. Data Model
[Diagram: Table A illustrating the cell coordinates (rowkey, column family, column qualifier, timestamp, value). Row "a" spans column families cf1 and cf2: cf1 holds the qualifiers "bar" and "foo" with multiple timestamped versions ("hello", 7, "world", 22, 13.6), while cf2 holds the qualifier "2011-07-04" with value "fourth of July" and the qualifier 1.0001 with value "almost the loneliest number". Row "b" holds the qualifier "thumb" in cf2 with value [3.6 kb png data]. The figure labels the Rows and Column Families.]
10. Logical Architecture
Table A (row keys a through p) is split by key range into Region 1, Region 2, Region 3, and Region 4, and those regions are distributed across region servers:
• Region Server 7 hosts Table A Region 1, Table A Region 2, Table G Region 1070, Table L Region 25
• Region Server 86 hosts Table A Region 3, Table C Region 30, Table F Region 160, Table F Region 776
• Region Server 367 hosts Table A Region 4, Table C Region 17, Table E Region 52, Table P Region 1116
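Which server hosts which region is decided by the master, but the region boundaries themselves can be chosen up front by pre-splitting the table. A hedged sketch, assuming a hypothetical table "A" split into four key ranges as in the diagram:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("A"));
      desc.addFamily(new HColumnDescriptor("cf1"));

      // Three split points produce four regions: (-inf,"e"), ["e","i"), ["i","m"), ["m",+inf).
      byte[][] splitKeys = {
          Bytes.toBytes("e"),
          Bytes.toBytes("i"),
          Bytes.toBytes("m")
      };
      admin.createTable(desc, splitKeys);
    }
  }
}
```

The master then balances these regions across the available region servers, much like the assignment shown on the slide.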
11. Physical Architecture
Given that the underlying data is stored in HDFS, which is available to all clients as a single namespace, all RegionServers have access to the same persisted files in the file system and can therefore host any region (figure 3.8). By physically collocating DataNodes and RegionServers, you can use the data locality property; that is, RegionServers can theoretically read and write to the local DataNode as the primary DataNode.
You may wonder where the TaskTrackers are in this scheme of things. In some HBase deployments, the MapReduce framework isn’t deployed at all if the workload is primarily random reads and writes. In other deployments, where the MapReduce processing is also a part of the workloads, TaskTrackers, DataNodes, and HBase RegionServers can run together.
Figure 3.7 HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.
[Diagram: an HBase Client talks to the Master, ZooKeeper, and a row of collocated RegionServer/DataNode pairs; the NameNode fronts HDFS, which sits beneath HBase.]
13. Stability: Co-Locate Meta with Master
• Simplify, improve region assignment reliability
– Fewer components involved in updating “truth”
• Master embeds a RegionServer
– Will host only system tables
– Baby step towards combining RS/Master into a single hbase daemon
• Backup masters unchanged
– Can be configured to host user tables while in standby
• Plumbing is all there, off by default
http://issues.apache.org/jira/browse/HBASE-10569
14. Availability: Region Replicas
• Multiple RegionServers host a Region
– One is “primary”, others are “replicas”
– Only primary accepts writes
• Client reads against primary only or any
– Results marked as appropriate
• Baby step toward quorum reads, writes
http://issues.apache.org/jira/browse/HBASE-10070
http://www.slideshare.net/HBaseCon/features-session-1
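The “primary only or any” choice and the “marked as appropriate” result surface in the client API roughly as sketched below, assuming a hypothetical table "A" that was created with region replication enabled. The default consistency is STRONG (primary only); TIMELINE lets any replica answer, and possibly stale results are flagged.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("A"))) {

      // TIMELINE consistency allows a secondary replica to serve the read.
      Get get = new Get(Bytes.toBytes("a"));
      get.setConsistency(Consistency.TIMELINE);

      Result result = table.get(get);
      // Results served by a secondary replica are marked as possibly stale.
      System.out.println("stale? " + result.isStale());
    }
  }
}
```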
16. New and Noteworthy
• Greatly expanded hbase.apache.org/book.html
• Truncate table shell command
• Automatic tuning of global MemStore and BlockCache sizes
• Basic backpressure mechanism
• BucketCache easier to configure
• Compressed BlockCache
• Pluggable replication endpoint
• A Dockerfile to easily run HBase from source
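The new truncate support is also reachable from the Java Admin API, not just the shell. A hedged sketch, assuming a hypothetical table "A" (the shell equivalent is `truncate 'A'`, or `truncate_preserve 'A'` to keep existing region boundaries):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TruncateExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      TableName name = TableName.valueOf("A");
      admin.disableTable(name);         // the table must be disabled before truncation
      admin.truncateTable(name, true);  // true = preserve the existing region split points
    }
  }
}
```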
17. Under the Covers
• ZooKeeper abstractions
• Meta table used for assignment
• Cell-based read/write path
• Combining mvcc/seqid
• Sundry security, tags, labels improvements
18. Groundwork for 2.0
• More, Smaller Regions
– Millions, 1G or less
– Less write amplification
– Splitting hbase:meta
• Performance
– More off-heap
– Less resource contention
– Faster region failover/recovery
– Multiple WALs
– QoS/Quotas/Multi-tenancy
• Rigging
– Faster, more intelligent assignment
– Procedure bus
– Resumable, query-able operations
• Other possibilities
– Quorum/consensus reads, writes?
– Hydrabase, multi-DC consensus?
– Streaming RPCs?
– High level coprocessor API
19. Semantic Versioning
• Major/Minor/Patch version numbers
– Only major/minor pre-1.0
• Dimensions
– Client/Server wire compatibility
– Server/Server wire and feature compatibility
– API compatibility
– ABI compatibility
• Proposal up for a vote
http://s.apache.org/hbase-semver
21. Online/Wire Compatibility
• Direct migration from 0.94 supported
– Looks a lot like the upgrade from 0.94 to 0.96: requires downtime
– Not tested yet, will be before release
• RPC is backward-compatible to 0.96
– Enables mixing clients and servers across versions
– So long as no new features are enabled
• Rolling upgrade "out of the box" from 0.98
• Rolling upgrade "with some massaging" from 0.96
– I.e., 0.96 cannot read HFileV3, the new default
– Not tested yet, will be before release
22. Client Application Compatibility
• API is backward compatible to 0.96
– No code change required
– You’ll start getting new deprecation warnings
– We recommend you start using new APIs
• ABI is NOT backward compatible
– Cannot drop current application jars onto new runtime
– Recompile your application vs. 1.0 jars
– Just like 0.96 to 0.98 upgrade
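The “start using new APIs” recommendation mostly concerns connection and table handles: the old HTable-centric entry points are deprecated in 1.0 in favor of ConnectionFactory, Connection, and the Table interface. A hedged sketch of 1.0-style client code, assuming a hypothetical table "A":

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NewClientApiExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();

    // Pre-1.0 style (still compiles against 1.0, but deprecated):
    //   HTable table = new HTable(conf, "A");

    // 1.0 style: one shared Connection hands out lightweight Table instances.
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("A"))) {
      Result r = table.get(new Get(Bytes.toBytes("a")));
      System.out.println(r.isEmpty() ? "row not found" : "row found");
    }
  }
}
```

Because the ABI changed, existing application jars need to be recompiled against the 1.0 client jars even where the source compiles unchanged.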
23. Hadoop Versions
• Hadoop 1.x is NOT supported
– Bite the bullet; you’ll enjoy the performance benefits
• Hadoop 2.x only
– Most thoroughly tested on 2.4.x, 2.5.x
– Probably works on 2.2.x, 2.3.x, but less thoroughly tested
https://hbase.apache.org/book/configuration.html#hadoop
24. Java Versions
• JDK 6 is NOT supported!
• JDK 7 is the target runtime
• JDK 8 support is experimental
https://hbase.apache.org/book/configuration.html#hadoop
25. 1.0.0 RCs Available Now!
• Release Candidate voting has commenced
• Last chance to catch show-stopping bugs
RELEASE CANDIDATES NOT FOR PRODUCTION USE
• Try out the new features
• Help us test your upgrade path
• Be a part of history in the making!
• 1.0.0rc5 available 2015-02-19
http://search-hadoop.com/m/DHED40Ih5n
26. Thanks!
[Book cover: HBase in Action (Manning), Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack]
hbaseinaction.com
Nick Dimiduk
github.com/ndimiduk
@xefyr
n10k.com
http://www.apache.org/dyn/closer.cgi/hbase/