Tutorial: HBase
One option was to prune, by retaining only data for the last N days
Reduced I/O
Indeed, column values are often very similar and differ little
row-by-row
Normalize data
Use Indexes
The system also downloads the linked page in the background, and extracts, for
instance, the TITLE tag from the HTML, if present. The entire page is saved for later
processing with asynchronous batch jobs, for analysis purposes. This is represented by
the url table.
Every linked page is only stored once, but since many users may link to the same long
URL, yet want to maintain their own details, such as the usage statistics, a separate
entry in the shorturl is created. This links the url, shorturl, and click tables.
This also allows you to aggregate statistics to the original short ID, refShortId, so that
you can see the overall usage of any short URL to map to the same long URL. The
shortId and refShortId are the hashed IDs assigned uniquely to each shortened URL.
For example, in
http://hush.li/a23eg
the ID is a23eg.
Figure 1-3 shows how the same schema could be represented in HBase. Every shortened
URL is stored in a separate table, shorturl, which also contains the usage statistics,
storing various time ranges in separate column families, with distinct time-to-live
settings. The columns form the actual counters, and their name is a combination of the
date, plus an optional dimensional postfix, for example, the country code.
The downloaded page, and the extracted details, are stored in the url table. This table
uses compression to minimize the storage requirements, because the pages are mostly
HTML, which is inherently verbose and contains a lot of text.
The user-shorturl table acts as a lookup so that you can quickly find all short IDs for
a given user. This is used on the user's home page, once she has logged in. The user
table stores the actual user details.
We still have the same number of tables, but their meaning has changed: the clicks
table has been absorbed by the shorturl table, while the statistics columns use the date
as their key, formatted as YYYYMMDD, for instance, 20110502, so that they can be
accessed sequentially.
Figure 1-2. The Hush schema expressed as an ERD
Pietro Michiardi (Eurecom) Tutorial: HBase 8 / 100
The Problem with RDBMS
Find all short URLs for a given user
Referential Integrity
Scaling up to tens of thousands of users
Master DB takes all the writes (which are fewer in the Hush
application)
Slave DBs replicate the master DB and serve all reads (but you need a
load balancer)
Scaling up to hundreds of thousands
Schema de-normalization
Weak: no guarantee
Dimensions to classify NoSQL DBs
Data model
In-memory or persistent?
Strict or eventual?
This translates into how fast the system handles READS and WRITES
[2]
Physical Model
Is locking available?
Denormalization
Duplication
Duplicate data in more than one table such that at READ time no
further aggregation is required
Next: an example based on Hush
statistics are stored with the date as the key, so that they can be
accessed sequentially
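A minimal sketch of the date-keyed counter columns described for the Hush shorturl table, written in plain Python rather than against the HBase API. The `counter_qualifier` helper, the `.` separator between the date and the postfix, and the sample counts are assumptions made for illustration only.

```python
from datetime import date


def counter_qualifier(day: date, dimension: str = "") -> str:
    """Build a counter column name: YYYYMMDD plus an optional postfix.

    The "." separator is an assumption of this sketch.
    """
    name = day.strftime("%Y%m%d")
    return f"{name}.{dimension}" if dimension else name


# Hypothetical click counters for one short URL.
counters = {
    counter_qualifier(date(2011, 5, 2)): 12,
    counter_qualifier(date(2011, 4, 30)): 7,
    counter_qualifier(date(2011, 5, 2), "us"): 5,
    counter_qualifier(date(2011, 5, 1)): 9,
}

# Because the date is zero-padded and comes first, sorting the names as
# plain strings also sorts them chronologically.
ordered = sorted(counters)
```

Since column names are kept sorted, a reader can walk a date range sequentially without a separate index.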
BigTable [3]
What is BigTable?
Each column may have multiple versions, with each distinct value
contained in a separate cell
Keys are compared on a binary level, byte by byte, from left to right
SortedMap<RowKey, List<SortedMap<Column,
List<Value, Timestamp>>>>
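The nested-map view above can be mimicked with a toy in-memory model. This is only a sketch of the logical data model in Python, not of how HBase stores data; the class name `TinyTable` and its methods are hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """Toy model: row key -> column -> list of (timestamp, value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row: bytes, column: bytes, value: bytes, ts: int) -> None:
        versions = self._rows[row][column]
        versions.append((ts, value))
        # Keep the newest version first.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row: bytes, column: bytes, max_versions: int = 1):
        # .get() avoids creating empty rows on a read miss.
        versions = self._rows.get(row, {}).get(column, [])
        return versions[:max_versions]

    def row_keys(self):
        # Python compares bytes objects byte by byte, left to right.
        return sorted(self._rows)


table = TinyTable()
table.put(b"row-2", b"cf:a", b"v1", ts=1)
table.put(b"row-10", b"cf:a", b"v1", ts=1)
table.put(b"row-10", b"cf:a", b"v2", ts=2)
```

Note that `b"row-10"` sorts before `b"row-2"`: the comparison is binary, not numeric, which is why row keys are often zero-padded.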
Regions are dynamically split by the system when they become too
large
Since HFiles have a block index, lookup can be done with a single
disk seek
Rolling mechanism
B+Trees
Once the log has the modification saved, data is pushed into memory
As flushes take place over time, a lot of store files are created
The more and the faster you insert data at random locations the
faster pages get fragmented
Updates and deletes are done at disk seek rates, rather than
transfer rates
LSM-Tree [7]
Work at disk transfer rate and scale better to huge amounts of data
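The write path sketched in the bullets above (log first, then memory, then flushes that create store files, later merged) can be illustrated with a toy LSM-style store. This is a simplified Python sketch under stated assumptions: the class `TinyLSM`, its flush threshold, and the list-based "files" are hypothetical and do not reflect HBase internals.

```python
class TinyLSM:
    """Toy log-structured store: log, in-memory buffer, sorted files."""

    def __init__(self, flush_threshold: int = 3):
        self.log = []          # append-only write-ahead log
        self.memstore = {}     # in-memory buffer of recent writes
        self.store_files = []  # immutable sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.log.append((key, value))  # 1. persist the modification
        self.memstore[key] = value     # 2. then update memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable file.
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # newest first
            for k, v in store_file:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all files; entries from newer files overwrite older ones.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


db = TinyLSM(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
files_before = len(db.store_files)
db.compact()
```

Every write is sequential (append to the log, write a whole sorted file), which is the reason such a design works at transfer rate rather than seek rate.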
HMaster
Low-level operations
Keeps metadata
Talks to ZooKeeper
HRegionServer
Retrieves from ZooKeeper the name of the server that hosts the -ROOT-
region
Using the -ROOT- information, the client retrieves the name of the server
that hosts the .META. table region
Contacts the reported .META. server and retrieves the name of the server
that hosts the region containing the row key in question
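The three lookup steps above can be sketched with plain dictionaries and a binary search over region start keys. Everything here is illustrative: the server names, the `ZOOKEEPER`, `ROOT` and `META` structures, and the simplified catalog layout (start key mapped to server) are assumptions, not the actual catalog table format.

```python
import bisect

# Hypothetical cluster state. Each catalog is a list of
# (region start key, hosting server), sorted by start key.
ZOOKEEPER = {"root-region-server": "rs1"}
ROOT = {"rs1": [(b"", "rs2")]}                 # -ROOT- points to .META. regions
META = {"rs2": [(b"", "rs3"), (b"m", "rs4")]}  # .META. points to user regions


def find_region(regions, row_key: bytes):
    """Return the entry whose start key is the greatest one <= row_key."""
    starts = [start for start, _ in regions]
    idx = bisect.bisect_right(starts, row_key) - 1
    return regions[idx]


def locate(row_key: bytes) -> str:
    root_server = ZOOKEEPER["root-region-server"]               # step 1
    _, meta_server = find_region(ROOT[root_server], row_key)    # step 2
    _, region_server = find_region(META[meta_server], row_key)  # step 3
    return region_server
```

A real client would remember the result so that later requests for keys in the same region skip all three steps.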
Caching
A MemStore instance
.logs
.oldlogs
.hbase.id
.hbase.version
/example-table
Storage
HBase Files
/example-table
.tableinfo
.tmp
...Key1...
.oldlogs
.regioninfo
.tmp
colfam1/
colfam1/
....column-key1...
HBase: Root-level files
.logs directory
All regions from that HRegionServer share the same HLog files
.oldlogs directory
.tmp directory
The name of each of these directories is the MD5 hash of a region name
Each column family directory holds the actual data files, namely
HFiles
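As a small illustration of the directory naming, the following Python sketch hashes a region name with MD5. The region name string is a made-up example, and the exact bytes that HBase feeds into the hash are not spelled out here, so treat this only as a picture of the idea.

```python
import hashlib


def encoded_region_name(region_name: str) -> str:
    """Hex MD5 digest of a region name, usable as a directory name."""
    return hashlib.md5(region_name.encode("utf-8")).hexdigest()


# Hypothetical region name: table, start key, creation timestamp.
region_name = "example-table,,1300000000000"
dirname = encoded_region_name(region_name)
```

The digest has a fixed length and contains only characters that are safe in file system paths, regardless of what bytes the start key contains.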
.META. is updated
Daughter regions initially reside on the same server
Parent is cleaned up
.META. is updated
Master schedules new regions to be moved off to other
servers
HBase: Compaction
Process that takes care of re-organizing store files
index block: records the offsets of the data and meta blocks
This is 1,024 times the default HFile block size (64 KB)
There is no correlation between HDFS block and HFile sizes
The KeyValue Format
Each KeyValue in the HFile is a low-level byte array
This is useful to offset into the array to get direct access to the value,
ignoring the key
Key format
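To make the "offset into the array" idea concrete, here is a hedged Python sketch of a length-prefixed cell encoding. The field widths used (4-byte key length, 4-byte value length, then a key made of row length, row, family length, family, qualifier, 8-byte timestamp and 1-byte type) follow the layout commonly described for HBase KeyValues, but they are stated here as an assumption; the function names and the `PUT` constant are illustrative.

```python
import struct

PUT = 4  # illustrative key-type code


def encode_keyvalue(row: bytes, family: bytes, qualifier: bytes,
                    timestamp: int, key_type: int, value: bytes) -> bytes:
    """Serialize one cell as <key length><value length><key><value>."""
    key = (
        struct.pack(">H", len(row)) + row
        + struct.pack(">B", len(family)) + family
        + qualifier
        + struct.pack(">qB", timestamp, key_type)
    )
    return struct.pack(">ii", len(key), len(value)) + key + value


def read_value(buf: bytes) -> bytes:
    """Jump straight to the value using the two leading lengths."""
    key_len, value_len = struct.unpack_from(">ii", buf, 0)
    value_start = 8 + key_len  # skip both length fields and the key
    return buf[value_start:value_start + value_len]


kv = encode_keyvalue(b"row1", b"cf", b"q", 1, PUT, b"hello")
```

Reading the value never parses the key: the two fixed-width lengths at the front are enough to compute where the value begins.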
The -ROOT- table is used to refer to all regions in the .META. table
Three-level B+ tree-like operation
This information is cached, but the first time, or when the cache is
stale, or when there is a miss due to compaction, the following
procedure applies
Recursive discovery process
Ask the region server hosting the matching .META. table to retrieve
the row key address
Row key
Column key
Both can be used to convey meaning
Although cells are stored logically in a table format, rows are stored
as linear sets of the cells
Rows have a row key to address all columns of a single logical row
Concepts
[...] therefore has to also store the row key and column key with every cell so that it can
retain this vital piece of information.
In addition, multiple versions of the same cell are stored as separate, consecutive cells,
adding the required timestamp of when the cell was stored. The cells are sorted in
descending order by that timestamp so that a reader of the data will see the newest
value first, which is the canonical access pattern for the data.
The entire cell, with the added structural information, is called KeyValue in HBase
terms. It has not just the column and actual value, but also the row key and timestamp,
stored for every cell for which you have set a value. The KeyValues are sorted by row
key first, and then by column key in case you have more than one cell per row in one
column family.
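The sort order just described (row key, then column key, then timestamp descending) can be reproduced with a composite sort key. This is a plain Python sketch on tuples; the sample cells are invented for illustration.

```python
# Each cell: (row key, column key, timestamp, value). Sample data only.
cells = [
    (b"row2", b"cf:a", 1, b"x"),
    (b"row1", b"cf:b", 1, b"y"),
    (b"row1", b"cf:a", 1, b"old"),
    (b"row1", b"cf:a", 2, b"new"),
]

# Row and column ascending; negating the timestamp puts the newest first.
ordered = sorted(cells, key=lambda c: (c[0], c[1], -c[2]))

# The first cell met for a given (row, column) is therefore the newest one.
newest = {}
for row, column, ts, value in ordered:
    newest.setdefault((row, column), value)
```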
The lower-right part of the figure shows the resultant layout of the logical table inside
the physical storage files. The HBase API has various means of querying the stored data,
with decreasing granularity from left to right: you can select rows by row keys and
effectively reduce the amount of data that needs to be scanned when looking for a
specific row, or a range of rows. Specifying the column family as part of the query can
eliminate the need to search the separate storage files. If you only need the data of one
family, it is highly recommended that you specify the family for your read operation.
Although the timestamp, or version, of a cell is farther to the right, it is another
important selection criterion. The store files retain the timestamp range for all stored cells,
so if you are asking for a cell that was changed in the past two hours, but a particular
store file only has data that is four or more hours old, it can be skipped completely. See
also Read Path on page 342 for details.
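The store-file skipping described above amounts to an interval-overlap test. The following Python sketch assumes each file records the minimum and maximum timestamp of its cells; the `StoreFile` class, the file names, and the hour-based timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    name: str
    min_ts: int  # oldest cell timestamp in the file
    max_ts: int  # newest cell timestamp in the file


def files_to_read(files, start_ts: int, end_ts: int):
    """Keep only files whose range overlaps [start_ts, end_ts)."""
    return [f for f in files if f.max_ts >= start_ts and f.min_ts < end_ts]


# Timestamps in hours for readability; "now" is hour 10.
now = 10
files = [
    StoreFile("old", min_ts=0, max_ts=6),      # four or more hours old
    StoreFile("recent", min_ts=7, max_ts=10),
]

# Query: cells changed in the past two hours.
selected = files_to_read(files, start_ts=now - 2, end_ts=now + 1)
```

The file named `old` is never opened, because its newest cell predates the start of the requested range.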
Figure 9-1. Rows stored as linear sets of actual cells, which contain all the vital information
Folding the Logical Layout (Top-Right)
The cells of each row are stored one after the other
This reduces the amount of data to scan for a row or a range of rows
Few columns
Many rows
Flat-Wide Tables
Many columns
Few rows
Given the query granularity explained before
Store parts of the cell data in the row key
You have all emails of a user in a single row (e.g. userID is the
row key)
If the messageID is in the column qualifier or the row key, each cell
still contains a single email message
The table can be split easily and the query granularity is more
fine-grained
Partial Key Scans
Partial Key Scans reinforce the concept of Tall-Narrow Tables
From the email example: assume you have a separate row per
message, across all users
The start key is set to the exact userID only, with the end key set
at userID+1
This triggers the internal lexicographic comparison mechanism
Since the table does not have an exact match, it positions the scan at:
<userID>:<lowest-messageID>
The scan will then iterate over all the messages of that user,
parse the row key and get the messageID
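A partial key scan can be sketched over a sorted list of row keys, with binary search standing in for the region's internal ordering. This is plain Python, not the HBase Scan API; the helper names and the sample keys are assumptions, while the `<userID>:<messageID>` layout follows the slide above.

```python
import bisect


def next_key(prefix: bytes) -> bytes:
    """Compute "userID+1" by incrementing the last byte.

    Assumes the last byte is not 0xFF (no carry handling in this sketch).
    """
    return prefix[:-1] + bytes([prefix[-1] + 1])


def scan(sorted_keys, start_key: bytes, stop_key: bytes):
    """Return keys in [start_key, stop_key), like a start/stop row scan."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


# One row per message, across all users: <userID>:<messageID>.
rows = sorted([
    b"12345:0002",
    b"12346:0001",
    b"12345:0001",
    b"12344:0009",
])

hits = scan(rows, b"12345", next_key(b"12345"))
message_ids = [key.split(b":", 1)[1] for key in hits]
```

The start key `b"12345"` matches no row exactly, so the scan begins at the first key that sorts after it, which is that user's lowest message ID.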
Composite keys and atomicity
Following the email example: a single user inbox now spans many
rows
Such data is a time series: the row key represents the event
time
HBase will store all rows sorted in distinct ranges, namely regions
with specific start and stop keys
Sequential, monotonically increasing nature of time series
data
All incoming data is written to the same region (and hence the
same server)
Regions become HOT!
Move the timestamp field of the row key or prefix it with another
field
This gives you a random distribution of the row key across all
available region servers
- Less than ideal for range scans
+ Since you can re-hash the timestamp, this solution is good for
random access
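Two of the key-design options above can be sketched in a few lines of Python: prefixing the timestamp with a small salt, and replacing it with a hash. The bucket count, the `-` separator, the 13-digit padding and the use of MD5 are assumptions chosen for this illustration.

```python
import hashlib

NUM_BUCKETS = 4  # assumption: for example, one bucket per region server


def salted_key(timestamp_ms: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the timestamp with a salt so writes spread over buckets."""
    salt = timestamp_ms % buckets
    return f"{salt}-{timestamp_ms:013d}".encode("ascii")


def hashed_key(timestamp_ms: int) -> bytes:
    """Fully randomized key: good for random reads, bad for range scans."""
    digest = hashlib.md5(str(timestamp_ms).encode("ascii")).hexdigest()
    return digest.encode("ascii")


# Eight consecutive event times, which would otherwise hit one region.
events = range(1_300_000_000_000, 1_300_000_000_008)
buckets_hit = sorted({salted_key(t)[:1] for t in events})
```

With salting, a time-range read has to issue one scan per bucket and merge the results; with hashing, a reader can recompute the key for a known timestamp but cannot scan a range at all.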
Time Series Data
Summary
Using the salted or promoted field keys can strike a good balance of distribution for
write performance, and sequential subsets of keys for read performance. If you are only
doing random reads, it makes most sense to use random keys: this will avoid creating
region hot-spots.
Time-Ordered Relations
In our preceding discussion, the time series data dealt with inserting new events as
separate rows. However, you can also store related, time-ordered data: using the
columns of a table. Since all of the columns are sorted per column family, you can treat
this sorting as a replacement for a secondary index, as available in RDBMSes. Multiple
secondary indexes can be emulated by using multiple column families, although that
is not the recommended way of designing a schema. But for a small number of indexes,
this might be what you need.
Consider the earlier example of the user inbox, which stores all of the emails of a user
in a single row. Since you want to display the emails in the order they were received,
but, for example, also sorted by subject, you can make use of column-based sorting to
achieve the different views of the user inbox.
Given the advice to keep the number of column families in a table low,
especially when mixing large families with small ones (in terms of stored
data), you could store the inbox inside one table, and the secondary
indexes in another table. The drawback is that you cannot make use of
the provided per-table row-level atomicity. Also see Secondary
Indexes on page 370 for strategies to overcome this limitation.
The first decision to make concerns what the primary sorting order is, in other words,
how the majority of users have set the view of their inbox. Assuming they have set the ...
Figure 9-3. Finding the right balance between sequential read and write performance
MapReduce Integration
Introduction
In the following we review the main classes involved in
reading and writing data from/to an underlying data store
For MapReduce to work with HBase, some more practical
issues have to be addressed
Splits the table into proper blocks and hands them to the
MapReduce process
Must supply a Scan instance to interact with a table
Single instance that takes the output record from each reducer
subsequently
Details
The data node compares the server name of the writer with its own