1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Phoenix + Apache HBase
An Enterprise Grade Data Warehouse
Ankit Singhal, Rajeshbabu, Josh Elser
June 30, 2016
About us!!
Ankit Singhal
– Committer and member of Apache Phoenix PMC
– MTS at Hortonworks
RajeshBabu
– Committer and member of Apache Phoenix PMC
– Committer in Apache HBase
– MTS at Hortonworks
Josh Elser
– Committer in Apache Phoenix
– Committer and member of Apache Calcite PMC
– MTS at Hortonworks
Agenda
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Optimizations
Phoenix Query server
Q&A
Data Warehouse
An enterprise data warehouse (EDW) helps organize and aggregate analytical data from various functional domains and
serves as a critical repository for an organization’s operations.
[Figure: files, IoT data, and OLTP sources feed a staging area; ETL loads the data warehouse and data marts, which serve visualization or BI]
Phoenix Offerings and Interoperability
[Figure: offerings grouped under three headings: ETL, Data Warehouse, and Visualization & BI]
HBase & Phoenix
HBase: a distributed NoSQL store
Phoenix: provides OLTP and analytics over HBase
[Figure: components shown are an application embedding the Phoenix client and HBase client, ZooKeeper, and three RegionServers on HDFS, each hosting table regions and a Phoenix coprocessor]
Open Source Data Warehouse
[Figure: quadrant chart. Hardware cost axis: specialized H/W versus commodity H/W. Software cost axis: licensing cost versus no cost. Plotted: SMP, MPP, open source MPP, and HBase + Phoenix]
Phoenix & HBase as a Data Warehouse
Architecture
– Runs on commodity H/W
– True MPP
– O/S and H/W flexibility
– Supports OLTP and ROLAP
Phoenix & HBase as a Data Warehouse
Scalability
– Linear scalability for storage
– Linear scalability for memory
– Open to third-party storage
Phoenix & HBase as a Data Warehouse
Reliability
– Highly available
– Replication for disaster recovery
– Fully ACID for data integrity
Phoenix & HBase as a Data Warehouse
Manageability
– Performance tuning
– Data modeling & schema evolution
– Data pruning
– Online expansion or upgrade
– Data backup and recovery
Agenda
Phoenix & HBase as an Enterprise Data Warehouse
Use cases
Who uses Phoenix?
Analytics Use Case (Web Advertising Company)
 Functional requirements
– Create a single source of truth
– Cross-dimensional queries on 50+ dimensions and 80+ metrics
– Support fast Top-N queries
 Non-functional requirements
– Response time under 3 seconds for slice-and-dice queries
– 250+ concurrent users
– 100k+ analytics queries/day
– Highly available
– Linear scalability
Data Warehouse Capacity
 Data size (ETL input)
– 24 TB/day of raw data system-wide
– 25 billion impressions
 HBase input (cube)
– 6 billion rows of aggregated data (100 GB/day)
 HBase cluster size
– 65 HBase nodes
– 520 TB of disk
– 4.1 TB of memory
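A rough back-of-the-envelope reading of these figures, as a sketch only. It assumes decimal units (1 TB = 1000 GB) and an even spread across nodes, neither of which is stated on the slide:

```python
# Figures taken from the slide; decimal units and an even per-node spread are assumptions.
ROWS_PER_DAY = 6_000_000_000        # aggregated rows loaded into HBase per day
CUBE_BYTES_PER_DAY = 100 * 10**9    # 100 GB/day of cube data
NODES = 65
DISK_TB = 520
MEMORY_TB = 4.1

bytes_per_row = CUBE_BYTES_PER_DAY / ROWS_PER_DAY
disk_tb_per_node = DISK_TB / NODES
memory_gb_per_node = MEMORY_TB * 1000 / NODES

print(f"{bytes_per_row:.1f} bytes per aggregated row")    # 16.7
print(f"{disk_tb_per_node:.1f} TB of disk per node")      # 8.0
print(f"{memory_gb_per_node:.0f} GB of memory per node")  # 63
```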
Use Case Architecture
[Figure: pipeline in stages. Data ingestion: AdServer and click tracking feed Apache Kafka. Real-time: Kafka input, ETL (filter, aggregate), in-memory store. Batch processing: Kafka, Camus, HDFS, ETL, HDFS, data uploader, HBase. Analytics: views, data API, analytics UI]
Analytics Data Warehouse Architecture
[Figure: components shown are HDFS, ETL, cube generation, bulk load, HBase (cubes are stored in HBase), backup and recovery, a data API that converts slice-and-dice queries to SQL queries, and the analytics UI]
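The "convert slice and dice query to SQL query" step in the data API could look roughly like the sketch below. The function, table, and column names are all hypothetical and are not taken from the talk; identifiers are assumed to come from a trusted schema catalog, and only filter values are passed as bind parameters.

```python
def build_slice_query(table, dimensions, metrics, filters, top_n=None):
    """Build a parameterized aggregate query from a slice-and-dice request."""
    select = list(dimensions) + [f"SUM({m}) AS TOTAL_{m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{column} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    if top_n is not None:
        # Top-N: order by the first metric, largest first.
        sql += f" ORDER BY TOTAL_{metrics[0]} DESC LIMIT {int(top_n)}"
    return sql, params


sql, params = build_slice_query(
    "AD_CUBE", ["COUNTRY"], ["IMPRESSIONS", "CLICKS"],
    {"EVENT_DAY": "2016-06-30"}, top_n=10,
)
print(sql)
print(params)
```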
Time Series Use Case (Apache Ambari)
Ambari Metrics System (AMS)
 Functional requirements
– Store all cluster metrics collected every second (10k to 100k metrics/second)
– Optimize storage/access for time series data
 Non-functional requirements
– Near real time response time
– Scalable
– Real time ingestion
AMS architecture
[Figure: components shown are metric monitors on hosts, Hadoop sinks, the Metric Collector backed by Phoenix and HBase, and the Ambari Server]
Agenda
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Optimizations
Schema Design
Primary key design
 Primary key design is the most important factor in the overall query performance of a table
 Compose the primary key from the columns most often used as predicates in queries
 In most cases, the leading part of the primary key should let queries be converted into point
lookups or range scans in HBase
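Why the leading key column matters can be shown with a toy model, not Phoenix code: rows are kept sorted by a composite key, the way HBase keeps row keys sorted. The key layout (metric name, hostname) and the data are made up for illustration.

```python
import bisect

# Rows sorted by the composite key (metric_name, hostname); hypothetical data.
rows = sorted([
    ("cpu", "host1"), ("cpu", "host2"), ("disk", "host1"),
    ("mem", "host1"), ("mem", "host2"), ("net", "host3"),
])


def range_scan(leading_value):
    """Predicate on the leading key column: seek to the start, stop at the end."""
    start = bisect.bisect_left(rows, (leading_value,))
    stop = bisect.bisect_left(rows, (leading_value + "\x00",))
    return rows[start:stop], stop - start        # rows returned, rows touched


def full_scan(hostname):
    """Predicate only on a non-leading column: every row has to be examined."""
    return [r for r in rows if r[1] == hostname], len(rows)


print(range_scan("mem"))    # touches 2 rows
print(full_scan("host2"))   # touches all 6 rows
```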
Schema Design
Salting vs pre-split
 Use salting to alleviate write hot-spotting
CREATE TABLE …(
…
) SALT_BUCKETS = N
– The number of buckets should equal the number of RegionServers
 Otherwise, try to pre-split the table if you know the row key data set
CREATE TABLE …(
…
) SPLIT ON (…)
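The idea behind salting can be sketched as follows: a leading byte derived from a hash of the row key spreads otherwise sequential keys over a fixed number of buckets. The hash used here (CRC32) and the bucket count are stand-ins for illustration; Phoenix computes the salt byte with its own internal hash.

```python
import zlib
from collections import Counter

SALT_BUCKETS = 4  # hypothetical bucket count


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    # Illustrative hash only; Phoenix uses its own hash function internally.
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


# Monotonically increasing keys (for example timestamps) would all land at the
# tail of an unsalted table; with a salt byte they are spread over the buckets.
keys = [f"2016-06-30T00:00:{s:02d}".encode() for s in range(60)]
distribution = Counter(salted_key(k)[0] for k in keys)
print(sorted(distribution.items()))
```

The trade-off is that a range scan over the original key order now has to consult every bucket.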
Schema Design
Table properties
 Use block encoding and/or compression for better performance
CREATE TABLE …(
…
) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'
 Use region replication for read high availability
CREATE TABLE …(
…
) "REGION_REPLICATION" = "2"
Schema Design
Table properties
 Set UPDATE_CACHE_FREQUENCY to a larger value (in milliseconds) so the client does not contact the server
for metadata updates on every statement
CREATE TABLE …(
…
) UPDATE_CACHE_FREQUENCY = 300000
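The effect of a refresh interval can be modeled with a small client-side cache. This is a sketch of the idea, not Phoenix's implementation; the class name and the metadata shape are hypothetical. With 300000 ms, one statement per second for ten minutes triggers two metadata fetches instead of 600.

```python
class MetadataCache:
    """Client-side table metadata cache with a refresh interval in milliseconds."""

    def __init__(self, fetch, update_cache_frequency_ms):
        self._fetch = fetch                      # callable that asks the server
        self._ttl = update_cache_frequency_ms
        self._value = None
        self._fetched_at = None
        self.server_calls = 0

    def get(self, now_ms):
        expired = self._fetched_at is None or now_ms - self._fetched_at >= self._ttl
        if expired:
            self._value = self._fetch()
            self._fetched_at = now_ms
            self.server_calls += 1
        return self._value


cache = MetadataCache(lambda: {"columns": ["PK", "COL1"]}, 300_000)
for now_ms in range(0, 600_000, 1_000):   # one statement per second for 10 minutes
    cache.get(now_ms)
print(cache.server_calls)  # 2
```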
Schema Design
 Divide columns into multiple column families if there are rarely accessed columns
– HBase reads only the files of column families specified in the query to reduce I/O
[Figure: a row with primary key columns pk1 and pk2, and columns Col1 to Col7 split between column family CF1 (frequently accessed columns) and CF2 (rarely accessed columns)]
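A toy model of why separate column families reduce I/O: each family is stored in its own files, so a read only opens the families that hold the requested columns. The assignment of Col1 to Col4 to CF1 and Col5 to Col7 to CF2 is an assumption; the slide does not say where the split falls.

```python
# Each column family is stored in its own set of files; hypothetical layout.
store_files = {
    "CF1": {"Col1": "a", "Col2": "b", "Col3": "c", "Col4": "d"},   # frequently accessed
    "CF2": {"Col5": "e", "Col6": "f", "Col7": "g"},                # rarely accessed
}
column_to_family = {c: cf for cf, cols in store_files.items() for c in cols}


def read(columns):
    """Open only the families that contain the requested columns."""
    families = sorted({column_to_family[c] for c in columns})
    values = {c: store_files[column_to_family[c]][c] for c in columns}
    return values, families


print(read(["Col1", "Col3"]))   # only CF1 is opened
print(read(["Col1", "Col6"]))   # both families are opened
```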
Secondary Indexes
 Global indexes
– Optimized for read heavy use cases
CREATE INDEX idx on table(…)
 Local Indexes
– Optimized for write heavy and space constrained use cases
CREATE LOCAL INDEX idx on table(…)
 Functional indexes
– Allow you to create indexes on arbitrary expressions.
CREATE INDEX UPPER_NAME_INDEX ON EMP(UPPER(FIRSTNAME||' '||LASTNAME))
Secondary Indexes
 Use covered indexes so that queries can be served from the index table without going back to the primary table.
CREATE INDEX idx ON table(…) INCLUDE(…)
 Pass an index hint to guide the query optimizer to the right index for a query
SELECT /*+ INDEX(<table> <index>) */ …
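The difference a covered index makes can be sketched with an in-memory model. The table, columns, and data below are hypothetical, and the index is modeled as a list sorted by (indexed value, primary key) that optionally carries extra columns; it is not how Phoenix stores index tables.

```python
import bisect

# Primary table keyed by employee id; names and data are hypothetical.
primary = {
    1: {"LASTNAME": "Smith", "CITY": "Pune", "SALARY": 10},
    2: {"LASTNAME": "Adams", "CITY": "Oslo", "SALARY": 20},
    3: {"LASTNAME": "Smith", "CITY": "Lima", "SALARY": 30},
}


def build_index(indexed, include=()):
    """Index entries sorted by (indexed value, primary key), carrying covered columns."""
    entries = [(row[indexed], pk, {c: row[c] for c in include})
               for pk, row in primary.items()]
    return sorted(entries, key=lambda e: (e[0], e[1]))


def lookup(index, value, wanted):
    start = bisect.bisect_left(index, (value,))
    results, primary_reads = [], 0
    for indexed_value, pk, covered in index[start:]:
        if indexed_value != value:
            break
        if all(c in covered for c in wanted):
            results.append({c: covered[c] for c in wanted})      # served by the index alone
        else:
            primary_reads += 1                                     # extra read of the data table
            results.append({c: primary[pk][c] for c in wanted})
    return results, primary_reads


print(lookup(build_index("LASTNAME"), "Smith", ["CITY"]))                     # 2 primary reads
print(lookup(build_index("LASTNAME", include=("CITY",)), "Smith", ["CITY"]))  # 0 primary reads
```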
Row Timestamp Column
 Maps the HBase native row timestamp to a Phoenix column
 Leverages HBase optimizations such as setting the minimum and maximum time range on scans,
so that store files that fall outside that time range are skipped entirely.
 Perfect for time series use cases.
 Syntax
CREATE TABLE …(
CREATED_DATE DATE NOT NULL,
…
CONSTRAINT PK PRIMARY KEY(CREATED_DATE ROW_TIMESTAMP, …)
)
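The store-file skipping mentioned above can be pictured as an overlap test between the scan's time range and each file's timestamp bounds. This is a simplified model with made-up file names and integer timestamps, not HBase code:

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    name: str
    min_ts: int   # smallest cell timestamp in the file
    max_ts: int   # largest cell timestamp in the file


def files_to_scan(files, start_ts, end_ts):
    """Keep only files whose [min_ts, max_ts] overlaps the scan range [start_ts, end_ts)."""
    return [f.name for f in files if f.max_ts >= start_ts and f.min_ts < end_ts]


files = [
    StoreFile("hfile-1", 0, 99),
    StoreFile("hfile-2", 100, 199),
    StoreFile("hfile-3", 200, 299),
]
print(files_to_scan(files, 150, 180))  # ['hfile-2']
```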
Use of Statistics
[Figure: two client scan plans over a table with regions starting at A, F, L, and R. In one, the client scans one range per region (A, F, L, R). In the other, the regions are divided into chunks starting at A, C, F, I, L, O, R, and U, and the client scans each chunk]
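How statistics increase scan parallelism can be sketched by splitting each region's key range at the collected split points (guideposts). The region start keys and chunk boundaries mirror the labels on the slide; the upper bound "Z" and the function name are assumptions for illustration.

```python
def parallel_chunks(region_boundaries, guideposts):
    """Split each region's key range at the guideposts that fall inside it."""
    chunks = []
    for start, end in zip(region_boundaries, region_boundaries[1:]):
        inner = [g for g in guideposts if start < g < end]
        points = [start] + inner + [end]
        chunks.extend(zip(points, points[1:]))
    return chunks


# Region start keys A, F, L, R as on the slide; "Z" is an assumed upper bound.
regions = ["A", "F", "L", "R", "Z"]
print(parallel_chunks(regions, []))                     # 4 scans, one per region
print(parallel_chunks(regions, ["C", "I", "O", "U"]))   # 8 scans, one per chunk
```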
Skip Scan
 Phoenix supports skip scan to jump directly to matching keys when the query predicate
contains sets of keys
SELECT * FROM METRIC_RECORD
WHERE METRIC_NAME LIKE 'abc%'
AND HOSTNAME IN ('host1', 'host2');
CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN
ON 2 RANGES OVER METRIC_RECORD
['abc','host1'] - ['abd','host2']
[Figure: a client skip-scanning Region1 to Region4, hosted on RegionServers RS1, RS2, and RS3]
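The seek-instead-of-read behaviour can be sketched over a sorted list of composite keys. This is a simplified single-process model with made-up data, assuming a key of (metric name, hostname) as in the query above; the real skip scan runs as a filter on the RegionServers.

```python
import bisect


def skip_scan(sorted_keys, name_prefix, hostnames):
    """Seek over sorted (metric_name, hostname) keys instead of reading every row."""
    hosts = sorted(hostnames)
    matches, seeks = [], 1
    i = bisect.bisect_left(sorted_keys, (name_prefix,))   # initial seek
    while i < len(sorted_keys) and sorted_keys[i][0].startswith(name_prefix):
        name, host = sorted_keys[i]
        if host in hosts:
            matches.append(sorted_keys[i])
            i += 1
            continue
        # Jump to the next wanted host for this metric, or past this metric.
        next_host = next((h for h in hosts if h > host), None)
        target = (name, next_host) if next_host is not None else (name + "\x00",)
        i = bisect.bisect_left(sorted_keys, target)
        seeks += 1
    return matches, seeks


keys = sorted([
    ("abc1", "host1"), ("abc1", "host2"), ("abc1", "host3"), ("abc1", "host4"),
    ("abc2", "host0"), ("abc2", "host2"), ("abd", "host1"), ("xyz", "host1"),
])
print(skip_scan(keys, "abc", ["host1", "host2"]))
```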
Join optimizations
 Hash join
– Hash join outperforms other join algorithms when one of the relations is small, or when the
records matching the predicate fit into memory
 Sort-merge join
– Use the sort-merge join algorithm when both relations are very large
 NO_STAR_JOIN hint
– For multiple inner-join queries, Phoenix applies a star-join optimization by default. Use this hint in
the query if the overall size of all right-hand-side tables would exceed the memory size limit.
 NO_CHILD_PARENT_JOIN_OPTIMIZATION hint
– Prevents the use of the child-parent-join optimization.
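The two join algorithms can be contrasted with a minimal in-memory sketch. It is not Phoenix's implementation: a real sort-merge join sorts and streams on the servers rather than sorting Python lists, and the relation and column names are made up.

```python
def hash_join(small, large, key):
    """Build a hash table on the smaller relation, then stream the larger one."""
    table = {}
    for row in small:
        table.setdefault(row[key], []).append(row)
    return [{**s, **l} for l in large for s in table.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Sort both inputs on the join key and merge them; no hash table is held in memory."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of the two runs that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


hosts = [{"host_id": 1, "dc": "east"}, {"host_id": 2, "dc": "west"}]
metrics = [{"host_id": 1, "cpu": 0.5}, {"host_id": 2, "cpu": 0.9}, {"host_id": 1, "cpu": 0.7}]
print(hash_join(hosts, metrics, "host_id"))
print(sort_merge_join(hosts, metrics, "host_id"))
```

Both functions return the same three joined rows, in different orders.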
Optimize Writes
 UPSERT VALUES
– Call it multiple times before commit to batch mutations
– Use a prepared statement when you run the same statement multiple times
 UPSERT SELECT
– Configure phoenix.mutate.batchSize based on row size
– Set auto-commit to true to write scan results directly to HBase
– Set auto-commit to true when running UPSERT SELECT on the same table so that writes happen on the
server
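The batching advice for UPSERT VALUES can be sketched with any DB-API 2.0 style connection: run one parameterized statement per row and commit once per batch. In the sketch below sqlite3 is only a runnable stand-in, and `batch_size` is a hypothetical knob rather than a Phoenix setting; against Phoenix the statement would be an UPSERT INTO … VALUES (?, ?).

```python
import sqlite3


def upsert_in_batches(conn, sql, rows, batch_size=1000):
    """Run one parameterized statement per row and commit once per batch."""
    cursor = conn.cursor()
    pending = commits = 0
    for row in rows:
        cursor.execute(sql, row)      # held in the open transaction until commit
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:                        # flush the final partial batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")    # stand-in for a Phoenix connection
conn.execute("CREATE TABLE metric (id INTEGER PRIMARY KEY, val REAL)")
commits = upsert_in_batches(
    conn, "INSERT OR REPLACE INTO metric VALUES (?, ?)",
    [(i, i * 0.5) for i in range(2500)],
)
print(commits)  # 3
```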
Hints
Some important hints
 SKIP_SCAN, RANGE_SCAN
 SERIAL
 SMALL
Additional References
 For more optimizations, refer to these documents
– http://phoenix.apache.org/tuning.html
– https://hbase.apache.org/book.html#performance
Agenda
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Optimizations
Phoenix Query Server
Apache Phoenix Query Server
 A standalone service that proxies user requests to HBase/Phoenix
– Optional
 Reference client implementation via JDBC
– “Thick” versus “Thin”
 First introduced in Apache Phoenix 4.4.0
 Built on Apache Calcite’s Avatica
– “A framework for building database drivers”
Traditional Apache Phoenix RPC Model
[Figure: components shown are an application embedding the Phoenix client and HBase client, ZooKeeper, and three RegionServers on HDFS, each hosting table regions and a Phoenix coprocessor]
Query Server Model
[Figure: components shown are an application, a Query Server embedding the Phoenix client and HBase client, ZooKeeper, and three RegionServers on HDFS, each hosting table regions and a Phoenix coprocessor]
Query Server Technology
 HTTP Server and wire API definition
 Pluggable serialization
– Google Protocol Buffers
 “Thin” JDBC Driver (over HTTP)
 Other goodies!
– Pluggable metrics system
– TCK (technology compatibility kit)
– SPNEGO for Kerberos authentication
– Horizontally scalable with load balancing
Query Server Clients
Client enablement
 Go language database/sql/driver
– https://github.com/Boostport/avatica
 .NET driver
– https://github.com/Azure/hdinsight-phoenix-sharp
– https://www.nuget.org/packages/Microsoft.Phoenix.Client/1.0.0-preview
 ODBC
– Built by http://www.simba.com/, also available from Hortonworks
 Python DB API v2.0 (not “battle tested”)
– https://bitbucket.org/lalinsky/python-phoenixdb
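As an illustration of the thin-client model, a query through the Query Server with the Python phoenixdb package might look like the sketch below. The URL, table, and column names are placeholders, and the function only works against a running Query Server with phoenixdb installed; it is not taken from the talk.

```python
def fetch_metric_rows(query_server_url="http://localhost:8765/"):
    """Query Phoenix over HTTP through the Query Server (placeholder names throughout)."""
    import phoenixdb  # third-party package; imported lazily so the sketch loads without it

    conn = phoenixdb.connect(query_server_url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT METRIC_NAME, HOSTNAME FROM METRIC_RECORD WHERE HOSTNAME = ?",
            ["host1"],
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

The application needs no HBase or Phoenix client libraries of its own; only the HTTP endpoint of the Query Server has to be reachable.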
Agenda
Phoenix & HBase as an Enterprise Data Warehouse
Use Cases
Optimizations
Phoenix Query Server
Q&A
Phoenix & HBase
We hope to see you all migrating to Phoenix & HBase, and we look forward to more questions on the user mailing
lists.
Get involved on the mailing lists:
user@phoenix.apache.org
user@hbase.apache.org
You can reach us at:
ankit@apache.org
rajeshbabu@apache.org
elserj@apache.org
Thank You
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
DataWorks Summit/Hadoop Summit
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
enissoz
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
DataWorks Summit
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
DataWorks Summit/Hadoop Summit
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region Replicas
DataWorks Summit
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
Ashish Narasimham
 
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanApache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Ankit Singhal
 
An Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present FutureAn Overview on Optimization in Apache Hive: Past, Present Future
An Overview on Optimization in Apache Hive: Past, Present Future
DataWorks Summit/Hadoop Summit
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
Cloudera, Inc.
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
Josh Elser
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
Carolyn Duby
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
DataWorks Summit
 
Hive present-and-feature-shanghai
Hive present-and-feature-shanghaiHive present-and-feature-shanghai
Hive present-and-feature-shanghai
Yifeng Jiang
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Seetharam Venkatesh
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
DataWorks Summit
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
DataWorks Summit/Hadoop Summit
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache Phoenix
Rajeshbabu Chintaguntla
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
 
Hadoop in adtech
Hadoop in adtechHadoop in adtech
Hadoop in adtech
Yuta Imai
 

Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse

  • 1. Apache Phoenix + Apache HBase: An Enterprise Grade Data Warehouse. Ankit Singhal, Rajeshbabu, Josh Elser. June 30, 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. About us
      • Ankit Singhal
          – Committer and member of Apache Phoenix PMC
          – MTS at Hortonworks
      • RajeshBabu
          – Committer and member of Apache Phoenix PMC
          – Committer in Apache HBase
          – MTS at Hortonworks
      • Josh Elser
          – Committer in Apache Phoenix
          – Committer and member of Apache Calcite PMC
          – MTS at Hortonworks
  • 3. Agenda: Phoenix & HBase as an Enterprise Data Warehouse; Use Cases; Optimizations; Phoenix Query Server; Q&A
  • 4. Data Warehouse
      • An EDW helps organize and aggregate analytical data from various functional domains and serves as a critical repository for organizations' operations.
      • [Diagram labels: Files, IoT data, OLTP, Staging, ETL, Data Warehouse, Mart, Visualization or BI]
  • 5. Phoenix Offerings and Interoperability: ETL, Data Warehouse, Visualization & BI
  • 6. HBase & Phoenix
      • HBase, a distributed NoSQL store
      • Phoenix provides OLTP and analytics over HBase
      • [Architecture diagram labels: Application, Phoenix client, HBase client, ZooKeeper, three RegionServers each with a Phoenix coprocessor (Phx coproc) and table regions, HDFS]
  • 7. Open Source Data Warehouse
      • [Chart: hardware cost (commodity H/W to specialized H/W) against software cost (no cost to licensing cost); labels: SMP, MPP, Open Source MPP, HBase + Phoenix]
  • 8. Phoenix & HBase as a Data Warehouse: Architecture
      • Runs on commodity H/W
      • True MPP
      • O/S and H/W flexibility
      • Supports OLTP and ROLAP
  • 9. Phoenix & HBase as a Data Warehouse: Scalability
      • Linear scalability for storage
      • Linear scalability for memory
      • Open to third-party storage
  • 10. Phoenix & HBase as a Data Warehouse: Reliability
      • Highly available
      • Replication for disaster recovery
      • Fully ACID for data integrity
  • 11. Phoenix & HBase as a Data Warehouse: Manageability
      • Performance tuning
      • Data modeling & schema evolution
      • Data pruning
      • Online expansion or upgrade
      • Data backup and recovery
  • 12. Agenda: Phoenix & HBase as an Enterprise Data Warehouse; Use Cases
  • 13. Who uses Phoenix
  • 14. Analytics Use Case (web advertising company)
      • Functional requirements
          – Create a single source of truth
          – Cross-dimensional queries on 50+ dimensions and 80+ metrics
          – Support fast Top-N queries
      • Non-functional requirements
          – Less than 3 seconds response time for slice and dice
          – 250+ concurrent users
          – 100k+ analytics queries/day
          – Highly available
          – Linear scalability
  • 15. Data Warehouse Capacity
      • Data size (ETL input)
          – 24 TB/day of raw data system wide
          – 25 billion impressions
      • HBase input (cube)
          – 6 billion rows of aggregated data (100 GB/day)
      • HBase cluster size
          – 65 nodes of HBase
          – 520 TB of disk
          – 4.1 TB of memory
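As a rough reading of the capacity figures on slide 15, the per-node share of disk and memory and the raw-to-cube reduction can be worked out directly. This is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 1000 GB) and an even spread across nodes, neither of which the slide states.

```python
# Figures quoted on the capacity slide.
NODES = 65
DISK_TB = 520
MEMORY_TB = 4.1
RAW_INPUT_TB_PER_DAY = 24
CUBE_GB_PER_DAY = 100

# Assumption: decimal units and resources spread evenly over the nodes.
disk_per_node_tb = DISK_TB / NODES
memory_per_node_gb = MEMORY_TB * 1000 / NODES
reduction_factor = RAW_INPUT_TB_PER_DAY * 1000 / CUBE_GB_PER_DAY

print(f"disk per node: {disk_per_node_tb:.1f} TB")
print(f"memory per node: {memory_per_node_gb:.0f} GB")
print(f"raw input to cube reduction: {reduction_factor:.0f}x")
```

Under these assumptions each node carries about 8 TB of disk and roughly 63 GB of memory, and the daily cube is about 240 times smaller than the raw input.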
  • 16. Use Case Architecture
      • [Architecture diagram labels: AdServer, Click Tracking, Kafka Input, ETL (Filter, Aggregate), In-Memory Store, Real-time, Kafka, CAMUS, HDFS, ETL, Data Uploader, Data API, HBase, Views, Analytics UI; stages: Data Ingestion, Batch Processing, Analytics; Apache Kafka]
  • 17. Data Warehouse Architecture
      • [Architecture diagram labels: HDFS, ETL, Cube Generation, Bulk Load, "Cubes are stored in HBase", Backup and recovery, Data API ("Convert slice and dice query to SQL query"), Analytics UI]
  • 18. Time Series Use Case (Apache Ambari): Ambari Metrics System (AMS)
      • Functional requirements
          – Store all cluster metrics collected every second (10k to 100k metrics/second)
          – Optimize storage/access for time series data
      • Non-functional requirements
          – Near-real-time response time
          – Scalable
          – Real-time ingestion
  • 19. AMS architecture
      • [Architecture diagram labels: Hosts, Metric Monitors, Hadoop Sinks, Metric Collector, Phoenix, HBase, Ambari Server]
  • 20. Agenda: Phoenix & HBase as an Enterprise Data Warehouse; Use Cases; Optimizations
  • 21. Schema Design: Primary key design
      • The most important criterion driving overall performance of queries on the table
      • The primary key should be composed of the most-used predicate columns in the queries
      • In most cases, the leading part of the primary key should help convert queries into point lookups or range scans in HBase
  • 22. Schema Design: Salting vs pre-split
      • Use salting to alleviate write hot-spotting
          CREATE TABLE …( … ) SALT_BUCKETS = N
          – The number of buckets should equal the number of RegionServers
      • Otherwise, try to pre-split the table if you know the row key data set
          CREATE TABLE …( … ) SPLIT ON (…)
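The effect of salting described on slide 22 can be illustrated with a minimal sketch. It assumes a one-byte salt computed as a hash of the row key modulo the bucket count; the CRC32 hash and the helper names `salt_byte` and `salted_key` are stand-ins chosen for illustration, not Phoenix's actual hashing code.

```python
import zlib


def salt_byte(row_key: bytes, buckets: int) -> int:
    """Hypothetical salt: a stable hash of the key, modulo the bucket count."""
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with its one-byte salt (so buckets must be <= 256)."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


BUCKETS = 4
# Monotonically increasing keys, the classic cause of write hot-spotting.
keys = [f"2016-06-30T00:00:{s:02d}".encode() for s in range(60)]

counts = [0] * BUCKETS
for k in keys:
    counts[salt_byte(k, BUCKETS)] += 1

print(counts)  # number of keys landing in each salt bucket
```

Because the salt is the leading byte of the stored key, consecutive timestamps no longer sort next to each other, so writes are spread over several key ranges instead of always hitting the last region. The cost is that a range scan has to consult every bucket.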
  • 23. Schema Design: Table properties
      • Use block encoding and/or compression for better performance
          CREATE TABLE …( … ) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'
      • Use region replication for read high availability
          CREATE TABLE …( … ) "REGION_REPLICATION" = "2"
  • 24. Schema Design: Table properties
      • Set UPDATE_CACHE_FREQUENCY to a larger value to avoid frequently contacting the server for metadata updates
          CREATE TABLE …( … ) UPDATE_CACHE_FREQUENCY = 300000
  • 25. Schema Design
      • Divide columns into multiple column families if there are rarely accessed columns
          – HBase reads only the files of the column families specified in the query, which reduces I/O
      • [Table layout: primary key columns pk1, pk2; column families CF1 and CF2 holding Col1 to Col7, labeled "frequently accessed columns" and "rarely accessed columns"]
  • 26. Secondary Indexes
      • Global indexes
          – Optimized for read-heavy use cases
          CREATE INDEX idx ON table(…)
      • Local indexes
          – Optimized for write-heavy and space-constrained use cases
          CREATE LOCAL INDEX idx ON table(…)
      • Functional indexes
          – Allow you to create indexes on arbitrary expressions
          CREATE INDEX UPPER_NAME_INDEX ON EMP(UPPER(FIRSTNAME||' '||LASTNAME))
  • 27. Secondary Indexes
      • Use covered indexes to scan efficiently over the index table instead of the primary table
          CREATE INDEX idx ON table(…) INCLUDE(…)
      • Pass an index hint to guide the query optimizer to select the right index for the query
          SELECT /*+ INDEX(<table> <index>) */ …
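The benefit of a covered index (slide 27) can be shown with a toy model. The sketch below is not Phoenix's planner or storage layout; the `EMP` data, `build_index`, and `query` are invented for illustration. It only counts how many extra reads of the primary table are needed when the requested column is, or is not, copied into the index.

```python
from bisect import bisect_left, bisect_right

# Primary table: row key -> full row (toy data, invented for illustration).
EMP = {
    1: {"lastname": "Ng", "dept": "ops", "salary": 90},
    2: {"lastname": "Adams", "dept": "eng", "salary": 120},
    3: {"lastname": "Ng", "dept": "eng", "salary": 110},
}


def build_index(table, indexed_col, include=()):
    """Entries of (indexed value, row key, copy of included columns), sorted by the first two."""
    entries = [
        (row[indexed_col], pk, {c: row[c] for c in include})
        for pk, row in table.items()
    ]
    return sorted(entries, key=lambda e: e[:2])


def query(table, index, value, wanted_col):
    """Return (values, primary-table lookups) for rows whose indexed column equals value."""
    keys = [e[0] for e in index]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    results, lookups = [], 0
    for _, pk, included in index[lo:hi]:
        if wanted_col in included:
            results.append(included[wanted_col])   # served from the index alone
        else:
            results.append(table[pk][wanted_col])  # extra read of the primary table
            lookups += 1
    return results, lookups


plain = build_index(EMP, "lastname")
covered = build_index(EMP, "lastname", include=("salary",))
print(query(EMP, plain, "Ng", "salary"))
print(query(EMP, covered, "Ng", "salary"))
```

Both calls return the same salaries, but only the plain index has to go back to the primary table for each matching row. The trade-off is that included columns are stored twice and must be maintained on every write.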
  • 28. Row Timestamp Column
      • Maps the HBase native row timestamp to a Phoenix column
      • Leverages optimizations provided by HBase, such as setting the minimum and maximum time range for scans so that store files outside that time range are skipped entirely
      • Perfect for time series use cases
      • Syntax
          CREATE TABLE …(CREATED_DATE DATE NOT NULL, … CONSTRAINT PK PRIMARY KEY(CREATED_DATE ROW_TIMESTAMP, …))
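The store-file skipping mentioned on slide 28 comes down to an interval-overlap test. The following sketch assumes each store file records the minimum and maximum timestamp of its cells and treats the scan range as inclusive on both ends; the `StoreFile` type, the file names, and the inclusive bounds are simplifications for illustration rather than HBase's actual classes or semantics.

```python
from typing import NamedTuple


class StoreFile(NamedTuple):
    name: str
    min_ts: int  # smallest cell timestamp in the file
    max_ts: int  # largest cell timestamp in the file


def files_to_read(files, scan_min, scan_max):
    """Keep only files whose [min_ts, max_ts] overlaps the inclusive scan range."""
    return [f.name for f in files if f.max_ts >= scan_min and f.min_ts <= scan_max]


files = [
    StoreFile("hfile-a", 0, 99),
    StoreFile("hfile-b", 100, 199),
    StoreFile("hfile-c", 200, 299),
]

print(files_to_read(files, 150, 220))  # hfile-a is skipped without being opened
```

A query that filters on the row timestamp column can therefore ignore whole files of older data, which is why the feature suits time series tables where most reads target a recent window.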
  • 29. Use of Statistics
      • [Diagram labels: Client; Regions A, F, L, R; Chunks A, C, F, I, L, O, R, U]
  • 30. Skip Scan
      • Phoenix supports skip scan to jump directly to matching keys when the query has key sets in the predicate
          SELECT * FROM METRIC_RECORD WHERE METRIC_NAME LIKE 'abc%' AND HOSTNAME IN ('host1', 'host2');
          CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']
      • [Diagram labels: Client; RS1, RS2, RS3; Region1, Region2, Region3, Region4; Skip scan]
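The idea behind the skip scan on slide 30 can be sketched over a sorted list of composite keys. This is a simplified illustration, not Phoenix's filter implementation: it assumes unique `(metric_name, hostname)` keys held in memory, and it uses binary search as a stand-in for an HBase seek. The function name `skip_scan` and the sample data are invented.

```python
from bisect import bisect_left


def skip_scan(rows, prefix, hosts):
    """rows: sorted, unique (metric_name, hostname) tuples. Returns (matches, seeks)."""
    hosts = sorted(hosts)
    matches = []
    pos = bisect_left(rows, (prefix, ""))  # seek to the first key with the prefix
    seeks = 1
    while pos < len(rows) and rows[pos][0].startswith(prefix):
        metric = rows[pos][0]
        for host in hosts:
            # Seek straight to the wanted key instead of reading the keys in between.
            pos = bisect_left(rows, (metric, host), pos)
            seeks += 1
            if pos < len(rows) and rows[pos] == (metric, host):
                matches.append(rows[pos])
                pos += 1
        # Jump past the remaining hosts of this metric to the next metric name.
        pos = bisect_left(rows, (metric + "\x00", ""), pos)
        seeks += 1
    return matches, seeks


rows = sorted(
    (m, h)
    for m in ("abb", "abc1", "abc2", "abd")
    for h in ("host0", "host1", "host2", "host3")
)
matches, seeks = skip_scan(rows, "abc", ["host1", "host2"])
print(matches)
print(seeks, "seeks over", len(rows), "rows")
```

A plain range scan over the `abc` prefix would read every host of every matching metric and filter afterwards. The skip scan touches only the requested hosts, which matters when each metric has many hosts and the IN list is short.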
  • 31. Join optimizations
      • Hash join
          – Outperforms other join algorithms when one of the relations is small, or when its records matching the predicate fit into memory
      • Sort-merge join
          – Use the sort-merge join algorithm when the relations are very large
      • NO_STAR_JOIN hint
          – For multiple inner-join queries, Phoenix applies a star-join optimization by default. Use this hint if the overall size of all right-hand-side tables would exceed the memory size limit.
      • NO_CHILD_PARENT_OPTIMIZATION hint
          – Prevents the use of the child-parent-join optimization
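The difference between the two join strategies on slide 31 can be seen in a small in-memory sketch. These are textbook versions of the algorithms written for illustration, not Phoenix's distributed implementations; the `depts` and `emps` relations are invented.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Build an in-memory hash table on the smaller relation, then probe it."""
    table = defaultdict(list)
    for row in small:
        table[row[key]].append(row)
    return [{**s, **l} for l in large for s in table.get(l[key], [])]


def sort_merge_join(left, right, key):
    """Sort both sides on the join key, then merge them in a single pass."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of the two runs that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


depts = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]
emps = [
    {"emp": "a", "dept_id": 2},
    {"emp": "b", "dept_id": 1},
    {"emp": "c", "dept_id": 3},
]
print(hash_join(depts, emps, "dept_id"))
print(sort_merge_join(depts, emps, "dept_id"))
```

The hash join needs the whole smaller relation in memory but reads the larger one only once and unsorted. The sort-merge join keeps only the current runs in memory once both inputs are sorted, which is why it is the fallback when neither side fits.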
  • 32. Optimize Writes
      • UPSERT VALUES
          – Call it multiple times before commit to batch mutations
          – Use a prepared statement when you run the query multiple times
      • UPSERT SELECT
          – Configure phoenix.mutate.batchSize based on row size
          – Set auto-commit to true to write scan results directly to HBase
          – Set auto-commit to true while running UPSERT SELECT on the same table so that writes happen on the server
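The batching advice for UPSERT VALUES on slide 32 follows a pattern common to most SQL clients: reuse one parameterized statement and commit once per batch instead of once per row. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a cluster. SQLite's `INSERT OR REPLACE` plays the role of Phoenix's `UPSERT`, and the `metric` table, the `batched_upsert` helper, and the batch size are assumptions made for illustration.

```python
import sqlite3


def batched_upsert(conn, rows, batch_size):
    """Reuse one parameterized statement and commit once per batch of rows."""
    # SQLite stand-in for a Phoenix "UPSERT INTO ... VALUES (?, ?)" statement.
    sql = "INSERT OR REPLACE INTO metric (name, value) VALUES (?, ?)"
    commits = 0
    for start in range(0, len(rows), batch_size):
        conn.executemany(sql, rows[start:start + batch_size])
        conn.commit()  # one round of flushing per batch, not per row
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric (name TEXT PRIMARY KEY, value REAL)")

rows = [(f"m{i}", float(i)) for i in range(2500)]
commits = batched_upsert(conn, rows, batch_size=1000)
stored = conn.execute("SELECT COUNT(*) FROM metric").fetchone()[0]
print(commits, "commits for", stored, "rows")
```

With a real Phoenix connection the same shape applies: the client buffers the mutations from repeated upserts and sends them to HBase on commit, so the batch size trades client memory against the number of round trips.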
  • 33. Hints: some important hints
      • SERIAL SCAN, RANGE SCAN
      • SERIAL
      • SMALL SCAN
  • 34. Additional References
      • For more optimizations, refer to these documents
          – http://phoenix.apache.org/tuning.html
          – https://hbase.apache.org/book.html#performance
  • 35. Agenda: Phoenix & HBase as an Enterprise Data Warehouse; Use Cases; Optimizations; Phoenix Query Server
  • 36. Apache Phoenix Query Server
      • A standalone service that proxies user requests to HBase/Phoenix
          – Optional
      • Reference client implementation via JDBC
          – "Thick" versus "Thin"
      • First introduced in Apache Phoenix 4.4.0
      • Built on Apache Calcite's Avatica
          – "A framework for building database drivers"
  • 37. Traditional Apache Phoenix RPC Model
      • [Architecture diagram labels: Application, Phoenix client, HBase client, ZooKeeper, three RegionServers each with a Phoenix coprocessor (Phx coproc) and table regions, HDFS]
  • 38. Query Server Model
      • [Architecture diagram labels: Application, Query Server, Phoenix client, HBase client, ZooKeeper, three RegionServers each with a Phoenix coprocessor (Phx coproc) and table regions, HDFS]
  • 39. Query Server Technology
      • HTTP server and wire API definition
      • Pluggable serialization
          – Google Protocol Buffers
      • "Thin" JDBC driver (over HTTP)
      • Other goodies
          – Pluggable metrics system
          – TCK (technology compatibility kit)
          – SPNEGO for Kerberos authentication
          – Horizontally scalable with load balancing
  • 40. Query Server Clients: Client enablement
      • Go language database/sql/driver
          – https://github.com/Boostport/avatica
      • .NET driver
          – https://github.com/Azure/hdinsight-phoenix-sharp
          – https://www.nuget.org/packages/Microsoft.Phoenix.Client/1.0.0-preview
      • ODBC
          – Built by http://www.simba.com/, also available from Hortonworks
      • Python DB API v2.0 (not "battle tested")
          – https://bitbucket.org/lalinsky/python-phoenixdb
  • 41. Agenda: Phoenix & HBase as an Enterprise Data Warehouse; Use Cases; Optimizations; Phoenix Query Server; Q&A
  • 42. Phoenix & HBase
      • We hope to see you all migrating to Phoenix & HBase, and we expect more questions on the user mailing lists.
      • Get involved in the mailing lists: [email protected] [email protected]
      • You can reach us at: [email protected] [email protected] [email protected]
  • 43. Thank You