1 © Hortonworks Inc. 2011–2018. All rights reserved
Transactional Operations in Apache Hive
DataWorks Summit, San Jose 2018
• Eugene Koifman
Agenda
• A bit of history
• Current Functionality
• Design
• Future Plans
• Closing Remarks
Early Hive
• Transactions
• ACID: Atomicity, Consistency, Isolation, Durability
• Atomicity - Rely on File System ‘rename’
• Insert into T partition(p=1) select …. - OK
• Dynamic Partition Write – not OK
• Multi-Insert statement – not OK
• FROM <expr> Insert into A select … Insert Into B select …
• Isolation - Lock Manager
• S/X locks – not good for long running analytics
Early Hive – Changing Existing Data
• Drop <…>
• Insert Overwrite = Truncate + Insert
• Gets expensive if done often on small % of data
Goals
• Support ACID properties
• Support SQL Update/Delete/Merge
• Low rate of transactions
• Not OLTP
• Not a replacement for MySQL or HBase
Features – Hive 3
Transactional Tables
• Not all tables support transactional semantics
• Managed Tables
• No External tables or Storage Handler tables (HBase, Druid, etc.)
• Fully ACID compliant
• Single statement transactions
• Cross partition/cross table transactions
• Snapshot Isolation
• Between Serializable and Repeatable Read
Transactional Tables – Full CRUD
• Supports Update/Delete/Merge
• CREATE TABLE T(a int, b int) STORED AS ORC TBLPROPERTIES ('transactional'='true');
• Restrictions
• Managed Table
• Table cannot be sorted
• Currently requires ORC, but any format implementing AcidInputFormat/AcidOutputFormat can be supported
• Bucketing is optional!
• If upgrading from Hive 2
• Requires Major Compaction before Upgrading
Transactional Tables – Insert only
• CREATE TABLE T(a int, b int) TBLPROPERTIES ('transactional'='true',
'transactional_properties'='insert_only');
• Managed Table
• Any storage format
Transactional Tables – Convert from flat tables
• ALTER TABLE T SET TBLPROPERTIES ('transactional'='true');
• ALTER TABLE T SET TBLPROPERTIES ('transactional'='true',
'transactional_properties'='insert_only');
• Metadata Only operation
• Compaction will eventually rewrite the table
Transactional Tables - New In Hive 3
• Alter Table Add Partition…
• Alter Table T Concatenate
• Alter Table T Rename To….
• Export/Import Table
• Non-bucketed tables
• Load Data… Into Table …
• Insert Overwrite
• Fully Vectorized
• Create Table As …
• LLAP Cache
• Predicate Push Down
Design – Hive 3
Transactional Tables – Insert Only
• Transaction Manager
• Begin transaction and obtain a Transaction ID
• For each table, get a Write ID – determines location to write to
create table TM (a int, b int) TBLPROPERTIES
('transactional'='true',
'transactional_properties'='insert_only');
insert into TM values(1,1);
insert into TM values(2,2);
insert into TM values(3,3);
tm
├── delta_0000001_0000001_0000
│   └── 000000_0
├── delta_0000002_0000002_0000
│   └── 000000_0
└── delta_0000003_0000003_0000
    └── 000000_0
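The directory names in the listing can be reproduced with a small formatting helper. This is a minimal sketch, not Hive's own code: the function name `delta_dir_name` is hypothetical, and reading the three numeric fields as minimum Write ID, maximum Write ID, and statement ID is an assumption based on the listing.

```python
def delta_dir_name(min_write_id: int, max_write_id: int, statement_id: int = 0) -> str:
    """Format a delta directory name in the shape shown in the listing.

    Assumption: the fields are min Write ID, max Write ID, statement ID,
    zero-padded to 7, 7 and 4 digits respectively.
    """
    return f"delta_{min_write_id:07d}_{max_write_id:07d}_{statement_id:04d}"


# Each single-statement insert gets its own Write ID, hence its own directory.
names = [delta_dir_name(write_id, write_id) for write_id in (1, 2, 3)]
for name in names:
    print(name)
```

Because the Write ID is part of the path, a writer never touches files that another transaction produced.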
Transaction Manager
• Transaction State
• Open, Committed, Aborted
• Reader at Snapshot Isolation
• A snapshot is the state of all transactions
• High Water Mark + List of Exceptions
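A snapshot expressed as "High Water Mark + List of Exceptions" can be sketched as follows. The class and function names are hypothetical, and treating the exception list as the IDs at or below the high water mark that are still open or were aborted is an assumption, not a statement of Hive's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    # Highest ID the reader may see at all.
    high_water_mark: int
    # Assumption: IDs <= high_water_mark that are open or aborted.
    exceptions: frozenset = field(default_factory=frozenset)

    def is_visible(self, write_id: int) -> bool:
        return write_id <= self.high_water_mark and write_id not in self.exceptions


def visible_deltas(snapshot: Snapshot, deltas: dict) -> list:
    """deltas maps a directory name to the write ID that produced it."""
    return [name for name, write_id in deltas.items() if snapshot.is_visible(write_id)]


deltas = {
    "delta_0000001_0000001_0000": 1,
    "delta_0000002_0000002_0000": 2,
    "delta_0000003_0000003_0000": 3,
}
# Reader whose snapshot excludes ID 2 (for example, still open when the read began).
reader = Snapshot(high_water_mark=3, exceptions=frozenset({2}))
print(visible_deltas(reader, deltas))
```

The reader simply ignores directories outside its snapshot, so an open or aborted write is never partially observed.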
tm
├── delta_0000001_0000001_0000
│   └── 000000_0
├── delta_0000002_0000002_0000
│   └── 000000_0
└── delta_0000003_0000003_0000
    └── 000000_0
• Atomicity & Isolation
Full CRUD
• No in-place Delete - Append-only file system
• Isolate readers from writers
ROW__ID
• CREATE TABLE acidtbl (a INT, b STRING) STORED AS ORC TBLPROPERTIES
('transactional'='true');
Metadata Columns
  ROW__ID: original_write_id, bucket_id, row_id
  current_write_id
User Columns
  col_1: a : INT
  col_2: b : STRING
Create
• INSERT INTO acidtbl (a,b) VALUES (100, "foo"), (200, "xyz"), (300, "bee");
delta_00001_00001/bucket_0000
ROW__ID      a    b
{ 1, 0, 0 }  100  "foo"
{ 1, 0, 1 }  200  "xyz"
{ 1, 0, 2 }  300  "bee"
Delete
• DELETE FROM acidTbl where a = 200;
delta_00001_00001/bucket_0000
ROW__ID      a     b
{ 1, 0, 0 }  100   "foo"
{ 1, 0, 1 }  200   "xyz"
{ 1, 0, 2 }  300   "bee"
delete_delta_00002_00002/bucket_0000
ROW__ID      a     b
{ 1, 0, 1 }  null  null
• Readers skip deleted rows
Update
• Update = delete + insert
• UPDATE acidTbl SET b = "bar" where a = 300;
delta_00001_00001/bucket_0000
ACID_PK      a     b
{ 1, 0, 0 }  100   "foo"
{ 1, 0, 1 }  200   "xyz"
{ 1, 0, 2 }  300   "bee"
delta_00003_00003/bucket_0000
ACID_PK      a     b
{ 2, 0, 0 }  300   "bar"
delete_delta_00003_00003/bucket_0000
ACID_PK      a     b
{ 1, 0, 2 }  null  null
Read
• Ask Transaction Manager for Snapshot Information
• Decide which deltas are relevant
• Take all the files in delta_x_x/ and split them into chunks for each processing Task to work with
• Localize all delete events from each delete_delta_x_x/ to each task
• Highly Compressed with ORC
• Filter out all Insert events that have matching delete events
• Requires an Acid aware reader – thus AcidInputFormat
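The filtering step of the read path can be sketched with plain dictionaries. This is an illustrative model only: the function name is hypothetical, rows are keyed by their ROW__ID triple, and the data mirrors the earlier Delete example; the real reader works on ORC files through AcidInputFormat.

```python
def read_visible_rows(insert_events: dict, delete_events: list) -> dict:
    """Drop every insert event whose ROW__ID has a matching delete event.

    insert_events: ROW__ID triple -> user columns
    delete_events: ROW__ID triples taken from the delete deltas
    """
    deleted = set(delete_events)  # localized to the task, cheap to probe
    return {row_id: cols for row_id, cols in insert_events.items() if row_id not in deleted}


# Contents of delta_00001_00001/bucket_0000
inserts = {
    (1, 0, 0): {"a": 100, "b": "foo"},
    (1, 0, 1): {"a": 200, "b": "xyz"},
    (1, 0, 2): {"a": 300, "b": "bee"},
}
# Contents of delete_delta_00002_00002/bucket_0000
deletes = [(1, 0, 1)]

for row_id, cols in read_visible_rows(inserts, deletes).items():
    print(row_id, cols)
```

The insert files are never modified; a delete only adds a small event that readers apply as an anti-join on ROW__ID.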
Design - Compactor
• More Update operations = more delete events, which make reads more expensive
• Insert operations don’t add read overhead
Design - Compactor
• Compactor rewrites the table in the background
• Minor compaction - merges delta files into fewer deltas
• Major compaction - merges deltas with base - more expensive
• This amortizes the cost of updates and self tunes the tables
• Makes ORC more efficient - larger stripes, better compression
• Compaction can be triggered automatically or on demand
• There are various configuration options to control when the process kicks in.
• Compaction itself is a Map-Reduce job
• Key design principle is that compactor does not affect readers/writers
• Cleaner process – removes obsolete files
• Requires Standalone metastore
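The difference between the two compaction kinds can be modelled in a few lines. This is a conceptual sketch under stated assumptions: the function names and the toy data are hypothetical, and it assumes minor compaction only combines files while major compaction also applies the delete events when writing the new base.

```python
def minor_compact(deltas: list, delete_deltas: list) -> tuple:
    """Merge many deltas into one delta and one delete delta. No row is dropped."""
    merged_inserts = {}
    for delta in deltas:
        merged_inserts.update(delta)
    merged_deletes = set().union(*delete_deltas)
    return merged_inserts, merged_deletes


def major_compact(base: dict, deltas: list, delete_deltas: list) -> dict:
    """Merge deltas with the base and apply delete events, producing a new base."""
    inserts, deletes = minor_compact([base, *deltas], delete_deltas)
    return {row_id: cols for row_id, cols in inserts.items() if row_id not in deletes}


# Toy data: two insert deltas and one delete delta, keyed by ROW__ID triple.
delta_1 = {(1, 0, 0): {"a": 100}, (1, 0, 1): {"a": 200}, (1, 0, 2): {"a": 300}}
delta_3 = {(3, 0, 0): {"a": 400}}
delete_delta_2 = {(1, 0, 1)}

merged, pending_deletes = minor_compact([delta_1, delta_3], [delete_delta_2])
new_base = major_compact({}, [delta_1, delta_3], [delete_delta_2])
print(len(merged), sorted(pending_deletes), sorted(new_base))
```

After a major compaction the reader has no delete events left to apply, which is where the read cost of updates is paid back.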
Merge Statement – SQL Standard 2011 (Hive 2.2)
Target
ID   State  County   Value
1    CA     LA       19.0
2    MA     Norfolk  15.0
7    MA     Suffolk  50.15
16   CA     Orange   9.1
Source
ID   State  Value
1           20.0
7           80.0
100  NH     6.0
MERGE INTO TARGET T
USING SOURCE S ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET T.Value=S.Value
WHEN NOT MATCHED THEN
INSERT (ID,State,Value)
VALUES(S.ID, S.State, S.Value)
Result
ID   State  County   Value
1    CA     LA       20.0
2    MA     Norfolk  15.0
7    MA     Suffolk  80.0
16   CA     Orange   9.1
100  NH     null     6.0
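The effect of the statement on the example tables can be checked with a small in-memory model. It is a sketch of the MERGE semantics only, not of how Hive executes it; the function name and the dictionary layout are hypothetical.

```python
def merge(target: dict, source: dict) -> dict:
    """Apply the MERGE above: update Value on match, insert otherwise.

    Both arguments map ID -> row. The input target is left unmodified.
    """
    result = {row_id: dict(row) for row_id, row in target.items()}
    for row_id, src in source.items():
        if row_id in result:
            # WHEN MATCHED THEN UPDATE SET T.Value = S.Value
            result[row_id]["Value"] = src["Value"]
        else:
            # WHEN NOT MATCHED THEN INSERT (ID, State, Value); County stays null
            result[row_id] = {"State": src["State"], "County": None, "Value": src["Value"]}
    return result


target = {
    1: {"State": "CA", "County": "LA", "Value": 19.0},
    2: {"State": "MA", "County": "Norfolk", "Value": 15.0},
    7: {"State": "MA", "County": "Suffolk", "Value": 50.15},
    16: {"State": "CA", "County": "Orange", "Value": 9.1},
}
source = {
    1: {"State": None, "Value": 20.0},
    7: {"State": None, "Value": 80.0},
    100: {"State": "NH", "Value": 6.0},
}

for row_id, row in merge(target, source).items():
    print(row_id, row)
```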
SQL Merge
Target, Source
Right Outer Join ON T.ID=S.ID
WHEN MATCHED
delta_00002_00002/bucket_0000
ACID_PK      ID   State  County   Value
{ 2, 0, 1 }  1    CA     LA       20.0
{ 2, 0, 2 }  7    MA     Suffolk  80.0
delete_delta_00002_00002/bucket_0000
ACID_PK      Data
{ 1, 0, 1 }  null
{ 1, 0, 3 }  null
WHEN NOT MATCHED
delta_00002_00002_001/bucket_0000
ACID_PK      ID   State  County   Value
{ 2, 0, 1 }  100  NH              6.0
Merge Statement Optimizations
• Semi Join Reduction
• aka Dynamic Runtime Filtering
• On Tez only
Without reduction: Target joined with Source on T.ID=S.ID
With reduction: Target filtered by ID in (1,7,100), then joined with Source on T.ID=S.ID
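The idea behind the reduction can be sketched as a two-step join: collect the join keys from the small side, then use them to prune the large side before joining. The function name is hypothetical and the rows reuse the earlier MERGE example; in Hive this is done by the Tez runtime rather than by user code.

```python
def semijoin_reduce(target: list, source: list) -> tuple:
    """Prune target rows using the source's join keys, then join on ID."""
    keys = {row["ID"] for row in source}               # ID in (1, 7, 100)
    pruned = [row for row in target if row["ID"] in keys]
    source_by_id = {row["ID"]: row for row in source}
    joined = [(row, source_by_id[row["ID"]]) for row in pruned]
    return pruned, joined


target_rows = [{"ID": 1}, {"ID": 2}, {"ID": 7}, {"ID": 16}]
source_rows = [{"ID": 1, "Value": 20.0}, {"ID": 7, "Value": 80.0}, {"ID": 100, "Value": 6.0}]

pruned, joined = semijoin_reduce(target_rows, source_rows)
print([row["ID"] for row in pruned])
```

Only the target rows that can possibly match reach the join, which matters when the target is large and the source is small.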
Design - Concurrency
• Inserts are never in conflict since Hive does not enforce unique constraints
• Write Set tracking to prevent Write-Write conflicts in concurrent transactions
• Lock Manager
• DDL operations acquire eXclusive locks – metadata operations
• Read operations acquire Shared locks
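Write Set tracking can be illustrated with a first-committer-wins check. This is a simplified sketch under assumptions: the class names, the sequence counter, and the choice of (table, partition) pairs as the tracked resource are all hypothetical, and the write set is supplied at begin time only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    txn_id: int
    start_seq: int
    write_set: frozenset  # resources this transaction updates or deletes


class WriteSetTracker:
    def __init__(self):
        self._seq = 0
        self._committed = []  # (commit_seq, write_set) of committed writers

    def begin(self, txn_id: int, write_set=()) -> Txn:
        self._seq += 1
        return Txn(txn_id, self._seq, frozenset(write_set))

    def try_commit(self, txn: Txn) -> bool:
        for commit_seq, write_set in self._committed:
            # Conflict: someone committed after we started and touched the same resource.
            if commit_seq > txn.start_seq and write_set & txn.write_set:
                return False
        self._seq += 1
        self._committed.append((self._seq, txn.write_set))
        return True


tracker = WriteSetTracker()
t1 = tracker.begin(1, {("T", "p=1")})
t2 = tracker.begin(2, {("T", "p=1")})   # concurrent update of the same partition
t3 = tracker.begin(3)                   # insert only: empty write set
first = tracker.try_commit(t1)
second = tracker.try_commit(t2)
third = tracker.try_commit(t3)
print(first, second, third)
```

A transaction with an empty write set can never lose this check, which matches the point that inserts are never in conflict.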
Tooling
• SHOW COMPACTIONS
• Hadoop Job ID
• SHOW TRANSACTIONS
• SHOW LOCKS
• What a lock is blocked on
• ABORT TRANSACTIONS txnid1, txnid2….
Other Subsystems
• Result Set Caching
• Is it valid for current reader?
• Materialized Views
• Incremental View Maintenance
• Spark
• HiveWarehouseConnector: HS2 + LLAP
Streaming Ingest API
• Connection – Hive Table
• Begin transaction
• Commit/Abort transaction
• org.apache.hive.streaming.StreamingConnection
• Writer
• Write records
• org.apache.hive.streaming.RecordWriter
• Append Only via this API
• Update/Delete via SQL
• Optimized for Write operations
• Requires more aggressive Compaction for efficient reads
• Supports dynamic partitioning in a single transaction
Limitations
• Transaction Manager
• State is persisted in the metastore RDBMS
• Begin/Commit/Abort
• Metastore calls
Future
Future Work
• Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK
• Performance
• Smarter Compaction
• Finer grained concurrency management/conflict detection
• Read Committed w/Lock Based scheduling
• Better Monitoring/Alerting
• User defined Primary Key
• Transactional Tables sorted on PK
Further Reading
Etc
• Documentation
• https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
• https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2
• Follow/Contribute
• https://issues.apache.org/jira/browse/HIVE-14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions
• user@hive.apache.org
• dev@hive.apache.org
Credits
• Alan Gates
• Sankar Hariappan
• Prasanth Jayachandran
• Eugene Koifman
• Owen O’Malley
• Saket Saurabh
• Sergey Shelukhin
• Gopal Vijayaraghavan
• Wei Zheng
Thank You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 

Transactional operations in Apache Hive: present and future

  • 1. © Hortonworks Inc. 2011–2018. All rights reserved • Transactional Operations in Apache Hive • DataWorks Summit, San Jose 2018 • Eugene Koifman
  • 2. Agenda • A bit of history • Current Functionality • Design • Future Plans • Closing Remarks
  • 3. Early Hive • Transactions • ACID: Atomicity, Consistency, Isolation, Durability • Atomicity - Rely on File System ‘rename’ • Insert into T partition(p=1) select …. - OK • Dynamic Partition Write – not OK • Multi-Insert statement – not OK • FROM <expr> Insert into A select … Insert Into B select … • Isolation - Lock Manager • S/X locks – not good for long running analytics
  • 4. Early Hive – Changing Existing Data • Drop <…> • Insert Overwrite = Truncate + Insert • Gets expensive if done often on small % of data
  • 5. Goals • Support ACID properties • Support SQL Update/Delete/Merge • Low rate of transactions • Not OLTP • Not a replacement for MySQL or HBase
  • 6. Features – Hive 3
  • 7. Transactional Tables • Not all tables support transactional semantics • Managed Tables • No External tables or Storage Handler (HBase, Druid, etc) • Fully ACID compliant • Single statement transactions • Cross partition/cross table transactions • Snapshot Isolation • Between Serializable and Repeatable Read
  • 8. Transactional Tables – Full CRUD • Supports Update/Delete/Merge • CREATE TABLE T(a int, b int) STORED AS ORC TBLPROPERTIES ('transactional'='true'); • Restrictions • Managed Table • Table cannot be sorted • Currently requires ORC File, but anything implementing AcidInputFormat/AcidOutputFormat can be used • Bucketing is optional! • If upgrading from Hive 2 • Requires Major Compaction before Upgrading
  • 9. Transactional Tables – Insert only • CREATE TABLE T(a int, b int) TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); • Managed Table • Any storage format
  • 10. Transactional Tables – Convert from flat tables • ALTER TABLE T SET TBLPROPERTIES ('transactional'='true') • ALTER TABLE T SET TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only'); • Metadata Only operation • Compaction will eventually rewrite the table
  • 11. Transactional Tables - New In Hive 3 • Alter Table Add Partition… • Alter Table T Concatenate • Alter Table T Rename To…. • Export/Import Table • Non-bucketed tables • Load Data… Into Table … • Insert Overwrite • Fully Vectorized • Create Table As … • LLAP Cache • Predicate Push Down
  • 12. Design – Hive 3
  • 13. Transactional Tables – Insert Only • Transaction Manager • Begin transaction and obtain a Transaction ID • For each table, get a Write ID – determines location to write to
      create table TM (a int, b int) TBLPROPERTIES ('transactional'='true', 'transactional_properties'='insert_only');
      insert into TM values(1,1);
      insert into TM values(2,2);
      insert into TM values(3,3);
      tm
      ├── delta_0000001_0000001_0000
      │   └── 000000_0
      ├── delta_0000002_0000002_0000
      │   └── 000000_0
      └── delta_0000003_0000003_0000
          └── 000000_0
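The directory names on this slide appear to follow the pattern delta_<min write ID>_<max write ID>_<statement ID>, each part zero-padded. A minimal Python sketch of building and parsing such names, assuming that reading of the layout; the helper names `delta_dir` and `parse_delta_dir` are hypothetical and are not Hive APIs:

```python
from typing import Tuple


def delta_dir(write_id: int, statement_id: int = 0) -> str:
    """Build a delta directory name in the layout shown on the slide.

    Assumption: a delta written by a single transaction carries the same
    write ID as both its minimum and its maximum.
    """
    return f"delta_{write_id:07d}_{write_id:07d}_{statement_id:04d}"


def parse_delta_dir(name: str) -> Tuple[int, int, int]:
    """Split a name like delta_0000001_0000001_0000 into its three numbers."""
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "delta":
        raise ValueError(f"not an insert delta directory: {name!r}")
    min_write_id, max_write_id, statement_id = (int(p) for p in parts[1:])
    return min_write_id, max_write_id, statement_id


if __name__ == "__main__":
    # The three inserts on the slide each get their own write ID.
    for write_id in (1, 2, 3):
        print(delta_dir(write_id))
```

Because the write ID is embedded in the directory name, a reader can decide whether a delta is relevant without opening any file in it.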
  • 14. Transaction Manager • Transaction State • Open, Committed, Aborted • Reader at Snapshot Isolation • A snapshot is the state of all transactions • High Water Mark + List of Exceptions • Atomicity & Isolation
      tm
      ├── delta_0000001_0000001_0000
      │   └── 000000_0
      ├── delta_0000002_0000002_0000
      │   └── 000000_0
      └── delta_0000003_0000003_0000
          └── 000000_0
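A snapshot described as "High Water Mark + List of Exceptions" can be modelled roughly as follows. This is an illustrative sketch, not Hive's implementation: the `Snapshot` class and `visible_deltas` helper are hypothetical, and the exceptions are assumed to be write IDs at or below the high water mark that are still open or were aborted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Reader's view of transaction state (hypothetical model)."""
    high_water_mark: int                      # highest write ID the reader may see
    exceptions: FrozenSet[int] = frozenset()  # open or aborted write IDs below it

    def is_visible(self, write_id: int) -> bool:
        # Visible only if at or below the mark and not listed as an exception.
        return write_id <= self.high_water_mark and write_id not in self.exceptions


def visible_deltas(snapshot: Snapshot, delta_write_ids: Iterable[int]) -> List[int]:
    """Keep only the deltas whose write ID the snapshot allows the reader to see."""
    return [w for w in delta_write_ids if snapshot.is_visible(w)]


if __name__ == "__main__":
    # Write ID 2 is still open (or aborted); write ID 4 began after the snapshot.
    snap = Snapshot(high_water_mark=3, exceptions=frozenset({2}))
    print(visible_deltas(snap, [1, 2, 3, 4]))
```

In this model an aborted write needs no physical cleanup to be invisible: its delta directory is ignored for as long as its ID stays in the exception list.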
  • 15. Full CRUD • No in-place Delete - Append-only file system • Isolate readers from writers
  • 16. ROW__ID • CREATE TABLE acidtbl (a INT, b STRING) STORED AS ORC TBLPROPERTIES ('transactional'='true'); • Metadata Columns: original_write_id, bucket_id, row_id (together forming ROW__ID), current_write_id • User Columns: col_1: a : INT, col_2: b : STRING
  • 17. Create • INSERT INTO acidtbl (a,b) VALUES (100, "foo"), (200, "xyz"), (300, "bee");
      delta_00001_00001/bucket_0000
      ROW__ID      a    b
      { 1, 0, 0 }  100  "foo"
      { 1, 0, 1 }  200  "xyz"
      { 1, 0, 2 }  300  "bee"
  • 18. Delete • DELETE FROM acidTbl where a = 200; • Readers skip deleted rows
      delta_00001_00001/bucket_0000
      ROW__ID      a    b
      { 1, 0, 0 }  100  "foo"
      { 1, 0, 1 }  200  "xyz"
      { 1, 0, 2 }  300  "bee"
      delete_delta_00002_00002/bucket_0000
      ROW__ID      a     b
      { 1, 0, 1 }  null  null
  • 19. Update • Update = delete + insert • UPDATE acidTbl SET b = "bar" where a = 300;
      delta_00001_00001/bucket_0000
      ACID_PK      a    b
      { 1, 0, 0 }  100  "foo"
      { 1, 0, 1 }  200  "xyz"
      { 1, 0, 2 }  300  "bee"
      delta_00003_00003/bucket_0000
      ACID_PK      a    b
      { 2, 0, 0 }  300  "bar"
      delete_delta_00003_00003/bucket_0000
      ACID_PK      a     b
      { 1, 0, 2 }  null  null
  • 20. Read • Ask Transaction Manager for Snapshot Information • Decide which deltas are relevant • Take all the files in delta_x_x/ and split them into chunks for each processing Task to work with • Localize all delete events from each delete_delta_x_x/ to each task • Highly Compressed with ORC • Filter out all Insert events that have matching delete events • Requires an Acid aware reader – thus AcidInputFormat
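The read steps listed on the Read slide can be sketched as a small in-memory model. This is only an illustration under stated assumptions: deltas are plain Python dicts keyed by write ID, a row ID is the triple (original_write_id, bucket_id, row_id), the snapshot is reduced to a set of visible write IDs, and the sample row IDs are illustrative rather than copied from the slides. The `read_rows` name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

RowId = Tuple[int, int, int]   # (original_write_id, bucket_id, row_id)
Row = Tuple[RowId, int, str]   # (ROW__ID, a, b)


def read_rows(insert_deltas: Dict[int, List[Row]],
              delete_deltas: Dict[int, Set[RowId]],
              visible_write_ids: Set[int]) -> List[Row]:
    """Return the rows a reader sees for the given set of visible write IDs."""
    # Step 1: collect delete events from every visible delete delta.
    deleted: Set[RowId] = set()
    for write_id, row_ids in delete_deltas.items():
        if write_id in visible_write_ids:
            deleted |= row_ids

    # Step 2: scan visible insert deltas, dropping rows with a matching delete.
    result: List[Row] = []
    for write_id in sorted(insert_deltas):
        if write_id not in visible_write_ids:
            continue
        result.extend(row for row in insert_deltas[write_id] if row[0] not in deleted)
    return result


if __name__ == "__main__":
    inserts = {
        1: [((1, 0, 0), 100, "foo"), ((1, 0, 1), 200, "xyz"), ((1, 0, 2), 300, "bee")],
        3: [((3, 0, 0), 300, "bar")],          # new version written by an update
    }
    deletes = {2: {(1, 0, 1)}, 3: {(1, 0, 2)}}  # a delete, then the update's delete half
    print(read_rows(inserts, deletes, {1}))        # before the delete and update
    print(read_rows(inserts, deletes, {1, 2, 3}))  # after both
```

A reader whose snapshot contains only write ID 1 still sees all three original rows, which is how readers are isolated from concurrent writers.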
  • 21. Design - Compactor • More Update operations = more delete events – make reads more expensive • Insert operations don’t add read overhead
  • 22. Design - Compactor • Compactor rewrites the table in the background • Minor compaction - merges delta files into fewer deltas • Major compaction - merges deltas with base - more expensive • This amortizes the cost of updates and self tunes the tables • Makes ORC more efficient - larger stripes, better compression • Compaction can be triggered automatically or on demand • There are various configuration options to control when the process kicks in. • Compaction itself is a Map-Reduce job • Key design principle is that compactor does not affect readers/writers • Cleaner process – removes obsolete files • Requires Standalone metastore
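The difference between the two compaction kinds can be illustrated with a toy model. This sketch assumes every delta passed in belongs to a committed transaction and ignores files, buckets and concurrency entirely; `minor_compact` and `major_compact` are hypothetical names, and the real compactor runs as a background job.

```python
from typing import Dict, List, Set, Tuple

RowId = Tuple[int, int, int]   # (original_write_id, bucket_id, row_id)
Row = Tuple[RowId, int, str]   # (ROW__ID, a, b)


def minor_compact(insert_deltas: Dict[int, List[Row]],
                  delete_deltas: Dict[int, Set[RowId]]) -> Tuple[List[Row], Set[RowId]]:
    """Merge many deltas into one insert delta and one delete delta.

    Delete events are kept, so readers still have to filter at read time.
    """
    merged_inserts = [row for wid in sorted(insert_deltas) for row in insert_deltas[wid]]
    merged_deletes: Set[RowId] = set().union(*delete_deltas.values())
    return merged_inserts, merged_deletes


def major_compact(base: List[Row],
                  insert_deltas: Dict[int, List[Row]],
                  delete_deltas: Dict[int, Set[RowId]]) -> List[Row]:
    """Fold base and all deltas into a new base with delete events applied."""
    inserts, deletes = minor_compact(insert_deltas, delete_deltas)
    survivors = [row for row in list(base) + inserts if row[0] not in deletes]
    return sorted(survivors, key=lambda row: row[0])


if __name__ == "__main__":
    inserts = {
        1: [((1, 0, 0), 100, "foo"), ((1, 0, 1), 200, "xyz"), ((1, 0, 2), 300, "bee")],
        3: [((3, 0, 0), 300, "bar")],
    }
    deletes = {2: {(1, 0, 1)}, 3: {(1, 0, 2)}}
    print(major_compact([], inserts, deletes))
```

After a major compaction in this model there are no delete events left to apply, so the per-read filtering cost drops to zero until new updates arrive.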
  • 23. Merge Statement – SQL Standard 2011 (Hive 2.2)
      TARGET:
      ID   State  County   Value
      1    CA     LA       19.0
      2    MA     Norfolk  15.0
      7    MA     Suffolk  50.15
      16   CA     Orange   9.1
      SOURCE:
      ID   State  Value
      1           20.0
      7           80.0
      100  NH     6.0
      MERGE INTO TARGET T USING SOURCE S ON T.ID=S.ID
      WHEN MATCHED THEN UPDATE SET T.Value=S.Value
      WHEN NOT MATCHED THEN INSERT (ID,State,Value) VALUES(S.ID, S.State, S.Value)
      Result:
      ID   State  County   Value
      1    CA     LA       20.0
      2    MA     Norfolk  15.0
      7    MA     Suffolk  80.0
      16   CA     Orange   9.1
      100  NH     null     6.0
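The effect of the MERGE statement on this slide can be checked with a small Python model of its two branches. It is a sketch of the statement's semantics on the slide's sample data only, not of how Hive executes it; the `merge` function and the dict-based table representation are assumptions made for illustration.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[object]]


def merge(target: Dict[int, Record], source: Dict[int, Record]) -> Dict[int, Record]:
    """Apply WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT keyed on ID."""
    result = {row_id: dict(row) for row_id, row in target.items()}  # leave input untouched
    for row_id, src in source.items():
        if row_id in result:
            # WHEN MATCHED THEN UPDATE SET T.Value = S.Value
            result[row_id]["Value"] = src["Value"]
        else:
            # WHEN NOT MATCHED THEN INSERT (ID, State, Value); County is not supplied
            result[row_id] = {"State": src["State"], "County": None, "Value": src["Value"]}
    return result


if __name__ == "__main__":
    target = {
        1: {"State": "CA", "County": "LA", "Value": 19.0},
        2: {"State": "MA", "County": "Norfolk", "Value": 15.0},
        7: {"State": "MA", "County": "Suffolk", "Value": 50.15},
        16: {"State": "CA", "County": "Orange", "Value": 9.1},
    }
    source = {
        1: {"State": None, "Value": 20.0},
        7: {"State": None, "Value": 80.0},
        100: {"State": "NH", "Value": 6.0},
    }
    for row_id, row in sorted(merge(target, source).items()):
        print(row_id, row["State"], row["County"], row["Value"])
```

Rows 2 and 16 have no match in the source and pass through unchanged, while row 100 gets a null County because the INSERT branch names only ID, State and Value.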
  • 24. SQL Merge • Target Right Outer Join Source ON T.ID=S.ID
      WHEN MATCHED:
      delta_00002_00002/bucket_0000
      ACID_PK      ID  State  County   Value
      { 2, 0, 1 }  1   CA     LA       20.0
      { 2, 0, 2 }  7   MA     Suffolk  80.0
      delete_delta_00002_00002/bucket_0000
      ACID_PK      Data
      { 1, 0, 1 }  null
      { 1, 0, 3 }  null
      WHEN NOT MATCHED:
      delta_00002_00002_001/bucket_0000
      ACID_PK      ID   State  County  Value
      { 2, 0, 1 }  100  NH             6.0
  • 25. Merge Statement Optimizations • Semi Join Reduction • aka Dynamic Runtime Filtering • On Tez only • [Diagram: Target joined with Source on T.ID=S.ID, shown without and with the filter ID in (1,7,100)]
  • 26. Design - Concurrency • Inserts are never in conflict since Hive does not enforce unique constraints • Write Set tracking to prevent Write-Write conflicts in concurrent transactions • Lock Manager • DDL operations acquire eXclusive locks – metadata operations • Read operations acquire Shared locks
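One common way to use write set tracking against write-write conflicts is a "first committer wins" check at commit time. The sketch below shows that general technique; the slide does not spell out Hive's exact rule, so the class, its fields and the use of partition names as the unit of conflict are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Txn:
    txn_id: int
    start_seq: int                                     # commit counter when the txn began
    write_set: Set[str] = field(default_factory=set)   # resources updated or deleted


class ConflictChecker:
    """First-committer-wins write-write conflict detection (hypothetical model)."""

    def __init__(self) -> None:
        self._commit_seq = 0
        self._committed: List[Tuple[int, FrozenSet[str]]] = []

    def begin(self, txn_id: int) -> Txn:
        return Txn(txn_id=txn_id, start_seq=self._commit_seq)

    def commit(self, txn: Txn) -> bool:
        """Return True if committed, False if the transaction must abort."""
        for commit_seq, write_set in self._committed:
            # Conflict: someone committed after we started and touched the same resource.
            if commit_seq > txn.start_seq and write_set & txn.write_set:
                return False
        self._commit_seq += 1
        self._committed.append((self._commit_seq, frozenset(txn.write_set)))
        return True


if __name__ == "__main__":
    checker = ConflictChecker()
    t1, t2 = checker.begin(1), checker.begin(2)
    t1.write_set.add("T/p=1")
    t2.write_set.add("T/p=1")
    print(checker.commit(t1))  # first committer wins
    print(checker.commit(t2))  # overlapping write set, so it must abort
```

A transaction that only inserts has an empty write set in this model, so it can never fail the check, which matches the slide's point that inserts are never in conflict.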
  • 27. Tooling • SHOW COMPACTIONS • Hadoop Job ID • SHOW TRANSACTIONS • SHOW LOCKS • What a lock is blocked on • ABORT TRANSACTIONS txnid1, txnid2….
  • 28. Other Subsystems • Result Set Caching • Is it valid for current reader? • Materialized Views • Incremental View Maintenance • Spark • HiveWarehouseConnector: HS2 + LLAP
  • 29. Streaming Ingest API • Connection – Hive Table • Begin transaction • Commit/Abort transaction • org.apache.hive.streaming.StreamingConnection • Writer • Write records • org.apache.hive.streaming.RecordWriter • Append Only via this API • Update/Delete via SQL • Optimized for Write operations • Requires more aggressive Compaction for efficient reads • Supports dynamic partitioning in a single transaction
  • 30. Limitations • Transaction Manager • State is persisted in the metastore RDBMS • Begin/Commit/Abort • Metastore calls
  • 31. Future
  • 32. Future Work • Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK • Performance • Smarter Compaction • Finer grained concurrency management/conflict detection • Read Committed w/Lock Based scheduling • Better Monitoring/Alerting • User defined Primary Key • Transactional Tables sorted on PK
  • 33. Further Reading
  • 34. Etc • Documentation • https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions • https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2 • Follow/Contribute • https://issues.apache.org/jira/browse/HIVE-14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions • [email protected] • [email protected]
  • 35. Credits • Alan Gates • Sankar Hariappan • Prasanth Jayachandran • Eugene Koifman • Owen O’Malley • Saket Saurabh • Sergey Shelukhin • Gopal Vijayaraghavan • Wei Zheng
  • 36. Thank You

Editor's Notes

  • #21: Similar to LSM
  • #24: Target is the table inside the Warehouse Source table contains the changes to apply
  • #25: Update this for split update Runtime filtering, etc