Apache Hive on ACID
Alan Gates
Hive PMC Member
Co-founder Hortonworks
May 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
History
▪ Hive only updated partitions
– INSERT...OVERWRITE rewrote an entire partition
– Forced daily or even hourly partitions
– Could add files to partition directory, but file compaction was manual
▪ What about concurrent readers?
– OK for inserts, but overwrite caused races
– There is a ZooKeeper lock manager, but…
▪ No way to delete or update rows
▪ No INSERT INTO T VALUES…
– Breaks some tools
Why Do You Need ACID?
▪ Hadoop and Hive have always…
– Just said no to ACID
– Perceived as tradeoff for performance
▪ But your data isn’t static
– It changes daily, hourly, or faster
– Sometimes it needs to be restated (late-arriving data) or facts change (e.g. a user’s physical address)
– Loading data into Hive every hour is so 2010; data should be available in Hive as soon as it arrives
▪ We saw users implementing ad hoc solutions
– This is a lot of work and hard to get right
– Hive should support this as a first class feature
When Should You Use Hive’s ACID?
▪ NOT OLTP!!!
▪ Updating a Dimension Table
– Changing a customer’s address
▪ Delete Old Records
– Remove records for compliance
▪ Update/Restate Large Fact Tables
– Fix problems after they are in the warehouse
▪ Streaming Data Ingest
– A continual stream of data coming in
– Typically from Flume or Storm
▪ NOT OLTP!!!
SQL Changes for ACID
▪ Since Hive 0.14
▪ New DML
– INSERT INTO T VALUES(1, 'fred', ...);
– UPDATE T SET x = 5[, ...] [WHERE ...]
– DELETE FROM T [WHERE ...]
– Supports partitioned and non-partitioned tables; the WHERE clause can specify a partition, but this is not required
▪ Restrictions
– Table must have a format that extends AcidInputFormat
• currently ORC
• work started on Parquet (HIVE-8123)
– Table must be bucketed and not sorted
• can use 1 bucket, but this will restrict write parallelism
– Table must be marked transactional
• create table T(...) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
• Existing ORC tables that are bucketed can be marked transactional via ALTER TABLE
Ingesting Data Into Hive From a Stream
▪ Data is flowing in from generators in a stream
▪ Without this, you have to add it to Hive in batches, often every hour
– Thus your users have to wait an hour before they can see their data
▪ New interface in hive.hcatalog.streaming lets applications write small batches of records and commit them
– Users can now see data within a few seconds of it arriving from the data generators
▪ Available for Apache Flume and Apache Storm
Design
▪ HDFS does not allow arbitrary writes
– Store changes as delta files
– Stitched together by client on read
▪ Writes get a transaction ID
– Sequentially assigned by metastore
▪ Reads get highest committed transaction & list of open/aborted transactions
– Provides snapshot consistency
– No exclusive locks required
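The read rule on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Hive's implementation: the `Snapshot` class, its field names, and `is_visible` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """What a reader is handed when its query starts (illustrative names)."""
    high_watermark: int            # highest committed transaction id
    excluded: frozenset = frozenset()  # ids that were open or aborted


def is_visible(txn_id: int, snapshot: Snapshot) -> bool:
    """A write is visible if it is at or below the watermark
    and its transaction is not in the open/aborted list."""
    return txn_id <= snapshot.high_watermark and txn_id not in snapshot.excluded


# Assumed example: transactions 7 and 9 were open or aborted at read time.
snap = Snapshot(high_watermark=10, excluded=frozenset({7, 9}))
visible = [t for t in range(5, 13) if is_visible(t, snap)]
print(visible)
```

Because the snapshot is fixed when the query starts, later commits do not change what the reader sees, which is why no exclusive lock is needed for reads.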
Why Not HBase
▪ Good
– Handles compactions for us
– Already has similar data model with LSM
▪ Bad
– When we started this there were no transaction managers for HBase, and this work requires transactions
– HFile is column family based rather than columnar
– HBase focused on point lookups and range scans
• Warehousing requires full scans
Stitching Buckets Together
HDFS Layout
▪ Partition locations remain unchanged
– Still warehouse/$db/$tbl/$part
▪ Bucket Files Structured By Transactions
– Base files $part/base_$tid/bucket_*
– Delta files $part/delta_$tid_$tid/bucket_*
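As a rough sketch of how a reader might use this layout, the Python below parses directory names of the form `base_$tid` and `delta_$tid_$tid` and picks the newest base plus the deltas that start after it. The function name and the selection rule are simplifying assumptions for illustration; a real reader would also apply the transaction snapshot, which is ignored here.

```python
import re

BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def dirs_to_read(dir_names):
    """Pick the newest base plus every delta that starts after it.

    Simplified: transaction state (open/aborted) is not considered.
    """
    bases = []
    deltas = []
    for name in dir_names:
        if m := BASE_RE.match(name):
            bases.append((int(m.group(1)), name))
        elif m := DELTA_RE.match(name):
            deltas.append((int(m.group(1)), int(m.group(2)), name))
    best_base = max(bases, default=None)
    base_tid = best_base[0] if best_base else 0
    chosen = [best_base[1]] if best_base else []
    chosen += [name for lo, hi, name in sorted(deltas) if lo > base_tid]
    return chosen


# Hypothetical partition directory listing.
listing = ["base_5", "delta_3_3", "delta_6_6", "delta_7_9", "base_2", "notes.txt"]
print(dirs_to_read(listing))
```

Older bases and deltas already folded into the newest base are skipped, which matches the idea that they can be cleaned up once no reader needs them.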
Input and Output Formats
▪ Created new AcidInput/OutputFormat
– Unique key is original transaction id, bucket, row id
▪ Reader returns correct version of row based on transaction state
▪ Also added raw API for compactor
– Provides previous events as well
▪ ORC implements new API
– Extends records with change metadata
• Adds operation (d = delete, u = update, i = insert), latest transaction id, and key
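The "correct version of row" rule can be illustrated with a small Python sketch. It assumes each stored event carries the key (original transaction id, bucket, row id), the transaction that wrote it, and an operation code; the `Event` type, the `stitch` function, and the visibility callback are hypothetical names for this example, not the AcidInputFormat API.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    original_txn: int      # transaction that first inserted the row
    bucket: int
    row_id: int
    current_txn: int       # transaction that wrote this event
    op: str                # 'i' insert, 'u' update, 'd' delete
    value: Optional[str]


def stitch(events, is_visible):
    """Return the current value per key, using only events from
    transactions that is_visible accepts."""
    latest = {}
    for ev in events:
        if not is_visible(ev.current_txn):
            continue
        key = (ev.original_txn, ev.bucket, ev.row_id)
        if key not in latest or ev.current_txn > latest[key].current_txn:
            latest[key] = ev
    # A key whose latest visible event is a delete yields no row.
    return {k: ev.value for k, ev in sorted(latest.items()) if ev.op != "d"}


events = [
    Event(1, 0, 0, 1, "i", "fred"),
    Event(1, 0, 1, 1, "i", "wilma"),
    Event(1, 0, 0, 3, "u", "barney"),
    Event(1, 0, 1, 4, "d", None),
    Event(1, 0, 0, 6, "u", "betty"),   # written by a transaction the reader cannot see
]
rows = stitch(events, lambda txn: txn <= 5)
print(rows)
```

The compactor's raw API differs in that it would keep the older events as well, rather than collapsing each key to a single row.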
Transaction Manager
▪ Existing lock managers
– In memory - not durable
– ZooKeeper - requires additional components to install, administer, etc.
▪ Locks need to be integrated with transactions
– commit/rollback must atomically release locks
▪ We sort of have this database lying around which has ACID characteristics (metastore)
▪ Transactions and locks stored in metastore
▪ Uses metastore DB to provide unique, ascending ids for transactions and locks
Transaction & Locking Model
▪ DML statements are auto-commit
▪ Snapshot isolation
– Reader will see consistent data for the duration of a query
▪ Current transactions can be displayed using SHOW TRANSACTIONS
▪ Three types of locks
– shared read
– shared write (can co-exist with shared read, but not other shared write)
– exclusive
▪ Operations require different locks
– SELECT, INSERT – shared read (inserts cannot conflict because there is no primary key)
– UPDATE, DELETE – shared write
– DROP, INSERT OVERWRITE – exclusive
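The lock rules above amount to a small compatibility table, sketched here in Python. The mapping of operations to locks is taken from the slide; the assumption that an exclusive lock conflicts with every other lock is the usual meaning of "exclusive" and is not spelled out in the deck. All names are illustrative.

```python
from enum import Enum


class Lock(Enum):
    SHARED_READ = "shared read"
    SHARED_WRITE = "shared write"
    EXCLUSIVE = "exclusive"


# Lock required by each operation, as listed on the slide.
LOCK_FOR_OPERATION = {
    "SELECT": Lock.SHARED_READ,
    "INSERT": Lock.SHARED_READ,
    "UPDATE": Lock.SHARED_WRITE,
    "DELETE": Lock.SHARED_WRITE,
    "DROP": Lock.EXCLUSIVE,
    "INSERT OVERWRITE": Lock.EXCLUSIVE,
}


def compatible(held: Lock, requested: Lock) -> bool:
    """True if both locks can be held on the same resource at once."""
    if Lock.EXCLUSIVE in (held, requested):
        return False  # assumption: exclusive conflicts with everything
    if held is Lock.SHARED_WRITE and requested is Lock.SHARED_WRITE:
        return False  # only one shared write at a time
    return True


def can_run_together(op_a: str, op_b: str) -> bool:
    return compatible(LOCK_FOR_OPERATION[op_a], LOCK_FOR_OPERATION[op_b])


print(can_run_together("SELECT", "UPDATE"))
print(can_run_together("UPDATE", "DELETE"))
```

Under this table, readers and inserters never block each other, while two concurrent UPDATE or DELETE statements on the same resource are serialized.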
Compaction
▪ Each transaction (or batch of transactions in streaming) creates a new delta directory
▪ Too many files = NameNode ☹ and poor read performance due to fan-in on merge
▪ Need to automatically compact files
– Initiated by metastore server, run as MR jobs in the cluster
– Can be manually initiated by user via ALTER TABLE COMPACT
▪ Minor compaction merges many deltas into one
– Run when there are more than 10 delta directories (configurable)
▪ Major compaction merges deltas with base and rewrites base
– Run when size of the deltas > 10% of the size of the base (configurable)
▪ Old files kept around until all readers are done with their snapshots, then cleaned up
– Compaction and data read/writes can be done in parallel with no need to pause the world
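The two triggers can be written as a small decision function. This Python sketch uses the thresholds from the slide as defaults; the function name, parameter names, the choice to check the major trigger first, and the handling of a partition with no base are assumptions made for illustration.

```python
def choose_compaction(num_deltas, delta_bytes, base_bytes,
                      delta_num_threshold=10, delta_pct_threshold=0.10):
    """Return 'major', 'minor', or None for one partition.

    Assumptions: major takes priority over minor, and the size-ratio
    trigger is skipped when there is no base yet (base_bytes == 0).
    """
    if base_bytes > 0 and delta_bytes > delta_pct_threshold * base_bytes:
        return "major"   # deltas exceed 10% of the base: rewrite the base
    if num_deltas > delta_num_threshold:
        return "minor"   # more than 10 delta directories: merge the deltas
    return None


print(choose_compaction(num_deltas=3, delta_bytes=50, base_bytes=1000))
print(choose_compaction(num_deltas=11, delta_bytes=50, base_bytes=1000))
print(choose_compaction(num_deltas=3, delta_bytes=200, base_bytes=1000))
```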
Issues Found and (Some) Fixed
▪ Not GA-ready in Hive 1.2 or 2.0; we hope to be GA-ready by 1.3 and 2.1
▪ Deadlocks in the RDBMS
– The way the Hive metastore used the RDBMS caused a lot of deadlocks – greatly improved
▪ Usability
– SHOW COMPACTIONS and SHOW LOCKS did not give users/admins enough information to determine who was blocking whom or what was getting compacted – improved, some work still to do here
▪ Resilience
– System was easy to knock over when clients did silly things (like open 1M+ transactions) – improved, though I am sure there are still some ways to kill it
– Initially compactor threads ran in only 1 metastore instance – resolved, now can run in multiple instances
▪ Correctness
– Streaming ingest did not enforce proper bucket spraying – resolved
– Initial versions of the compactor had a race condition that resulted in record loss – resolved
– Adding a column to a table or changing a column’s type caused read-time errors – resolved
– Updates can get lost when overlapping transactions update the same partition – HIVE-13395
▪ Performance
– Some work done here (e.g. making predicate push down work, efficient split combinations)
– Much still to be done
Next: MERGE
▪ Standard SQL, added in SQL:2003
▪ Problem: today each UPDATE requires a scan of the partition or table
– There is no way to apply separate updates in a batch
▪ Allows upserts
▪ Use case:
– Bring in batch from transactional/front end systems
– Apply as inserts or updates (as appropriate) in one read/write pass
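The upsert use case can be modeled in plain Python to show what "one read/write pass" means. This is a toy model of the semantics only, with rows as dictionaries and a hypothetical `upsert` helper; it says nothing about how Hive would implement MERGE.

```python
def upsert(table, batch, key):
    """Apply a batch to a table in one pass over the table:
    matching keys are updated, unmatched batch rows are inserted."""
    incoming = {row[key]: row for row in batch}
    merged = []
    for row in table:                                  # the single read pass
        merged.append(incoming.pop(row[key], row))     # matched: take the new row
    merged.extend(incoming.values())                   # not matched: insert
    return merged


# Hypothetical dimension table and a batch from a front end system.
customers = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
changes = [{"id": 2, "city": "Paris"}, {"id": 3, "city": "Lima"}]
result = upsert(customers, changes, key="id")
print(result)
```

Contrast this with issuing one UPDATE per changed row, where each statement would scan the partition or table again.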
Future Work
▪ Multi-statement transactions (BEGIN, COMMIT, ROLLBACK)
▪ Integration with LLAP
– Figure out how MVCC works with LLAP’s caching
– Build a write path through LLAP
▪ Lower the user burden
– Make the bucketing automatic so the user does not have to be aware of it
– Allow user to determine sort order of the table
– Eventually remove the transactional/non-transactional distinction in tables
▪ Improve monitoring and alerting facilities
– Make it easier for an admin to determine when the system is in trouble, e.g. the compactor is not running or is failing on every run, there are too many open transactions, etc.
Thank You





Hive ACID Apache BigData 2016
