OracleStore: A Highly Performant RawStore Implementation for Hive Metastore

Chris Drome, Sr. Principal Engineer
Jin Sun, Principal Engineer

2
IT’S ABOUT SCALE
• 20+ clusters
• 2-4 dedicated metastore servers per cluster
• More including HS2 instances
• Large cluster:
  • ~6-9M partitions over ~6K tables
  • Largest table has ~1.1M partitions
• Medium cluster:
  • ~3-6M partitions over ~3-4K tables
  • Largest table has ~250K partitions

3
SO WHAT?
• Client-side timeouts
• Queries over large number of partitions
• socket.timeout=200s; socket.lifetime=300s
• Increased load and memory usage on metastore
• Concurrent connections across clients (server.max.threads=4000)
• Long running queries to retrieve large amounts of data
• Retrieve/convert/serialize duplicate data
• Abandoned/rerun operations
• Increased load on Oracle
• Concurrent connections across metastores
• Read/write operations on large tables take time
• Queries start to back up

4
Background & Issues
Goals
Implementation Details
Test Results
Final Thoughts

5
METASTORE ARCHITECTURE
(Layer diagram, top to bottom)
• Clients: CLI, CLI, CLI
• Thrift Server
• Metastore Core Logic
• ObjectStore/DirectSQL
• DataNucleus (ORM Model)
• DBCP
• JDBC
• Databases: Oracle, SQL Server, Derby, MySQL, PostgreSQL

6
WHAT IS DATANUCLEUS?
• ORM framework
• ORM model classes
• ORM model object-relational mapping
• Executes queries via JDOQL
• Generates DB-specific SQL
WHAT IT MEANS
• Black box limits control; hampers debugging
• Requires two sets of classes
  • ORM classes to interact with the DB
  • Thrift classes in core logic and over wire
  • Conversion between the two
• Object-relational impedance mismatch
  • Limits opportunity to optimize schema
  • Tendency to duplicate data needlessly
• Limits control of SQL
  • Limits opportunity to optimize queries
• Clean-up thread to identify abandoned ids

7
WHAT IS DIRECTSQL?
• Custom code to improve performance
• Focus on get_partition operations
  • Yahoo! added drop_partitions
• DB agnostic SQL instead of JDOQL
• Timeouts and fallback to ORM
• Works through DataNucleus
• Adds tables to evaluate filters
• Batch retrieval of lists of objects
WHAT IT MEANS
• Greatly improves performance
• Large amount of code for specific functionality
• Requires code to identify underlying DB
• Failure/timeout results in long latency
• Constrained by DataNucleus
• Does not address fundamental issues
• Requires self-JOIN for each filter condition
• Deep table (# partitions x avg # partition columns)
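The self-JOIN point can be made concrete with a small sketch. It assumes a key/value layout with one row per partition and partition column (the "deep table"); the table and column names, the `filter_query` helper, and the use of SQLite in place of Oracle are all assumptions for illustration, not the metastore's actual code.

```python
import sqlite3

# Hypothetical miniature of a key/value partition layout: one row per
# (partition, partition column). SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT);
CREATE TABLE PARTITION_KEY_VALS (
    PART_ID INTEGER, INTEGER_IDX INTEGER, PART_KEY_VAL TEXT);
""")

keys = ["dt", "p1", "p2"]
sample = [
    (1, ["20170101", "abc", "xyz"]),
    (2, ["20170101", "abc", "zzz"]),
    (3, ["20170102", "abc", "xyz"]),
    (4, ["20170102", "def", "xyz"]),
]
for part_id, values in sample:
    name = "/".join(f"{k}={v}" for k, v in zip(keys, values))
    conn.execute("INSERT INTO PARTITIONS VALUES (?, ?)", (part_id, name))
    for idx, value in enumerate(values):
        conn.execute("INSERT INTO PARTITION_KEY_VALS VALUES (?, ?, ?)",
                     (part_id, idx, value))


def filter_query(conditions):
    """Build a query with one self-JOIN of the key/value table per condition.

    conditions: list of (partition_column_index, operator, value) tuples.
    """
    allowed = {"=", ">=", "<="}
    joins, params = [], []
    for n, (idx, op, value) in enumerate(conditions):
        if op not in allowed:
            raise ValueError(f"unsupported operator: {op}")
        joins.append(
            f"JOIN PARTITION_KEY_VALS F{n} ON F{n}.PART_ID = P.PART_ID "
            f"AND F{n}.INTEGER_IDX = ? AND F{n}.PART_KEY_VAL {op} ?")
        params += [idx, value]
    sql = ("SELECT P.PART_NAME FROM PARTITIONS P " + " ".join(joins)
           + " ORDER BY P.PART_ID")
    return sql, params


# Three filter conditions produce three joins against the deep table.
sql, params = filter_query(
    [(0, ">=", "20170101"), (1, "=", "abc"), (2, "=", "xyz")])
matches = [row[0] for row in conn.execute(sql, params)]
join_count = sql.count("JOIN PARTITION_KEY_VALS")
```

With 4 partitions and 3 partition columns the key/value table already holds 12 rows, and every extra filter condition adds another join over it.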


9
GOALS
• Reduce load on Oracle
• Fix database schema inefficiencies
  • Foundation for more performant queries
  • Reduce the storage/computational requirements of data
  • Better utilize native constructs and SQL
• Fix database layer inefficiencies
  • Improve performance characteristics of SQL
  • Improve maintainability of code
• Address repetitive/redundant requests for data (future)
• Reduce payload over wire
• Optimize client-server communication protocol (future)


11
WHAT SHOULD WE DO?
• Recognize there is a problem
• Understand effects of original schema on query performance
• Identify pain points and areas for improvement
• Design more performant schema
• Write OracleStore
• Leverage lessons learned from DirectSQL
• Create migration and validation toolset
• How to migrate existing data?
• How to ensure data integrity?
• How to rollback if necessary?
• Deploy it!

12
LESSONS LEARNED
• Object structure should not dictate schema
• Operations on Tables/Partitions are king
• Most frequent and most expensive operations
• Group data specific to Tables/Partitions
• Promote table columns to TBLS/PARTITIONS
• Direct references to all satellite data
• Invert relationships between member objects
• Merge tables
• Gets rid of needless JOINs
• Deduplicate, deduplicate, deduplicate
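One way to picture the deduplication lesson is content-addressed storage of satellite data: identical storage descriptors are stored once and referenced by id. The sketch below is a minimal in-memory illustration; the `DedupTable` class, its `intern` method, and the sample field values are hypothetical, and how OracleStore actually detects duplicates is not stated in the slides.

```python
import hashlib
import json


class DedupTable:
    """Hypothetical in-memory stand-in for a deduplicated satellite table."""

    def __init__(self):
        self.rows = {}       # row id -> stored value
        self._by_hash = {}   # content hash -> row id

    def intern(self, value):
        """Return the id of an existing identical row, or store a new one."""
        digest = hashlib.sha256(
            json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()
        row_id = self._by_hash.get(digest)
        if row_id is None:
            row_id = len(self.rows) + 1
            self.rows[row_id] = value
            self._by_hash[digest] = row_id
        return row_id


sds = DedupTable()
partitions = []
for day in range(1, 1001):
    # Every partition of a table typically carries the same descriptor.
    sd = {"input_format": "demo.in", "output_format": "demo.out",
          "serde_lib": "demo.serde", "is_compressed": False}
    partitions.append({"name": f"dt={day:04d}", "sd_id": sds.intern(sd)})

# 1000 partitions end up sharing a single descriptor row.
shared_rows = len(sds.rows)
```

The partition rows keep only a small id, so the descriptor table grows with the number of distinct descriptors rather than with the number of partitions.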

13
LESSONS LEARNED
Table             ObjectStore rows  OracleStore rows  +/-     Comment
SDS               ~3.3M             6                 n/a     Restructure; Dedup
SERDES            ~3.3M             11                n/a     Merge; Dedup
SDS JOIN SERDES   ~3.3M             15                -100%   Dedup
COLUMNS_V2        ~24.9M            ~0.7M             -97.2%  Dedup
TABLE_PARAMS      ~0.06M            n/a               n/a     Merge; Dedup
PARTITION_PARAMS  ~11.5M            ~0.1M             -98.9%  Merge; Dedup
SD_PARAMS         0                 0                 0.0%
SERDE_PARAMS      ~5.4M             ~0.02M            -99.7%  Dedup

14
SCHEMA REDESIGN
• OracleStore tables should co-exist with ObjectStore tables
• Utilize native constructs and features
• SEQUENCE
• FOREIGN KEY … CASCADE
• LIMIT
• Oracle built-in functions
• One degree of separation from TBLS/PARTITIONS
• Attribute tables
• PARAM tables
• De-emphasize importance of SDS
• Promote SDS.LOCATION, SDS.CD_ID to TBLS/PARTITIONS
• No indexes on Oracle tables yet
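As an illustration of leaning on native constructs, the sketch below shows a FOREIGN KEY with ON DELETE CASCADE removing dependent parameter rows when a partition row is deleted, so no application code has to clean them up. SQLite stands in for Oracle here, and the DEMO_ table names are hypothetical, not the OracleStore schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEMO_PARTITIONS (
    PART_ID INTEGER PRIMARY KEY,
    NAME TEXT NOT NULL
);
CREATE TABLE DEMO_PARTITION_PARAMS (
    PART_ID INTEGER NOT NULL
        REFERENCES DEMO_PARTITIONS (PART_ID) ON DELETE CASCADE,
    PARAM_KEY TEXT NOT NULL,
    PARAM_VALUE TEXT
);
""")

conn.execute("INSERT INTO DEMO_PARTITIONS VALUES (1, 'dt=20170101')")
conn.execute("INSERT INTO DEMO_PARTITIONS VALUES (2, 'dt=20170102')")
conn.executemany(
    "INSERT INTO DEMO_PARTITION_PARAMS VALUES (?, ?, ?)",
    [(1, "numFiles", "4"), (1, "totalSize", "1024"), (2, "numFiles", "7")])

# Dropping the partition row removes its parameter rows in the same statement.
conn.execute("DELETE FROM DEMO_PARTITIONS WHERE PART_ID = 1")
remaining = conn.execute(
    "SELECT PART_ID, PARAM_KEY FROM DEMO_PARTITION_PARAMS").fetchall()
```

Only the parameters of the surviving partition remain after the delete.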

15
ORACLESTORE IMPLEMENTATION
• OracleStore implements RawStore
• HBaseStore (HIVE-9453)
• OracleStore co-exists with ObjectStore
• Code changes are additive
  • Annotations to identify read/write operations
  • Log messages display performance numbers
  • HybridRawStoreProxy for tee’d reads/writes
• Aggressive deduplication of data
• Batched retrieval of lists of objects
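The tee idea behind a proxy such as HybridRawStoreProxy can be sketched in a few lines. This is a Python illustration of the pattern only, not the Java implementation: `DictStore`, `TeeStore`, and the `WRITE_OPS` set (standing in for the read/write annotations) are hypothetical names.

```python
class DictStore:
    """Toy store; stands in for a RawStore implementation."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, table):
        self.tables[name] = table

    def drop_table(self, name):
        self.tables.pop(name, None)

    def get_table(self, name):
        return self.tables.get(name)


class TeeStore:
    """Sends writes to both stores; serves reads from the primary store."""

    # Stand-in for annotations that mark an operation as a write.
    WRITE_OPS = {"create_table", "drop_table"}

    def __init__(self, primary, secondary):
        self._primary = primary
        self._secondary = secondary

    def __getattr__(self, op):
        # Only called for names not found on the proxy itself.
        primary_op = getattr(self._primary, op)
        if op not in self.WRITE_OPS:
            return primary_op
        secondary_op = getattr(self._secondary, op)

        def tee(*args, **kwargs):
            result = primary_op(*args, **kwargs)
            secondary_op(*args, **kwargs)
            return result

        return tee


old_store, new_store = DictStore(), DictStore()
store = TeeStore(primary=old_store, secondary=new_store)
store.create_table("db.t1", {"owner": "hive"})
```

Because both stores receive every write, either one can serve reads, which is what makes a rollback path possible.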

16
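METASTORE ARCHITECTURE
(Layer diagram, top to bottom; OracleStore shown alongside the existing path)
• Clients: CLI, CLI, CLI
• Thrift Server
• Metastore Core Logic
• Existing path: DataNucleus (ORM Model), DBCP, JDBC, to Oracle, SQL Server, Derby, MySQL, PostgreSQL
• OracleStore path: OracleStore, DataNucleus, DBCP, JDBC, to Oracle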


18
METASTORE OPERATIONS
Operation                 % of Total
get_table                 54.1%
get_database              10.0%
get_function              6.9%
get_partitions_by_filter  6.7%
add_partitions            3.9%
get_delegation_token      3.6%
drop_partitions           3.0%
get_partitions_with_auth  3.0%
get_all_databases         2.4%
get_partitions_by_expr    2.1%
other                     4.3%

19
• # databases: 643
• # functions: 6
• # tables: 12170

20
• # partitions (total): 3337925
• # partitions (table): 88261
• # partition columns: 6
• dt=20170101/p1=a/p2=b/p3=c/p4=d/p5=e
• Range query on dt
• 1 hour and 4 hour increments
• OracleStore 2971 ms latency on 13K partitions
  • 46x faster than ObjectStore
  • 13x faster than DirectSQL
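The speedup factors imply rough absolute latencies for the other two code paths. The arithmetic below derives them from the figures on this slide and compares the result with the socket.timeout=200s setting quoted earlier; the derived numbers are estimates, since the published ratios are rounded.

```python
oraclestore_ms = 2971          # from the slide
speedup_vs_objectstore = 46    # from the slide (rounded ratio)
speedup_vs_directsql = 13      # from the slide (rounded ratio)
socket_timeout_s = 200         # client setting quoted earlier in the deck

# Implied latencies for the same 13K-partition query, in seconds.
implied_objectstore_s = oraclestore_ms * speedup_vs_objectstore / 1000
implied_directsql_s = oraclestore_ms * speedup_vs_directsql / 1000
timeout_fraction = implied_objectstore_s / socket_timeout_s

print(f"ObjectStore ~ {implied_objectstore_s:.1f} s")    # ObjectStore ~ 136.7 s
print(f"DirectSQL   ~ {implied_directsql_s:.1f} s")      # DirectSQL   ~ 38.6 s
print(f"ObjectStore uses {timeout_fraction:.0%} of the socket timeout")
```

On these estimates a single ObjectStore query over 13K partitions already consumes about two thirds of the client timeout.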

21
• dt=20170101/p1=a/p2=b/p3=c/p4=d/p5=e
• Range query on dt and equality filter on p1=abc
• OracleStore latency is relatively constant at this scale

22
• dt=20170101/p1=a/p2=b/p3=c/p4=d/p5=e
• Range query on dt and equality filter on p1=abc and p2=xyz
• OracleStore latency is relatively constant at this scale

23
METASTORE OPERATIONS (AUDIT)
(Chart; operations in legend order) get_table, get_function, get_database, get_partition_with_auth, get_all_databases, get_partitions_ps_with_auth, get_partitions_names_ps, get_partitions_by_filter, get_index_names, get_partition, get_indexes, get_all_tables, get_multi_table, get_partitions, get_partition_names, get_tables, get_databases, get_table_statistics_req

24
(Chart; operations in legend order) get_partitions, get_table, get_partitions_by_filter, get_partition_with_auth, get_multi_table, get_partitions_ps_with_auth, get_function, get_partitions_names_ps, get_all_databases, get_database, get_partition, get_partition_names, get_indexes, get_index_names, get_all_tables, get_tables, get_databases, get_table_statistics_req


28
SHIP IT
• Deployed to 4 clusters
• Configured with HybridRawStoreProxy
• Tee writes to both sets of tables
• Provides rollback path
• Scheduled validation process
• Verifies data integrity
• Compares every object and reports differences
• HIVE-14870
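The validation step, comparing every object and reporting differences, can be sketched as a diff over two snapshots keyed by object name. The `compare_stores` function, the snapshot shape, and the sample data are hypothetical; the actual validation tool is not described in the slides.

```python
def compare_stores(reference, candidate):
    """Yield (key, kind, detail) for every difference between two snapshots.

    Each snapshot maps an object name to a dict of its fields.
    """
    for key in sorted(reference.keys() | candidate.keys()):
        if key not in candidate:
            yield key, "missing", None
        elif key not in reference:
            yield key, "unexpected", None
        elif reference[key] != candidate[key]:
            fields = sorted(
                f for f in reference[key].keys() | candidate[key].keys()
                if reference[key].get(f) != candidate[key].get(f))
            yield key, "mismatch", fields


# Hypothetical snapshots of the old and new table sets.
reference = {
    "db.t1": {"owner": "a", "location": "/x"},
    "db.t2": {"owner": "b", "location": "/y"},
    "db.t3": {"owner": "c", "location": "/w"},
}
candidate = {
    "db.t1": {"owner": "a", "location": "/x"},
    "db.t2": {"owner": "b", "location": "/z"},
    "db.t4": {"owner": "d", "location": "/v"},
}
differences = list(compare_stores(reference, candidate))
```

Reporting the differing field names, rather than only the object, keeps a scheduled run's output short enough to act on.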

29
FUTURE WORK
• Reduce impact of redundant calls
• HIVE-9453 introduces per query object cache
• Optimize Thrift layer communication protocol
• Lessons learned from schema redesign
• Thrift objects should promote deduplication of data

31
/* CODE COMMENTS */
• Use sparingly, we don't want to devolve into another DataNucleus...
• Get partition objects for the query using direct SQL queries, to avoid bazillion queries created by DN retrieving stuff for each object individually.
• Essentially it's an object join. DN could do this for us, but it issues queries separately for every object, which is suboptimal.
• Makes shallow copy of a list to avoid DataNucleus mucking with our objects.
• DataNucleus objects get detached all over the place for no (real) reason. So let's not use them anywhere unless absolutely necessary.
• We have to get mtable again because DataNucleus.
• We need Partition-s for firing events and for result; DN needs MPartition-s to drop. Great... Maybe we could bypass fetching MPartitions by issuing direct SQL deletes.

32
• job_ts, dt
• Query for specific values of job_ts, dt

33
OBJECTSTORE TBLS
CREATE TABLE TBLS (
  TBL_ID NUMBER,
  CREATE_TIME NUMBER,
  DB_ID NUMBER,
  LAST_ACCESS_TIME NUMBER,
  OWNER VARCHAR,
  RETENTION NUMBER,
  SD_ID NUMBER,
  TBL_NAME VARCHAR,
  TBL_TYPE VARCHAR,
  VIEW_EXPANDED_TEXT CLOB,
  VIEW_ORIGINAL_TEXT CLOB
)
ORACLESTORE V2_TBLS
CREATE TABLE V2_TBLS (
  TBL_ID NUMBER,
  DB_ID NUMBER,
  SD_ID NUMBER,
  CD_ID NUMBER,
  SD_PARAM_ID NUMBER,
  SERDE_PARAM_ID NUMBER,
  NAME VARCHAR,
  TYPE VARCHAR,
  OWNER_NAME VARCHAR,
  LOCATION VARCHAR,
  RETENTION NUMBER,
  CREATION_TIME NUMBER,
  LAST_MODIFIED_TIME NUMBER,
  LAST_ACCESS_TIME NUMBER,
  BUCKET_ID NUMBER,
  NUM_BUCKETS NUMBER,
  VIEW_EXPANDED_TEXT CLOB,
  VIEW_ORIGINAL_TEXT CLOB
)

34
OBJECTSTORE SDS
CREATE TABLE SDS (
  SD_ID NUMBER,
  CD_ID NUMBER,
  INPUT_FORMAT VARCHAR,
  IS_COMPRESSED NUMBER,
  LOCATION VARCHAR,
  NUM_BUCKETS NUMBER,
  OUTPUT_FORMAT VARCHAR,
  SERDE_ID NUMBER,
  IS_STOREDASSUBDIRECTORIES NUMBER
)
CREATE TABLE SERDES (
  SERDE_ID NUMBER,
  NAME VARCHAR,
  SLIB VARCHAR
)
ORACLESTORE V2_SDS
CREATE TABLE V2_SDS (
  SD_ID NUMBER,
  HASHCODE NUMBER,
  IS_COMPRESSED NUMBER,
  IS_STOREDASSUBDIRECTORIES NUMBER,
  INPUT_FORMAT VARCHAR,
  OUTPUT_FORMAT VARCHAR,
  SERDE_NAME VARCHAR,
  SERDE_LIB VARCHAR
)

35
OBJECTSTORE GET_TABLE
SELECT DISTINCT ... FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID WHERE A0.TBL_NAME = ? AND B0."NAME" = ?
SELECT ... FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID LEFT OUTER JOIN SDS C0 ON A0.SD_ID = C0.SD_ID WHERE A0.TBL_ID = ?
SELECT ... FROM TABLE_PARAMS A0 WHERE A0.TBL_ID = ?
SELECT ... FROM PARTITION_KEYS A0 WHERE A0.TBL_ID = ? AND A0.INTEGER_IDX >= 0 ORDER BY NUCORDER0
SELECT B0.CD_ID FROM SDS A0 LEFT OUTER JOIN CDS B0 ON A0.CD_ID = B0.CD_ID WHERE A0.SD_ID = ?
SELECT COUNT(*) FROM COLUMNS_V2 THIS WHERE CD_ID=?
SELECT ... FROM COLUMNS_V2 A0 WHERE A0.CD_ID = ? ORDER BY NUCORDER0
SELECT ... FROM SDS A0 LEFT OUTER JOIN SERDES B0 ON A0.SERDE_ID = B0.SERDE_ID WHERE A0.SD_ID = ?
SELECT ... FROM SERDE_PARAMS A0 WHERE A0.SERDE_ID = ?
...
ORACLESTORE GET_TABLE
SELECT ... FROM V2_TBLS WHERE DB_ID = ? AND NAME = ?
SELECT ... FROM V2_PARTITION_COLS WHERE TBL_ID = ? ORDER BY POSITION ASC
SELECT ... FROM V2_SDS WHERE SD_ID = ?
SELECT ... FROM V2_TBL_COLS WHERE CD_ID = ? ORDER BY POSITION ASC
SELECT ... FROM V2_SD_PARAMS WHERE SD_PARAM_ID = ?
SELECT ... FROM V2_SERDE_PARAMS WHERE SERDE_PARAM_ID = ?
SELECT ... FROM V2_TBL_PARAMS WHERE TBL_ID = ?

36
Object-Relational Mapping is the
Vietnam of our industry
- Ted Neward

37
ORM ALTERNATIVES
… nice quick initial development, and a big drain on your resources further on in the project when tracking ORM related bugs and inefficiencies
• Hibernate
  • HQL; generated SQL; generated skeleton code; de facto driver of JPA
• jOOQ
  • Light database mapping; jOOQ DSL; generated SQL; generated code
• Apache Cayenne
  • GUI; generated SQL; generated skeleton code; nested contexts
• TopLink
  • Commercial

38
GET_TABLE ISSUES
• get_table request is non-trivial
• Requires 19 queries to create one Table object
  • 6 multi-table JOIN queries with tables containing millions of records
  • 3 queries to populate auxiliary parameters
  • 4 COUNT queries to determine existence
• get_table is called multiple times during plan generation
• Unnecessarily queries the same data
• Potential data consistency problems
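The repeated get_table calls during planning suggest a cache scoped to a single query. The sketch below is an illustration of that idea only; `CountingStore` and `QueryScopedCache` are hypothetical names and do not describe how Hive implements any such cache.

```python
class CountingStore:
    """Toy backend that counts how often it is asked for a table."""

    def __init__(self, tables):
        self._tables = tables
        self.calls = 0

    def get_table(self, db, name):
        self.calls += 1
        return self._tables[(db, name)]


class QueryScopedCache:
    """Caches table lookups for the lifetime of one query's planning."""

    def __init__(self, store):
        self._store = store
        self._tables = {}

    def get_table(self, db, name):
        key = (db, name)
        if key not in self._tables:
            self._tables[key] = self._store.get_table(db, name)
        return self._tables[key]


backend = CountingStore({("web", "clicks"): {"owner": "hive"}})
cache = QueryScopedCache(backend)  # create one per query, discard afterwards

# Planning asks for the same table several times.
results = [cache.get_table("web", "clicks") for _ in range(5)]
```

Besides saving round trips, every planning phase sees the same Table object, which sidesteps the consistency concern of re-reading data that may have changed between calls.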

39
THRIFT OBJECT ISSUES
• Thrift objects map to ORM objects one-to-one
• Inherits inefficiencies in the ORM model
• Duplicates data across lists of objects
• Requires conversion between model and Thrift objects
