High Scale Relational Storage at
Salesforce built with Apache HBase
and Apache Phoenix
Andrew Purtell
Architect, Cloud Storage
apurtell@salesforce.com
@akpurtell
v3
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include, but are not limited to, risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Safe Harbor
Architect, Cloud Storage at Salesforce.com
• Data Platform Team
Open Source Contributor, since 2007
• Committer, PMC, and Project Chair, Apache HBase
• Committer and PMC, Apache Phoenix
• Committer, PMC, and Project Chair, Apache Bigtop
• Member, Apache Software Foundation
Distributed Systems Nerd, since 1997
whoami
Motivation
Open Source and the Data Platform Team
What are Apache HBase and Apache Phoenix?
HBase@Salesforce
• The View from 30Kft
• Keeping Up Appearances
• Engineering The Whole Stack Holistically
Q&A
Agenda
Motivation
The Data Management Challenge
Data at scale requires a single platform for analysis ("data gravity")
Salesforce already manages over 100B records of customer data today
More than 3 billion transactions per day on the platform
Exponential growth rate
Batch and stream compute data locality requirements
Service continuity requirements
Systems of record provide reliable, highly available, and secure data storage
• SObjects: Traditional Salesforce platform objects
• Salesforce Files: Blob storage
• BigObjects: Scale out storage for immutable data
Systems of Record
SObjects
• Transactional data. Rows can be added and updated
• Example: Accounts, Contacts, Custom Objects
BigObjects
• New object type for immutable data
• Optimized for large volumes of data
• Example: event data, purchase history, product usage data
Salesforce Files
• Blob storage for semi-structured or unstructured data
• Example: CSV extracts from external systems, weblogs, monitoring logs
Platform Connect
External Object
• New proxy object connected to an external OData source
Data Pipelines
Snapshot data from SObjects / External Objects
Manipulate and analyze data sets
• Transformations
• Joins
• Calculations and enrichment
Apache Pig + Hadoop for processing framework and scripting language
Apache HBase for BigObjects persistence
Apache Phoenix for indexing and relational access
[Diagram labels: Data Snapshot, External Objects, Salesforce Files, BigObjects, Data Pipeline]
Data Management Services
Customer 360
Audit & Tracking
Data Retention
Data Quality
Connected Products
Enrich customer profile with data from external systems
Track what users are doing on platform
Compliance
Keep data for audit purposes
Maintain scalability of operational systems
Data enrichment
Data integration
Data cleansing
Event stream archive
Offline analysis and data mining
Salesforce Shield
Persist the complete field change history for up to ten years with Field Audit Trail
Persist user event capture for Event Monitoring and Audit
Support scale out batch and stream compute for Backup, DR, Threat Detection, etc.
[Diagram labels: Infrastructure Services, Network Services, Application Services; Secure Data Centers, Backup and Disaster Recovery, HTTPS Encryption, Penetration Testing, Advanced Threat Detection, Identity & Single Sign On, Two Factor Authentication, User Roles & Permissions, Field & Row Level Security, Secure Firewalls, Real-time replication, Password Policies, Third Party Certifications, IP Login Restrictions, Customer Audits]
[Diagram labels: Salesforce Shield; Platform Encryption, Event Monitoring, Field Audit Trail]
Open Source and the Data Platform Team
The things we build are better
• Building software with Open Source in mind is inherently beneficial (encourages loose coupling, cohesiveness, quality)
• Positive external pressure to make the right software engineering decisions
The things we don't build are better
• The only thing better than writing a great component is not writing it
• Smart engineers avoid "Not-Invented-Here" syndrome
We are happier
• Open Source extends pride in ownership and visibility beyond the company walls
• We attract better coders who gravitate towards companies that do open source work
We make the world a better place
How Open Source Helps Salesforce
No forking
• Fork defined as: a departure from the open source repository significant enough to prevent us from contributing patches back or applying patches from the upstream open source repository
Internal repositories as change buffer with local release numbering
• Local repos for HBase, Phoenix, all dependencies (Hadoop, ZooKeeper, etc.)
• Fast forwarded to new upstream releases after consideration
• Updates to local repos are all manual by design
• Hadoop is a special snowflake
Change the upstream repositories first
• Almost all changes begin as patches developed using upstream repositories
• The only exceptions are critical bug fixes, or changes we need locally in the meantime while working with a slow moving upstream community
How We Contribute To And Use Open Source
Distributed systems engineers
Storage systems architects
Open source contributors
• Project Chairs: Apache HBase, Phoenix, Bigtop
• Committers and PMC: Apache HBase, Phoenix, Pig, Bigtop, Incubator
• Mentors: Apache Phoenix, NiFi, Trafodion
• Hundreds of commits per year
Who We Are
Apache HBase and Apache Phoenix
A new scale out relational storage option
A high performance horizontally scalable datastore engine for Big Data suitable as the store of record for mission critical data
Apache HBase
An emerging platform for scale out relational datastores
Apache HBase
Apache Kylin
Apache HBase
A founding member of the Hadoop pantheon
• Introduced as a Hadoop 'contrib' module in 2007
Tablespaces
Not like a spreadsheet, a "sparse, consistent, distributed, multi-dimensional, sorted map"
HBase Data Model
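The quoted definition can be read quite literally. As a minimal sketch (a toy in-memory model, not HBase code; the class name SparseSortedMap and its methods are hypothetical), nested sorted maps give row key, then column, then timestamp, then value. The "consistent" and "distributed" parts of the definition are not modeled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of the data model only: row -> column -> timestamp -> value. */
final class SparseSortedMap {
    // Rows are sorted by key, columns by "family:qualifier", versions newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one cell version. Cells that are never written take no space (sparse). */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Returns the row keys in [startRow, stopRow) in sorted order, like a range scan. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }
}
```

Because the outer map is sorted by row key, a range scan is a cheap sub-map view, which is why row key design matters so much in this model.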
HBase Scalability
[Diagram labels: RegionServers, Table A, Table B, Splits, Assignments, Regions]
How is HBase Different from a RDBMS?
                      | RDBMS                          | HBase
Data layout           | Row oriented                   | Column oriented
Transactions          | Multi-row ACID                 | Single row or adjacent row groups only
Query language        | SQL                            | None (API access)
Joins                 | Yes                            | No
Indexes               | On arbitrary columns           | Single row index only
Max data size         | Terabytes                      | Petabytes*
R/W throughput limits | 1000s of operations per second | Millions of operations per second*
* No architectural upper bound on data size or aggregate throughput
1969: CODASYL (network database)
1979: First commercial SQL RDBMS
1990: Transaction processing on SQL now popular
1993: Multidimensional databases
1996: Enterprise Data Warehouses
2006: Hadoop and other "big data" technologies
2008: NoSQL
2011: SQL on Hadoop
2014: Interactive analytics on Hadoop and NoSQL with SQL
Why?
SQL: In and Out of Fashion
From "SQL On Everything, In Memory" by Julian Hyde, Strata NYC 2014
Implementing structured queries well is hard
• Systems cannot just "run the query" as written
• Relational systems require the algebraic operators, a query planner, an optimizer, metadata, statistics, etc.
However, the result is very useful to non-technical users
• Dumb queries (e.g. tool generated) can still get high performance
• Adding new algorithms, reorganizing physical data layouts, or migrating from one data store to another is transparent
What about blending the scale and performance of non-relational scale out stores with the ease of use of SQL?
SQL: In and Out of Fashion
A relational system built on HBase
• Reintroduces the familiar declarative SQL interface to data
• Reintroduces typed data and the query optimizations possible with it
• Secondary indexes, joins, query optimization, statistics, …
• Integrates with just about everything as a JDBC data source
With some new advantages
• Dynamic columns extend schema at runtime
• Schema is versioned, for free by HBase, allowing flashback queries using prior versions of metadata
And new challenges
• Distributed transactions are hard; a work in progress building on Tephra*
"Putting the SQL back into NoSQL"
Apache Phoenix
* Tephra open source transaction engine: http://blog.cask.co/2014/07/meet-tephra-an-open-source-transaction-engine-2/
Apache Phoenix
Feature                           | Supported?
CREATE / DROP / ALTER TABLE       | Yes
UPSERT / DELETE                   | Yes
SELECT                            | Yes
WHERE / HAVING                    | Yes
GROUP BY / ORDER BY               | Yes
LIMIT                             | Yes
JOIN                              | Yes, with limitations
Views                             | Yes
Secondary indexes                 | Yes
Statistics and query optimization | Yes
Transactions                      | No, work in progress
Phoenix Integration With HBase
[Diagram labels: JDBC driver, RegionServers, Regions, Coprocessors (HBase server side extensions)]
SELECT * FROM users WHERE id = 'xxxabc'
Phoenix maps the HBase data model to the relational world
Phoenix Data Model
[Diagram labels: Phoenix table, Primary key constraint, Columns, Multiple versions]
Dynamic columns
• Specify a subset of columns at CREATE time; the remainder can be optionally specified at query time
• Surfaces HBase's schema flexibility
Extras
CREATE TABLE "t" (
    K VARCHAR PRIMARY KEY,
    "f1"."col1" VARCHAR);
SELECT * FROM "t" ("f1"."col2" VARCHAR);
Dynamic column
Native multitenancy
• Multitenant isolation via a combination of multitenant tables and tenant-specific connections
• Tenant-specific connections only access data that belongs to the tenant
• Tenants can create their own schema addendums (views, columns, indexes)
Extras
CREATE TABLE event (
    tenant_id VARCHAR,
    type CHAR(1),
    event_id BIGINT,
    …
    CONSTRAINT pk PRIMARY KEY (tenant_id, type, event_id))
    MULTI_TENANT=true;
First PK column identifies tenant ID
Tenant-specific connection
DriverManager.getConnection("jdbc:phoenix:localhost;TenantId=me");
Three index types
• Mutable indexes
  • Global mutable indexes
    Server side intercepts primary table updates, builds and writes entries to secondary index tables
    For read heavy, low write use cases
  • Local mutable indexes
    Index data and primary data are placed together (index in shadow column family)
    For write heavy, space constrained use cases
• Immutable indexes
  Managed entirely by the client, writes scale best
  For use cases where rows are immutable after write
Secondary Indexes
…
• Global mutable indexes: covered indexes
• Data in covered columns will be copied into the index
• This allows a global index to be used more frequently, as a global index will only be used if all columns referenced in the query are contained by it
Note: Rebuilds must currently be done using offline tooling (MapReduce)
Secondary Indexes
CREATE TABLE t (k VARCHAR PRIMARY KEY,
v1 VARCHAR, v2 INTEGER);
CREATE INDEX i ON t (v1) INCLUDE (v2);
Covered column
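The rule that a global index is only used when it contains every column the query references can be sketched as a set containment check. This is an illustration under assumptions, not Phoenix's planner code: the class and method names are hypothetical, and what counts as "contained" beyond the indexed and covered columns is left to the caller.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating the covered-index rule; not part of Phoenix. */
final class CoveredIndexCheck {
    /**
     * Returns true when every column referenced by the query is either an
     * indexed column or a covered (INCLUDE) column of the index.
     */
    static boolean canUseGlobalIndex(List<String> indexedColumns,
                                     List<String> coveredColumns,
                                     List<String> referencedColumns) {
        Set<String> available = new HashSet<>(indexedColumns);
        available.addAll(coveredColumns);
        return available.containsAll(referencedColumns);
    }
}
```

With index i from the slide (indexed v1, covered v2), a query that touches only v1 and v2 passes the check, while a query that also touches some other column (say a hypothetical v3) does not, which is the case INCLUDE is meant to avoid.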
Standard join syntax is supported, with some limitations
Joins
• Only equality (=) is supported in joining conditions
• No restriction on other predicates in the ON clause
• Some queries may exceed server side resources and require rewrites by hand
• Work in progress
Client side rewriting
• Parallel scanning with final client side merge sort
• RPC batching
• Use secondary indexes if available
• Rewrites for multitenant tables
Statistics
• Use guideposts to increase intra-region parallelism
Current Work-in-Progress
• Integration with Apache Calcite
• ~120 rewrite rules
Query Optimization
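"Parallel scanning with final client side merge sort" can be pictured as a k-way merge: each parallel scan returns rows already sorted, and the client only has to interleave them. The sketch below is an illustration, not Phoenix code. The class name is hypothetical, plain string lists stand in for the per-scan results, and the parallel scans themselves are assumed to have completed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of sorted partial results; not part of Phoenix. */
final class ClientSideMerge {
    /** The current smallest unread value of one sorted partial result. */
    private static final class Head {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges already-sorted partial results into one sorted list. */
    static List<String> mergeSorted(List<List<String>> sortedChunks) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<String> chunk : sortedChunks) {
            Iterator<String> it = chunk.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Take the globally smallest head, then refill from the same chunk.
            Head smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

The merge holds only one pending value per chunk in the heap, so the client never has to re-sort the combined result.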
Server side push down
• Filters
• Skip scans
• Partial aggregation
• TopN
• Hash joins
Query Optimization
Example query plan for a 32 region table
Query Optimization
With a secondary index on RESPONSE_TIME
HBase @ Salesforce.com
Durability
Consistency
Modeling constructs
Atomicity
Concurrency
Queryability
Schema mutability
Data size
Data portability and backup
Scalability
Reliability
Latency
Recoverability
Data Integrity
Required Downtime
Manageability
Cost
Support
Comprehensibility
License
Community
So Why HBase?
The View from 30Kft
Scalability
Consistency
Operations
License
Community
HBase
Instances are identical collections of hardware and software that support a discrete subset of our customers
An instance group is a collection of instances, in one location, configured as a single resource pool and failure domain
For every instance or instance group, there is an identical mirror available in another datacenter
HBase is currently operating in both instance and instance group configurations
• We started with smaller configurations (~10s of nodes)
• Recently we have begun migrating to larger configurations (~100s of nodes)
• Total fleet size is a "few thousand" servers
Instances and instance groups
The View from 30Kft
Apache Phoenix was motivated by the impedance mismatch between the HBase API and the expectations of platform developers
Keeping Up Appearances
SELECT * FROM foo WHERE bar > 30
vs.
HTable t = new HTable("foo");
ResultScanner s = t.getScanner(new Scan(…,
    new ValueFilter(CompareOp.GT,
        new CustomTypedComparator(30)), …));
Result r;
while ((r = s.next()) != null) {
    // blah blah blah Java Java Java
}
s.close();
t.close();
Engineering The Whole Stack Holistically
GC tuning, local patches
Kernel tunables, filesystem, krb, nscd
Multi-standby (HDFS-6440), fsync, tuning, custom UGI
Many bug fixes and enhancements, tuning
Relational engine built on HBase
Data access layer for internal users
Circuit breaker, relogin thread
Thank you
Q&A
Ad

More Related Content

What's hot (20)

Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
Michael Zhang
ย 
Data Federation with Apache Spark
Data Federation with Apache SparkData Federation with Apache Spark
Data Federation with Apache Spark
DataWorks Summit
ย 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Databricks
ย 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
MIJIN AN
ย 
How to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherHow to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with Panther
Panther Labs
ย 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
ย 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Databricks
ย 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
ย 
MariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and OptimizationMariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and Optimization
MariaDB plc
ย 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
Brett VanderPlaats
ย 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
Sascha Dittmann
ย 
ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...
Databricks
ย 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
MariaDB plc
ย 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4
DataWorks Summit
ย 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
ย 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Zohar Elkayam
ย 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
ย 
MariaDB High Availability
MariaDB High AvailabilityMariaDB High Availability
MariaDB High Availability
MariaDB plc
ย 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
haroonm
ย 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
DataWorks Summit
ย 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
Michael Zhang
ย 
Data Federation with Apache Spark
Data Federation with Apache SparkData Federation with Apache Spark
Data Federation with Apache Spark
DataWorks Summit
ย 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Databricks
ย 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
MIJIN AN
ย 
How to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherHow to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with Panther
Panther Labs
ย 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
ย 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Databricks
ย 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
ย 
MariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and OptimizationMariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and Optimization
MariaDB plc
ย 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
Brett VanderPlaats
ย 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
Sascha Dittmann
ย 
ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lakeโ€”An Overview of Table Formats for Large Scal...
Databricks
ย 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
MariaDB plc
ย 
What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4What s new in spark 2.3 and spark 2.4
What s new in spark 2.3 and spark 2.4
DataWorks Summit
ย 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
ย 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Zohar Elkayam
ย 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
ย 
MariaDB High Availability
MariaDB High AvailabilityMariaDB High Availability
MariaDB High Availability
MariaDB plc
ย 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
haroonm
ย 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
DataWorks Summit
ย 

Viewers also liked (19)

Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Trieu Nguyen
ย 
Salesforce External Objects for Big Data
Salesforce External Objects for Big DataSalesforce External Objects for Big Data
Salesforce External Objects for Big Data
Sumit Sarkar
ย 
Hbase at Salesforce.com
Hbase at Salesforce.comHbase at Salesforce.com
Hbase at Salesforce.com
Salesforce Engineering
ย 
HBase Secondary Indexing
HBase Secondary Indexing HBase Secondary Indexing
HBase Secondary Indexing
Gino McCarty
ย 
How Salesforce.com R&D Delivers the Cloud
How Salesforce.com R&D Delivers the CloudHow Salesforce.com R&D Delivers the Cloud
How Salesforce.com R&D Delivers the Cloud
Salesforce Developers
ย 
Salesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache PhoenixSalesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce's Trusted Enterprise Platform and Apache Phoenix
Salesforce Engineering
ย 
Introduction to Deep Learning
Introduction to Deep Learning Introduction to Deep Learning
Introduction to Deep Learning
Salesforce Engineering
ย 
Basics of cloud computing & salesforce.com
Basics of cloud computing & salesforce.comBasics of cloud computing & salesforce.com
Basics of cloud computing & salesforce.com
Deepu S Nath
ย 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
HBaseCon
ย 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
enissoz
ย 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
DataWorks Summit/Hadoop Summit
ย 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks
ย 
Generic Roadmap Slide
Generic Roadmap SlideGeneric Roadmap Slide
Generic Roadmap Slide
Salesforce Partners
ย 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
HBaseCon
ย 
Salesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Jumpstart: Getting Started as a Consulting PartnerSalesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Jumpstart: Getting Started as a Consulting Partner
Salesforce Partners
ย 
How Salesforce CRM works & who should use it?
How Salesforce CRM works & who should use it?How Salesforce CRM works & who should use it?
How Salesforce CRM works & who should use it?
High Scale Relational Storage at Salesforce Built with Apache HBase and Apache Phoenix

  • 1. High Scale Relational Storage at Salesforce built with Apache HBase and Apache Phoenix. Andrew Purtell, Architect, Cloud Storage. apurtell@salesforce.com, @akpurtell. v3
  • 2. Safe Harbor. Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include, but are not limited to, risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  • 3. whoami. Architect, Cloud Storage at Salesforce.com (Data Platform Team). Open Source Contributor, since 2007: Committer, PMC, and Project Chair, Apache HBase; Committer and PMC, Apache Phoenix; Committer, PMC, and Project Chair, Apache Bigtop; Member, Apache Software Foundation. Distributed Systems Nerd, since 1997.
  • 4. Agenda. Motivation. Open Source and the Data Platform Team. What are Apache HBase and Apache Phoenix? HBase@Salesforce: The View from 30Kft; Keeping Up Appearances; Engineering The Whole Stack Holistically. Q&A.
  • 6. The Data Management Challenge. Data at scale requires a single platform for analysis ("data gravity"). Salesforce already manages over 100B records of customer data today. More than 3 billion transactions per day on the platform. Exponential growth rate. Batch and stream compute data locality requirements. Service continuity requirements.
  • 7. Systems of Record
      Systems of record provide reliable, highly available, and secure data storage
        • SObjects: Traditional Salesforce platform objects
        • Salesforce Files: Blob storage
        • BigObjects: Scale out storage for immutable data
      SObjects
        • Transactional data. Rows can be added and updated
        • Example: Accounts, Contacts, Custom Objects
      BigObjects
        • New Object type for immutable data
        • Optimized for large volumes of data
        • Example: event data, purchase history, product usage data
      Salesforce Files
        • Blob storage for semi- or un-structured data
        • Example: CSV extracts from external systems, weblogs, monitoring logs
      Platform Connect External Object
        • New proxy object connected to an external OData source
  • 8. Data Pipelines
      Snapshot data from SObjects / External Objects
      Manipulate and analyze data sets
        • Transformations
        • Joins
        • Calculations and enrichment
      Diagram labels: Data Snapshot, External Objects, Salesforce Files, BigObjects, Data Pipeline
  • 9. Data Pipelines
      Snapshot data from SObjects / External Objects
      Manipulate and analyze data sets
        • Transformations
        • Joins
        • Calculations and enrichment
      Apache Pig + Hadoop for processing framework and scripting language
      Apache HBase for BigObjects persistence
      Apache Phoenix for indexing and relational access
      Diagram labels: Data Snapshot, External Objects, Salesforce Files, BigObjects, Data Pipeline
  • 10. Data Management Services Customer 360 Enrich customer profile w/ data from external systems Audit & Tracking Data Retention Data Quality Connected Products Track what users are doing on platform Compliance Keep data for audit purposes Maintain scalability of operational systems Data enrichment Data integration Data cleansing Event stream archive Offline analysis and data mining
  • 11. Salesforce Shield
      Persist the complete field change history for up to ten years with Field Audit Trail
      Persist user event capture for Event Monitoring and Audit
      Support scale out batch and stream compute for Backup, DR, Threat Detection, etc.
      Diagram labels: Infrastructure Services, Network Services, Application Services, Secure Data Centers, Backup and Disaster Recovery, HTTPS Encryption, Penetration Testing, Advanced Threat Detection, Identity & Single Sign On, Two Factor Authentication, User Roles & Permissions, Field & Row Level Security, Secure Firewalls, Real-time replication, Password Policies, Third Party Certifications, IP Login Restrictions, Customer Audits, Salesforce Shield, Platform Encryption, Event Monitoring, Field Audit Trail
  • 12. Open Source and the Data Platform Team
  • 13. How Open Source Helps Salesforce
      The things we build are better
        • Building software with Open Source in mind is inherently beneficial (encourages loose coupling, cohesiveness, quality)
        • Positive external pressure to make the right software engineering decisions
      The things we don't build are better
        • The only thing better than writing a great component is not writing it
        • Smart engineers avoid "Not-Invented-Here" syndrome
      We are happier
        • Open Source extends pride in ownership and visibility beyond the company walls
        • We attract better coders who gravitate towards companies that do open source work
      We make the world a better place
  • 14. How We Contribute To And Use Open Source
      No forking
        • Fork defined as: a departure from the open source repository significant enough to prevent us from contributing patches back or applying patches from the upstream open source repository
      Internal repositories as change buffer with local release numbering
        • Local repos for HBase, Phoenix, all dependencies (Hadoop, ZooKeeper, etc.)
        • Fast forwarded to new upstream releases after consideration
        • Updates to local repos are all manual by design
        • Hadoop is a special snowflake
      Change the upstream repositories first
        • Almost all changes begin as patches developed using upstream repositories
        • The only exceptions are critical bug fixes, or changes we need locally in the meantime while working with a slow moving upstream community
  • 15. Who We Are
      Distributed systems engineers
      Storage systems architects
      Open source contributors
        • Project Chairs: Apache HBase, Phoenix, Bigtop
        • Committers and PMC: Apache HBase, Phoenix, Pig, Bigtop, Incubator
        • Mentors: Apache Phoenix, NiFi, Trafodion
        • Hundreds of commits per year
  • 16. Apache HBase and Apache Phoenix A new scale out relational storage option
  • 17. Apache HBase
      A high performance, horizontally scalable datastore engine for Big Data, suitable as the store of record for mission critical data
  • 18. An emerging platform for scale out relational datastores Apache HBase Apache Kylin
  • 19. Apache HBase
      A founding member of the Hadoop pantheon
        • Introduced as a Hadoop 'contrib' module in 2007
  • 20. HBase Data Model
      Tablespaces
      Not like a spreadsheet, a "sparse, consistent, distributed, multi-dimensional, sorted map"
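The "sparse, consistent, distributed, multi-dimensional, sorted map" description on slide 20 can be made concrete with a small in-memory model. The sketch below is only an illustration of the map shape (row key, then column, then timestamp), not HBase code; the class and method names are hypothetical, and distribution and consistency are out of scope.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of the data model: row -> column -> timestamp -> value, every level sorted. */
final class SparseSortedMap {
    // Rows sorted by key, columns sorted by name, versions sorted newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Write one version of one cell. Cells that are never written take no space (sparse). */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Newest version of a cell, or null if the cell was never written. */
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    /** Row keys in [startRow, stopRow) in sorted order, like a range scan. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }
}
```

Because every level is sorted, a range scan over adjacent row keys is a contiguous read, which is why row key design matters so much in this model.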
  • 21. HBase Scalability RegionServers Table A Table B Splits Assignments Regions
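Slide 21's diagram shows tables divided into regions, with regions split as they grow and assigned to RegionServers. A minimal sketch of that idea follows, assuming each region is identified by its start key and covers all keys up to the next region's start key. The class, method, and server names are hypothetical; this is not the HBase client's actual lookup code.

```java
import java.util.TreeMap;

/** Toy region directory: each region owns the key range that begins at its start key. */
final class RegionDirectory {
    // Region start key -> name of the server the region is assigned to.
    // The first region starts at the empty key, so every row key has an owner.
    private final TreeMap<String, String> assignments = new TreeMap<>();

    RegionDirectory(String initialServer) {
        assignments.put("", initialServer);
    }

    /** Server hosting the region whose key range contains rowKey. */
    String locate(String rowKey) {
        // floorEntry finds the region with the greatest start key <= rowKey.
        return assignments.floorEntry(rowKey).getValue();
    }

    /** Split the region containing splitKey at splitKey and assign the upper half to newServer. */
    void split(String splitKey, String newServer) {
        assignments.put(splitKey, newServer);
    }

    int regionCount() {
        return assignments.size();
    }
}
```

Adding capacity in this model means splitting regions and assigning the new halves to other servers, which is the property behind the "no architectural upper bound" footnote on the comparison slide.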
  • 22. How is HBase Different from a RDBMS?
      Aspect | RDBMS | HBase
      Data layout | Row oriented | Column oriented
      Transactions | Multi-row ACID | Single row or adjacent row groups only
      Query language | SQL | None (API access)
      Joins | Yes | No
      Indexes | On arbitrary columns | Single row index only
      Max data size | Terabytes | Petabytes*
      R/W throughput limits | 1000s of operations per second | Millions of operations per second*
      * No architectural upper bound on data size or aggregate throughput
  • 23. SQL: In and Out of Fashion
      1969: CODASYL (network database)
      1979: First commercial SQL RDBMSs
      1990: Transaction processing on SQL now popular
      1993: Multidimensional databases
      1996: Enterprise Data Warehouses
      2006: Hadoop and other "big data" technologies
      2008: NoSQL
      2011: SQL on Hadoop
      2014: Interactive analytics on Hadoop and NoSQL with SQL
      Why?
      From "SQL On Everything, In Memory" by Julian Hyde, Strata NYC 2014
  • 24. SQL: In and Out of Fashion
      Implementing structured queries well is hard
        • Systems cannot just "run the query" as written
        • Relational systems require the algebraic operators, a query planner, an optimizer, metadata, statistics, etc.
      However, the result is very useful to non-technical users
        • Dumb queries (e.g. tool generated) can still get high performance
        • Adding new algorithms, reorganizing physical data layouts, or migrating from one data store to another is transparent
      What about blending the scale and performance of non-relational scale out stores with the ease of use of SQL?
  • 25. Apache Phoenix: "Putting the SQL back into NoSQL"
      A relational system built on HBase
        • Reintroduces the familiar declarative SQL interface to data
        • Reintroduces typed data and query optimizations possible with it
        • Secondary indexes, joins, query optimization, statistics, …
        • Integrates with just about everything as a JDBC data source
      With some new advantages
        • Dynamic columns extend schema at runtime
        • Schema is versioned (for free, by HBase), allowing flashback queries using prior versions of metadata
      And new challenges
        • Distributed transactions are hard; a work in progress building on Tephra*
      * Tephra open source transaction engine: http://blog.cask.co/2014/07/meet-tephra-an-open-source-transaction-engine-2/
  • 26. Apache Phoenix
      Feature | Supported?
      CREATE / DROP / ALTER TABLE | Yes
      UPSERT / DELETE | Yes
      SELECT | Yes
      WHERE / HAVING | Yes
      GROUP BY / ORDER BY | Yes
      LIMIT | Yes
      JOIN | Yes, with limitations
      Views | Yes
      Secondary indexes | Yes
      Statistics and query optimization | Yes
      Transactions | No, work in progress
  • 27. Phoenix Integration With HBase
      Diagram labels: RegionServers, Regions, JDBC driver, Coprocessors (HBase server side extensions)
      SELECT * FROM users WHERE id = 'xxxabc'
  • 28. Phoenix Data Model
      Phoenix maps the HBase data model to the relational world
  • 29. Phoenix Data Model
      Phoenix maps the HBase data model to the relational world
      Diagram labels: Phoenix table
  • 30. Phoenix Data Model
      Phoenix maps the HBase data model to the relational world
      Diagram labels: Phoenix table, Primary key constraint
  • 31. Phoenix Data Model
      Phoenix maps the HBase data model to the relational world
      Diagram labels: Phoenix table, Primary key constraint, Columns
  • 32. Phoenix Data Model
      Phoenix maps the HBase data model to the relational world
      Diagram labels: Phoenix table, Primary key constraint, Columns, Multiple versions
  • 33. Extras
      Dynamic columns
        • Specify a subset of columns at CREATE time; the remainder can be optionally specified at query time
        • Surfaces HBase's schema flexibility
      CREATE TABLE "t" ( K VARCHAR PRIMARY KEY, "f1"."col1" VARCHAR);
      SELECT * FROM "t" ("f1"."col2" VARCHAR);   -- dynamic column
  • 34. Extras
      Native multitenancy
        • Multitenant isolation via a combination of multitenant tables and tenant-specific connections
        • Tenant-specific connections only access data that belongs to the tenant
        • Tenants can create their own schema addendums (views, columns, indexes)
      CREATE TABLE event ( tenant_id VARCHAR, type CHAR(1), event_id BIGINT, … CONSTRAINT pk PRIMARY KEY (tenant_id, type, event_id)) MULTI_TENANT=true;
      First PK column identifies tenant ID
      Tenant-specific connection:
      DriverManager.getConnection("jdbc:phoenix:localhost;tenantId=me");
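The isolation rule on slide 34 (the first primary key column identifies the tenant, and a tenant-specific connection only sees that tenant's rows) can be modeled as a key prefix over one shared sorted table. The sketch below is a conceptual illustration only, not Phoenix's implementation. The `TenantView` class, its methods, and the zero-character separator between the tenant ID and the rest of the key are assumptions made for the example.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy tenant-specific "connection": every read and write is confined to one key prefix. */
final class TenantView {
    // Assumed separator between the tenant ID and the rest of the key, so that
    // tenant "a" never overlaps tenant "ab" in sort order.
    private static final char SEP = '\0';

    private final NavigableMap<String, String> sharedTable;
    private final String prefix;

    TenantView(NavigableMap<String, String> sharedTable, String tenantId) {
        this.sharedTable = sharedTable;
        this.prefix = tenantId + SEP;
    }

    /** The tenant ID is prepended implicitly; callers never supply it. */
    void upsert(String key, String value) {
        sharedTable.put(prefix + key, value);
    }

    String get(String key) {
        return sharedTable.get(prefix + key);
    }

    /** Rows visible to this tenant: one contiguous slice of the shared table. */
    int rowCount() {
        // Assumes tenant keys do not begin with Character.MAX_VALUE.
        return sharedTable.subMap(prefix, true, prefix + Character.MAX_VALUE, false).size();
    }
}
```

Since all rows for one tenant are adjacent in sort order, a tenant-scoped query in this model is a bounded range scan rather than a filter over the whole table.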
  • 35. Secondary Indexes
      Three index types
        • Mutable indexes
          • Global mutable indexes
              Server side intercepts primary table updates, builds and writes entries to secondary index tables
              For read heavy, low write use cases
          • Local mutable indexes
              Index data and primary data are placed together (index in shadow column family)
              For write heavy, space constrained use cases
        • Immutable indexes
            Managed entirely by the client, writes scale best
            For use cases where rows are immutable after write
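The global mutable index bullet on slide 35 describes intercepting each primary table update and writing a matching entry to a separate index table. A minimal in-memory sketch of that maintenance step follows. It is an illustration of the idea, not Phoenix's coprocessor code; the `IndexedTable` class, its single indexed column, and the key encoding (indexed value, a zero-character separator, then the row key) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy table with one indexed column and a separately stored index. */
final class IndexedTable {
    private static final char SEP = '\0';

    // Primary table: row key -> value of the indexed column.
    private final TreeMap<String, String> primary = new TreeMap<>();
    // Index table: (indexed value + SEP + row key) -> row key.
    private final TreeMap<String, String> index = new TreeMap<>();

    /** Every write to the primary table also updates the index. */
    void upsert(String rowKey, String value) {
        String previous = primary.put(rowKey, value);
        if (previous != null) {
            // The row is mutable, so the stale index entry must be removed.
            index.remove(previous + SEP + rowKey);
        }
        index.put(value + SEP + rowKey, rowKey);
    }

    /** Look up row keys by indexed value with a range scan over the index only. */
    List<String> findByValue(String value) {
        String low = value + SEP;
        return new ArrayList<>(index.subMap(low, true, low + Character.MAX_VALUE, false).values());
    }
}
```

The extra remove-then-put on every update is the write cost that makes this index type a fit for read heavy workloads; for rows that never change after the first write, the removal step disappears, which matches the immutable index case.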
  • 36. Secondary Indexes
      …
        • Global mutable indexes: covered indexes
          • Data in covered columns will be copied into the index
          • This allows a global index to be used more frequently, as a global index will only be used if all columns referenced in the query are contained by it
      Note: Rebuilds must currently be done using offline tooling (MapReduce)
      CREATE TABLE t (k VARCHAR PRIMARY KEY, v1 VARCHAR, v2 INTEGER);
      CREATE INDEX i ON t (v1) INCLUDE (v2);   -- v2 is the covered column
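The rule on slide 36 (a global index is used only if it contains every column the query references) is a set containment check. The sketch below shows that check for the slide's table `t` and index `i`. It is an illustration, not the Phoenix optimizer; the class and parameter names are hypothetical, and treating the primary key columns as always available from the index is an assumption of this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy version of the "can this global index answer the query alone?" decision. */
final class CoveredIndexCheck {
    /**
     * @param primaryKeyColumns  primary key columns of the data table (assumed available via the index)
     * @param indexedColumns     columns named in CREATE INDEX ... ON t (...)
     * @param includedColumns    covered columns named in INCLUDE (...)
     * @param referencedColumns  every column the query mentions anywhere
     */
    static boolean covers(Set<String> primaryKeyColumns,
                          Set<String> indexedColumns,
                          Set<String> includedColumns,
                          Set<String> referencedColumns) {
        Set<String> available = new HashSet<>(primaryKeyColumns);
        available.addAll(indexedColumns);
        available.addAll(includedColumns);
        return available.containsAll(referencedColumns);
    }
}
```

Under these assumptions, `SELECT v2 FROM t WHERE v1 = 'x'` references `v1` and `v2`, so index `i` covers it only because `v2` was listed in `INCLUDE`.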
  • 37. Joins
      Standard join syntax is supported, with some limitations
        • Only equality (=) is supported in joining conditions
        • No restriction on other predicates in the ON clause
        • Some queries may exceed server side resources and require rewrites by hand
        • Work in progress
  • 38. Query Optimization
      Client side rewriting
        • Parallel scanning with final client side merge sort
        • RPC batching
        • Use secondary indexes if available
        • Rewrites for multitenant tables
      Statistics
        • Use guideposts to increase intra-region parallelism
      Server side push down
        • Filters
        • Skip scans
        • Partial aggregation
        • TopN
        • Hash joins
      Current Work-in-Progress
        • Integration with Apache Calcite
        • ~120 rewrite rules
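"Parallel scanning with final client side merge sort" on slide 38 means each scan returns its rows already sorted and the client only has to merge the sorted streams. A minimal sketch of that final merge step follows, using a priority queue keyed on the next unread row of each stream. The in-memory lists stand in for the results of scans that ran in parallel; the class and method names are hypothetical and this is not Phoenix's client code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy client side merge of several individually sorted scan results. */
final class ScanMerger {
    /** The next unread row of one scan, plus the rest of that scan. */
    private static final class Head {
        final String row;
        final Iterator<String> rest;

        Head(String row, Iterator<String> rest) {
            this.row = row;
            this.rest = rest;
        }
    }

    /** Merge sorted chunks into one sorted list without re-sorting everything. */
    static List<String> merge(List<List<String>> sortedChunks) {
        PriorityQueue<Head> heap = new PriorityQueue<Head>((a, b) -> a.row.compareTo(b.row));
        for (List<String> chunk : sortedChunks) {
            Iterator<String> it = chunk.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The smallest unread row across all chunks is always at the top of the heap.
            Head smallest = heap.poll();
            merged.add(smallest.row);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

The merge touches each row once and keeps only one pending row per scan in memory, so the client's work grows with the result size and the number of parallel scans rather than with the table size.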
  • 39. Query Optimization Example query plan for a 32 region table
  • 40. Query Optimization With a secondary index on RESPONSE_TIME
  • 42. So Why HBase? The View from 30Kft
      Durability, Consistency, Modeling constructs, Atomicity, Concurrency, Queryability, Schema mutability, Data size, Data portability and backup, Scalability, Reliability, Latency, Recoverability, Data Integrity, Required Downtime, Manageability, Cost, Support, Comprehensibility, License, Community
  • 43. So Why HBase? The View from 30Kft
      Durability, Consistency, Modeling constructs, Atomicity, Concurrency, Queryability, Schema mutability, Data size, Data portability and backup, Scalability, Reliability, Latency, Recoverability, Data Integrity, Required Downtime, Manageability, Cost, Support, Comprehensibility, License, Community
      Scalability, Consistency, Operations, License, Community
  • 44. So Why HBase? The View from 30Kft
      Durability, Consistency, Modeling constructs, Atomicity, Concurrency, Queryability, Schema mutability, Data size, Data portability and backup, Scalability, Reliability, Latency, Recoverability, Data Integrity, Required Downtime, Manageability, Cost, Support, Comprehensibility, License, Community
      Scalability, Consistency, Operations, License, Community
      HBase
  • 45. Instances and instance groups (The View from 30Kft)
      Instances are identical collections of hardware and software that support a discrete subset of our customers
      An instance group is a collection of instances, in one location, configured as a single resource pool and failure domain
      For every instance or instance group, there is an identical mirror available in another datacenter
      HBase is currently operating both in instance and instance group configurations
        • We started with smaller configurations (~10s of nodes)
        • Recently we have begun migrating to larger configurations (~100s of nodes)
        • Total fleet size is a "few thousand" servers
  • 46. Keeping Up Appearances
      Apache Phoenix was motivated by the impedance mismatch between the HBase API and the expectations of platform developers
      SELECT * FROM foo WHERE bar > 30
      vs.
      HTable t = new HTable("foo");
      ResultScanner s = t.getScanner(new Scan(…, new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …));
      Result r;
      while ((r = s.next()) != null) {
        // blah blah blah Java Java Java
      }
      s.close();
      t.close();
  • 47. Engineering The Whole Stack Holistically
  • 48. Engineering The Whole Stack Holistically Kernel tunables, filesystem, krb, nscd
  • 49. Engineering The Whole Stack Holistically Kernel tunables, filesystem, krb, nscd GC tuning, local patches
  • 50. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 51. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Many bug fixes and enhancements, tuning Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 52. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Many bug fixes and enhancements, tuning Relational engine built on HBase Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 53. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Many bug fixes and enhancements, tuning Relational engine built on HBase Data access layer for internal users Multi-standby (HDFS-6440), fsync, tuning, custom UGI
  • 54. Engineering The Whole Stack Holistically GC tuning, local patches Kernel tunables, filesystem, krb, nscd Multi-standby (HDFS-6440), fsync, tuning, custom UGI Many bug fixes and enhancements, tuning Relational engine built on HBase Data access layer for internal users Circuit breaker, relogin thread