Increasingly, organizations are relying on Kafka for mission-critical use cases where high availability and fast recovery times are essential. In particular, enterprise operators need the ability to quickly migrate applications between clusters in order to maintain business continuity during outages. In many cases, out-of-order or missing records are entirely unacceptable. MirrorMaker is a popular tool for replicating topics between clusters, but it has proven inadequate for these enterprise multi-cluster environments. Here we present MirrorMaker 2.0, an upcoming all-new replication engine designed specifically to provide disaster recovery and high availability for Kafka. We describe various replication topologies and recovery strategies using MirrorMaker 2.0 and associated tooling.
Apache Kafka in the Telco Industry (OSS, BSS, OTT, IMS, NFV, Middleware, Main...Kai Wähner
Real-time data streaming is a hot topic in the telecommunications industry. As telecommunications companies strive to offer high-speed, integrated networks with reduced connection times, connect countless devices at reduced latency, and transform the digital experience worldwide, more and more of them are turning to Apache Kafka's data stream processing to deliver a scalable, real-time infrastructure for OSS and BSS scenarios. Combining on-premise data centers, edge processing, and multi-cloud architectures is becoming the new normal in the telco industry, enabling accelerated growth from value-added services delivered over mobile networks.
Join Kai Waehner, Technology Evangelist at Confluent, for this session which explores various telecommunications use cases, including data integration, infrastructure monitoring, data distribution, data processing and business applications. Different architectures and components from the Kafka ecosystem are also discussed.
This talk explores:
- Overcoming challenges in building a modern hybrid telco infrastructure
- Building a real-time infrastructure to correlate relevant events
- Connecting thousands of devices, networks, infrastructures, and people
- Working together with different companies, organisations, and business models
- Leveraging open source and fully managed solutions from the Apache Kafka ecosystem, Confluent Platform, and Confluent Cloud
Disaster Recovery and High Availability with Kafka, SRM and MM2Abdelkrim Hadjidj
In this talk, we will present Streams Replication Manager, a new open source Kafka mirroring solution designed specifically to provide disaster recovery and high availability for Kafka. We will describe and demo various replication topologies and recovery strategies using SRM and associated tooling. Finally, we will provide an update on the ongoing work to make this engine available for the Apache Kafka community as MirrorMaker2 (KIP-382).
From Zero to Hero with Kafka Connect (Robin Moffat, Confluent) Kafka Summit L...confluent
Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren’t working. This talk will discuss the key design concepts within Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We’ll do a live demo of building pipelines with Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we’ll go hands-on in methodically diagnosing and resolving common issues encountered with Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Kafka Connect in containers.
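As a hedged illustration of the kind of pipeline the talk builds, here is a minimal sketch of registering a connector through a Connect worker's REST API. The worker address, connector name, table, and credentials are placeholders; the configuration keys follow the Confluent JDBC source connector.

```python
import requests

# Hypothetical JDBC source connector; host, table, and credentials are placeholders.
connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db-host:5432/shop",
        "connection.user": "connect",
        "connection.password": "secret",
        "table.whitelist": "orders",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "pg-",
    },
}

# Kafka Connect workers expose a REST API; POST /connectors creates the connector.
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```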
Designing a complete ci cd pipeline using argo events, workflow and cd productsJulian Mazzitelli
https://www.youtube.com/watch?v=YmIAatr3Who
Presented at Cloud and AI DevFest GDG Montreal on September 27, 2019.
Are you looking to get more flexibility out of your CI/CD platform? Interested in how GitOps fits into the mix? Learn how Argo CD, Workflows, and Events can be combined to craft custom CI/CD flows, all while staying Kubernetes-native and enabling you to leverage existing observability tooling.
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
For interactive analytics dashboards, two key requirements for a smooth user experience are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, sub-second, ad-hoc data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
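As a rough illustration of the kind of sub-second query Druid serves, here is a minimal sketch that sends a SQL query to Druid's HTTP endpoint. It assumes a broker at localhost:8082 and a hypothetical `pageviews` datasource; names and columns are illustrative.

```python
import requests

# Druid brokers accept SQL over HTTP at /druid/v2/sql.
# The datasource and column names below are illustrative.
query = {
    "query": """
        SELECT channel, COUNT(*) AS views
        FROM pageviews
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY channel
        ORDER BY views DESC
        LIMIT 10
    """
}

resp = requests.post("http://localhost:8082/druid/v2/sql", json=query)
resp.raise_for_status()
for row in resp.json():
    print(row)
```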
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent improvements in Druid: moving from a Lambda architecture to exactly-once ingestion
6) Future Work
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...Salesforce Engineering
Apache HBase is an open source, non-relational, distributed datastore modeled after Google’s Bigtable, that runs on top of the Apache Hadoop Distributed Filesystem and provides low-latency random-access storage for HDFS-based compute platforms like Apache Hadoop and Apache Spark. Apache Phoenix is a high performance relational database layer over HBase optimized for low latency applications. This session will explore how the Data Platform and Services group at Salesforce.com supports teams of application developers accustomed to structured relational data access, while surfacing additional advantages of the underlying flexible scale-out datastore.
Kafka is a high-throughput, fault-tolerant, scalable platform for building high-volume near-real-time data pipelines. This presentation is about tuning Kafka pipelines for high-performance.
Select configuration parameters and deployment topologies essential to achieving higher throughput and low latency across the pipeline are discussed, along with lessons learned in troubleshooting and optimizing a truly global data pipeline that replicates 100 GB of data in under 25 minutes.
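As a small illustration of the kinds of knobs involved, here is a hedged sketch of a throughput-oriented producer using the confluent-kafka Python client. Broker addresses and the topic are placeholders, and the specific values are starting points, not recommendations.

```python
from confluent_kafka import Producer

# Throughput-oriented producer settings: larger batches, short linger,
# compression, and acks=all for durability. Values are illustrative.
producer = Producer({
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "linger.ms": 20,              # wait briefly to fill larger batches
    "batch.size": 131072,         # 128 KB batches
    "compression.type": "lz4",    # cheap CPU, big network savings
    "acks": "all",                # wait for in-sync replicas
})

def delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")

for i in range(100_000):
    producer.produce("events", key=str(i), value=f"payload-{i}", callback=delivery)
    producer.poll(0)  # serve delivery callbacks

producer.flush()
```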
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform and is widely deployed as an event streaming platform. Part of Kafka is its stream processing API, “Kafka Streams”. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
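To make the comparison concrete, here is a hedged sketch of submitting a KSQL/ksqlDB statement over its REST API. It assumes a ksqlDB server on localhost:8088; the stream, topic, and column names are made up.

```python
import requests

# ksqlDB servers accept statements at the /ksql endpoint.
# Stream, topic, and column names below are illustrative.
statement = """
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```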
Real-time Analytics with Trino and Apache PinotXiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
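A minimal sketch of what querying Pinot through Trino might look like with the Trino Python client. The coordinator host, catalog, schema, and table names are assumptions.

```python
from trino.dbapi import connect

# Connect to a Trino coordinator and query a table exposed by the Pinot connector.
# Host, catalog, schema, and table names are illustrative.
conn = connect(
    host="trino-coordinator",
    port=8080,
    user="analyst",
    catalog="pinot",
    schema="default",
)
cur = conn.cursor()
cur.execute(
    "SELECT country, COUNT(*) AS events "
    "FROM clickstream "
    "GROUP BY country "
    "ORDER BY events DESC "
    "LIMIT 10"
)
for row in cur.fetchall():
    print(row)
```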
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka.
Key takeaways:
In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments.
Zero data loss and high availability are two key requirements. Understand how to realize this, including trade-offs.
Learn about features and limitations of Kafka for multi-cluster deployments.
Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability have become the norm, not the exception.
Kafka is an open-source distributed commit log service that provides high-throughput messaging functionality. It is designed to handle large volumes of data and different use cases, like online and offline processing, more efficiently than alternatives like RabbitMQ. Kafka works by partitioning topics into partitions spread across clusters of machines and replicating those partitions for fault tolerance. It can be used as a central data hub or pipeline for collecting, transforming, and streaming data between systems and applications.
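A minimal sketch of the partitioning and replication described above, using the confluent-kafka admin client to create a topic. The broker address, topic name, and counts are placeholders.

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Create a topic with several partitions and a replication factor,
# mirroring how Kafka spreads and replicates data across brokers.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})
futures = admin.create_topics([NewTopic("clicks", num_partitions=6, replication_factor=3)])

for topic, future in futures.items():
    try:
        future.result()  # raises on failure
        print(f"created {topic}")
    except Exception as exc:
        print(f"failed to create {topic}: {exc}")
```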
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
This presentation introduces Apache Flink, a massively parallel data processing engine that is currently undergoing incubation at the Apache Software Foundation. Flink's programming primitives are presented, and it is shown how easily a distributed PageRank algorithm can be implemented with Flink. Intriguing features such as dedicated memory management, Hadoop compatibility, streaming, and automatic optimisation make it a unique system in the world of Big Data processing.
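The PageRank example mentioned above boils down to a repeated rank redistribution over the link graph. A tiny, framework-free sketch of that power iteration (illustrating the algorithm only, not Flink's API) might look like this:

```python
# Plain-Python power iteration for PageRank on a small adjacency list.
# This illustrates the algorithm the talk distributes with Flink, not Flink's API.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}  # toy graph
damping, iterations = 0.85, 30
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(iterations):
    new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))
```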
Here are the key steps to create an application from the catalog in the OpenShift web console:
1. Click on "Add to Project" on the top navigation bar and select "Browse Catalog".
2. This will open the catalog page showing available templates. You can search for a template or browse by category.
3. Select the template you want to use, for example Node.js.
4. On the next page you can review the template details and parameters. Fill in any required parameters.
5. Click "Create" to instantiate the template and create the application resources in your current project.
6. OpenShift will then provision the application, including building container images if required.
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
AWS Data Lake / Lake House + Confluent Cloud for Serverless Apache Kafka. Learn about use cases, architectures, and features.
Data must be continuously collected, processed, and reactively used in applications across the entire enterprise - some in real time, some in batch mode. In other words: As an enterprise becomes increasingly software-defined, it needs a data platform designed primarily for "data in motion" rather than "data at rest."
Apache Kafka is now mainstream when it comes to data in motion! The Kafka API has become the de facto standard for event-driven architectures and event streaming. Unfortunately, the cost of running it yourself is very often too expensive when you add factors like scaling, administration, support, security, creating connectors...and everything else that goes with it. Resources in enterprises are scarce: this applies to both the best team members and the budget.
The cloud - as we all know - offers the perfect solution to such challenges.
Most likely, fully-managed cloud services such as AWS S3, DynamoDB or Redshift are already in use. Now it is time to implement "fully-managed" for Kafka as well - with Confluent Cloud on AWS.
- Building a central integration layer that doesn't care where or how much data is coming from
- Implementing scalable data stream processing to gain real-time insights
- Leveraging fully managed connectors (like S3, Redshift, Kinesis, MongoDB Atlas & more) to quickly access data
Confluent Cloud in action? Let's show how ao.com made it happen!
The tech talk was given by Ranjeeth Kathiresan, Salesforce Senior Software Engineer, and Gurpreet Multani, Salesforce Principal Software Engineer, in June 2017.
(Stephane Maarek, DataCumulus) Kafka Summit SF 2018
Security in Kafka is a cornerstone of true enterprise production-ready deployment: It enables companies to control access to the cluster and limit risks in data corruption and unwanted operations. Understanding how to use security in Kafka and exploiting its capabilities can be complex, especially as the documentation that is available is aimed at people with substantial existing knowledge on the matter.
This talk will be delivered in a “hero journey” fashion, tracing the experience of an engineer with basic understanding of Kafka who is tasked with securing a Kafka cluster. Along the way, I will illustrate the benefits and implications of various mechanisms and provide some real-world tips on how users can simplify security management.
Attendees of this talk will learn about aspects of security in Kafka, including:
-Encryption: What is SSL, what problems it solves and how Kafka leverages it. We’ll discuss encryption in flight vs. encryption at rest.
-Authentication: Without authentication, anyone would be able to write to any topic in a Kafka cluster, do anything and remain anonymous. We’ll explore the available authentication mechanisms and their suitability for different types of deployment, including mutual SSL authentication, SASL/GSSAPI, SASL/SCRAM and SASL/PLAIN.
-Authorization: How ACLs work in Kafka, ZooKeeper security (risks and mitigations) and how to manage ACLs at scale
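On the client side, much of this boils down to a handful of settings. Here is a hedged sketch of a consumer using TLS encryption in flight plus SASL/SCRAM authentication with the confluent-kafka Python client; addresses, credentials, and file paths are placeholders.

```python
from confluent_kafka import Consumer

# TLS in flight plus SASL/SCRAM authentication; all values are placeholders.
consumer = Consumer({
    "bootstrap.servers": "broker1:9093",
    "security.protocol": "SASL_SSL",
    "ssl.ca.location": "/etc/kafka/ca.pem",   # CA used to verify the brokers
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "app-user",
    "sasl.password": "app-secret",
    "group.id": "secure-app",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])

msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```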
The document provides an overview of Red Hat OpenShift Container Platform, including:
- OpenShift provides a fully automated Kubernetes container platform for any infrastructure.
- It offers integrated services like monitoring, logging, routing, and a container registry out of the box.
- The architecture runs everything in pods on worker nodes, with masters managing the control plane using Kubernetes APIs and OpenShift services.
- Key concepts include pods, services, routes, projects, configs and secrets that enable application deployment and management.
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...Flink Forward
Within fintech catching fraudsters is one of the primary opportunities for us to use streaming applications to apply ML models in real-time. This talk will be a review of our journey to bring fraud decisioning to our tellers at Capital One using Kafka, Flink and AWS Lambda. We will share our learnings and experiences to common problems such as custom windowing, breaking down a monolith app to small queryable state apps, feature engineering with Jython, dealing with back pressure from combining two disparate streams, model/feature validation in a regulatory environment, and running Flink jobs on Kubernetes.
This session is focused on HashiCorp Vault, a secret management tool. Managing secrets for two or three environments is feasible by hand, but with more than ten environments it becomes very painful, especially when secrets are dynamic and need to be rotated periodically. HashiCorp Vault can manage both static and dynamic secrets and can also handle secret rotation.
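A minimal sketch of writing and reading a static secret with Vault's Python client, hvac. The address, token, path, and keys are placeholders; in practice the token would come from an auth method rather than being hardcoded.

```python
import hvac

# Connect to a Vault server; address, token, and paths are illustrative.
client = hvac.Client(url="http://127.0.0.1:8200", token="dev-only-token")

# Write a secret into the KV v2 engine...
client.secrets.kv.v2.create_or_update_secret(
    path="myapp/config",
    secret={"db_password": "s3cr3t"},
)

# ...and read it back.
read = client.secrets.kv.v2.read_secret_version(path="myapp/config")
print(read["data"]["data"]["db_password"])
```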
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
The document discusses Apache Phoenix, which transforms HBase into a SQL database by providing a query engine, metadata repository, and embedded JDBC driver to access HBase data. It is the fastest way to access HBase data thanks to techniques like push-down query optimization and client-side parallelization. Phoenix also helps HBase scale by allowing multiple tables to share the same physical HBase table through updateable views and multi-tenant tables and views.
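A small sketch of what SQL access over HBase through Phoenix can look like from Python, using the phoenixdb driver against a Phoenix Query Server. The server URL, table, and columns are assumptions.

```python
import phoenixdb

# Connect to a Phoenix Query Server; URL and table name are illustrative.
conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cur = conn.cursor()

cur.execute(
    "CREATE TABLE IF NOT EXISTS events ("
    "  id BIGINT NOT NULL PRIMARY KEY,"
    "  user_id VARCHAR,"
    "  ts TIMESTAMP)"
)
cur.execute("UPSERT INTO events (id, user_id, ts) VALUES (1, 'u-42', CURRENT_TIME())")
cur.execute("SELECT user_id, COUNT(*) FROM events GROUP BY user_id")
print(cur.fetchall())
```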
The document discusses Orion, the son of Poseidon in Greek mythology. It describes how Orion could walk on water and was blinded as punishment for misbehaving on an island. Orion then stumbled upon Hephaestus's forge on Lemnos, where Hephaestus's servant Cedalion guided Orion and carried him on his shoulders to the east, where the sun healed Orion's blindness. The passage references Isaac Newton's quote about standing on the shoulders of giants to see further.
apidays LIVE Singapore - Next-generation microservice architecture based on A...apidays
apidays LIVE Singapore 2021 - Digitisation, Connected Services and Embedded Finance
April 21 & 22, 2021
Next-generation microservice architecture based on Apache APISIX
Ming Wen, Apache APISIX PMC Chair at Apache Software Foundation
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...confluent
In the Apache Kafka world, there is such a great diversity of open source tools available (I counted over 50!) that it’s easy to get lost. Over the years I have dealt with Kafka, I have learned to particularly enjoy a few of them that save me a tremendous amount of time over performing manual tasks. I will be sharing my experience and doing live demos of my favorite Kafka tools, so that you too can hopefully increase your productivity and efficiency when managing and administering Kafka. Come learn about the latest and greatest tools for CLI, UI, Replication, Management, Security, Monitoring, and more!
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. Coronavirus is just a piece of the challenge.
This presentation explores use cases, architectures, and references for Apache Kafka as event streaming technology in the aviation industry, including airline, airports, global distribution systems (GDS), aircraft manufacturers, and more.
Examples include Lufthansa, Singapore Airlines, Air France Hop, Amadeus, and more. Technologies include Kafka, Kafka Connect, Kafka Streams, ksqlDB, Machine Learning, Cloud, and more.
This document provides an overview and best practices for operating HBase clusters. It discusses HBase and Hadoop architecture, how to set up an HBase cluster including Zookeeper and region servers, high availability considerations, scaling the cluster, backup and restore processes, and operational best practices around hardware, disks, OS, automation, load balancing, upgrades, monitoring and alerting. It also includes a case study of a 110 node HBase cluster.
The document discusses several key factors for optimizing HBase performance including:
1. Reads and writes compete for disk, network, and thread resources so they can cause bottlenecks.
2. Memory allocation needs to balance space for memstores, block caching, and Java heap usage.
3. The write-ahead log can be a major bottleneck and increasing its size or number of logs can improve write performance.
4. Flushes and compactions need to be tuned to avoid premature flushes causing "compaction storms".
Designing Scalable Data Warehouse Using MySQLVenu Anuganti
The document discusses designing scalable data warehouses using MySQL. It covers topics like the role of MySQL in data warehousing and analytics, typical data warehouse architectures, scaling out MySQL, and limitations of MySQL for large datasets or as a scalable warehouse solution. Real-time analytics are also discussed, noting the challenges of performance and scalability for near real-time analytics.
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase like -Xmn512m and -XX:+UseCMSInit
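A hedged sketch of a few of the properties mentioned above, expressed as a small Python dict of the sort a deployment script might render into hbase-site.xml. The property names follow the talk; the values are examples, not recommendations.

```python
# Illustrative hbase-site.xml overrides for a write-heavy workload.
HBASE_SITE_OVERRIDES = {
    "hbase.client.write.buffer": str(2 * 1024 * 1024),            # 2 MB client write buffer
    "hbase.hstore.blockingStoreFiles": "16",                        # delay blocking writers
    "hbase.hstore.compactionThreshold": "5",                        # fewer, larger compactions
    "hbase.hregion.memstore.flush.size": str(256 * 1024 * 1024),   # 256 MB memstore flushes
}

def to_hbase_site_xml(props: dict) -> str:
    """Render properties in hbase-site.xml format."""
    entries = "".join(
        f"  <property><name>{k}</name><value>{v}</value></property>\n"
        for k, v in props.items()
    )
    return f"<configuration>\n{entries}</configuration>\n"

print(to_hbase_site_xml(HBASE_SITE_OVERRIDES))
```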
The document discusses Salesforce's plans to enhance its streaming and enterprise messaging capabilities. It describes the limitations of the current streaming API and previews new "durable streaming" functionality that will allow events to be replayed. It also outlines Salesforce's vision for an "Enterprise Messaging" platform using an event-driven architecture and "Conduit" API to reliably deliver data changes in real time or from a replay log. A roadmap is provided showing phased rollout of these capabilities through 2016.
HBase is a scalable NoSQL database modeled after Google's Bigtable. It is built on top of HDFS for storage, and uses Zookeeper for distributed coordination and failover. Data in HBase is stored in tables and sorted by row key, with columns grouped into families and cells containing values and timestamps. HBase tables are split into regions for scalability and fault tolerance, with a master server coordinating region locations across multiple region servers.
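A minimal sketch of the row-key and column-family data model using the happybase Thrift client. The Thrift gateway host, table, and column family names are placeholders.

```python
import happybase

# Connect via the HBase Thrift gateway; host and names are illustrative.
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("users")

# Cells live under column families; rows are sorted by row key.
table.put(b"user#0042", {b"info:name": b"Ada", b"info:email": b"ada@example.com"})

row = table.row(b"user#0042")
print(row[b"info:name"])

# Row-key ordering makes prefix scans cheap.
for key, data in table.scan(row_prefix=b"user#"):
    print(key, data)

connection.close()
```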
This document discusses Bronto's use of HBase for their marketing platform. Some key points:
- Bronto uses HBase for high volume scenarios, realtime data access, batch processing, and as a staging area for HDFS.
- HBase tables at Bronto are designed with the read/write patterns and necessary queries in mind. Row keys and column families are structured to optimize for these access patterns.
- Operations of HBase at scale require tuning of JVM settings, monitoring tools, and custom scripts to handle compactions and prevent cascading failures during high load. Table design also impacts operations and needs to account for expected workloads.
Salesforce External Objects for Big DataSumit Sarkar
Transform Salesforce into the system of engagement for your big data. Discuss best practices and lessons learned in accessing external data sets in Hadoop or Spark using Salesforce Connect. Leave the big data sets behind the firewall, and get on demand access for your users to big data insights using external objects with Salesforce Connect.
In this session we will cover:
Intro to Salesforce Connect
Intro to Big Data Landscape
How to connect Salesforce to Big Data using External Data Sources
Lessons Learned accessing Big Data using External Objects for native reporting, writes, lookups, search and more
Resources (How to learn more)
HBase In Action - Chapter 04: HBase table designphanleson
HBase In Action - Chapter 04: HBase table design
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
Salesforce for Nonprofits: Turn Big Data into Social ChangeSalesforce.org
Salesforce Analytics Cloud is Analytics for the Rest of Us and leading nonprofits are already showing how big data can help solve the world’s complex problems. Learn how Project 8 is using Analytics Cloud to help ensure that the 8 billion people that will live on this earth in 15 years will have the food, water, and energy they need.
Apache Hadoop and Spark are best-of-breed technologies for distributed processing and storage of very large data sets: Big Data. Join us as we explain how to integrate Salesforce with off-the-shelf big data tools to build flexible applications. You'll also learn how Force.com is evolving in this area and how Big Objects and Data Pipelines will provide Big Data capability within the platform.
Have a lot of data? Using or considering using Apache HBase (part of the Hadoop family) to store your data? Want to have your cake and eat it too? Phoenix is an open source project put out by Salesforce. Join us to learn how you can continue to use SQL, but get the raw speed of native HBase usage through Phoenix.
Analyze billions of records on Salesforce App Cloud with BigObjectSalesforce Developers
Salesforce hosts billions of customer records on Salesforce App Cloud. Making timely decisions on this invaluable data demands a new set of capabilities. From interacting with data in real-time to leveraging a fluid integration with Salesforce Analytics, these capabilities are just around the corner. Join us in this roadmap session to see what the near-future of Big Data on Salesforce App Cloud looks like and how you can benefit from it.
Key Takeaways
- Learn what 100 billion+ records on the Salesforce App Cloud could actually mean to you.
- Understand new services such as AsyncSOQL that can deliver reliable, resilient query capabilities over your sObjects and BigObjects.
- Gain insights for large-scale federated data filtering and aggregation.
- Transform data movement so all your customer records are available across their life cycle.
Intended Audience
This session is for Salesforce Administrators, Developers, Architects and just about anyone who wants to learn more about BigObjects!
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Spark Summit
This document discusses using Spark Streaming to process and normalize log streams in real time from 100k events per second to over 1 million per second. It proposes using RSyslog to collect logs from multiple sources into Kafka, then using Spark Streaming to apply regex matching and extract fields to normalize the data into a structured JSON format and write it to additional Kafka topics for storage and further processing. The solution was able to process 3 billion events per day with less than 20 seconds of end-to-end delay at peak throughput.
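A rough Structured Streaming sketch of that normalize-and-republish pattern. The Kafka brokers, topic names, regex, and checkpoint path are assumptions, and the original work used Spark Streaming's DStream API rather than this newer interface.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-normalizer").getOrCreate()

# Read raw syslog lines from Kafka; brokers and topics are placeholders.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "raw-logs")
    .load()
)

# Extract fields with a regex and republish as structured JSON.
line = F.col("value").cast("string")
pattern = r"^(\S+) (\S+) \[(.+?)\] \"(\S+) (\S+)"  # illustrative access-log pattern
normalized = raw.select(
    F.to_json(
        F.struct(
            F.regexp_extract(line, pattern, 1).alias("host"),
            F.regexp_extract(line, pattern, 4).alias("method"),
            F.regexp_extract(line, pattern, 5).alias("path"),
        )
    ).alias("value")
)

query = (
    normalized.writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("topic", "normalized-logs")
    .option("checkpointLocation", "/tmp/ckpt/log-normalizer")
    .start()
)
query.awaitTermination()
```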
Column-Stores vs. Row-Stores: How Different are they Really?Daniel Abadi
The document compares the performance of row-stores and column-stores for data warehousing workloads. It finds that with certain optimizations, the performance difference can be minimized:
A row-store can match the performance of a column-store by vertically partitioning columns and allowing virtual tuple IDs. Removing optimizations from a column-store, like compression and late materialization, causes its performance to degrade to that of a row-store. While column stores are better suited for data warehousing, row-stores can achieve similar performance with improvements to support vertical partitioning and column-specific optimizations.
Spark Streaming allows real-time processing of live data streams. It works by dividing the live stream into small batches, represented as discretized streams (DStreams), which are then processed using Spark's batch API. Common sources of data include Kafka, files, and sockets. Transformations like map, reduce, join, and window can be applied to DStreams. Stateful operations like updateStateByKey allow updating persistent state. Checkpointing to reliable storage like HDFS provides fault tolerance.
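A compact DStream sketch showing windowing and updateStateByKey with checkpointing. The socket source, host, port, and checkpoint path are placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-demo")
ssc = StreamingContext(sc, batchDuration=5)          # 5-second micro-batches
ssc.checkpoint("hdfs:///tmp/dstream-checkpoint")     # required for stateful operations

lines = ssc.socketTextStream("localhost", 9999)       # placeholder source
pairs = lines.flatMap(lambda l: l.split()).map(lambda w: (w, 1))

# Sliding-window counts: 60-second window, evaluated every 10 seconds.
windowed = pairs.reduceByKeyAndWindow(lambda a, b: a + b, lambda a, b: a - b, 60, 10)

# Running totals maintained across batches with updateStateByKey.
def update(new_values, running):
    return sum(new_values) + (running or 0)

totals = pairs.updateStateByKey(update)

windowed.pprint()
totals.pprint()

ssc.start()
ssc.awaitTermination()
```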
Developer burnout is sneaky and can slowly grind at the developer until constant fatigue and lack of motivation become normal. Key signs include realizing you've been doing the same thing for years without enjoyment. Success in open source can lead to taking on too many projects and goals without delegation. To avoid burnout, developers should be gentle with themselves, publish less to communities while still influencing them, delegate tasks, pursue other interests and hobbies, and take breaks if needed. Selectively engaging with others on social media can also help prevent burnout.
Watch this webinar to discover new and updated Salesforce Platform features coming in the Winter '14 Release including:
Force.com Canvas -- Force.com Canvas continues to add useful features such as ability to access a Canvas app from the Chatter Publisher Action, support for the Streaming API along with modified user permissions and SDK field changes.
API Updates -- New features added to SOQL, SOSL, REST API, SOAP API, Chatter API, Metadata API and the Streaming API. Additionally, we continue to make performance improvements to the Bulk API, Tooling API and Analytics REST API.
Visualforce Updates -- Visualforce enhancements in Winter ’14 are focused on improving the experience of developing HTML5 apps, with some additional development tools improvements and other changes.
Developer Console -- New features have been added to make code management within your organization much easier.
Apex Code -- New classes, methods and interfaces have been added. Updates have been made to Chatter in Apex as well as new classes have been included in Winter ‘14.
This document discusses Einstein Analytics and data management. It provides an overview of Einstein Analytics architecture and data sync/connectors. It then demonstrates how to set up a dataflow to import CSV data, create a recipe, modify metadata, and register a dataset. Upcoming features discussed include joining datasets and suggested values in queries. The key benefits mentioned of managing data in the data layer rather than design layer are reducing dependencies on SAQL/JSON and adding common fields to datasets.
Moving to the cloud provides several benefits for Helix Linear Technologies' ERP system. It transitions the company from an on-premise system requiring specialized expertise to maintain, to a less complex, cloud-based system integrated with Salesforce. This reduces complexity, administration needs, and customization/integration challenges. Specifically, Helix is moving to Kenandy, a native Force.com application, to gain the standard customization and integration of the Salesforce platform. The cloud-based system will also improve processes like capacity planning, mobile shop floor management, labor costing, and quality assurance.
Part of what truly makes a platform is an ability to integrate with third party devices, servers and software. Join Ami Assayag and Kirk Steffke from CRM Science and Developer Evangelist Josh Birk as they breakdown examples of using Apex for integration solutions. Apex has robust methods for handling both inbound requests into Salesforce and outbound calls into third party systems. This webinar will break down how Apex can be used in these cases as well as how to test the code once it is up and running.
Key Takeaways
- How Apex fits into an integration solution
- Using Apex to create custom endpoints
- Handling outbound calls with Apex
- How to achieve test coverage with mock interfaces
Intended Audience
Developers with Apex experience looking either to integrate with existing APIs or to expand the functionality of Salesforce APIs.
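To illustrate the inbound side, here is a hedged sketch of calling a custom Apex REST endpoint from an external Python client. The instance URL, access token, the "Cases" URL mapping, and the payload fields are hypothetical, and the Apex @RestResource class that would back the endpoint is not shown.

```python
import requests

# Placeholders: a real integration would obtain the token via OAuth.
INSTANCE_URL = "https://example.my.salesforce.com"
ACCESS_TOKEN = "00D...session-token"

# Custom Apex REST endpoints are exposed under /services/apexrest/<urlMapping>.
resp = requests.post(
    f"{INSTANCE_URL}/services/apexrest/Cases",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"subject": "Printer on fire", "priority": "High"},
)
resp.raise_for_status()
print(resp.json())
```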
"We'll need an Apex trigger to do that." Sound familiar? Take your advanced Admin skills to the next level by developing Apex triggers to solve complex business requirements that can't be implemented using just the configuration-driven features of Force.com. Join us to learn when and how to write your first Apex trigger, and some best practices for making them effective.
Spice up Your Internal Portal with Visualforce and Twitter BootstrapSalesforce Developers
Does your intranet or internal portal need updating? Join us to see how we transformed our Employee News page into a robust, mobile design using Visualforce and Twitter Bootstrap in a little over two months. Designed with a responsive approach, this methodology can additionally be used throughout Visualforce pages within other projects.
This document summarizes a presentation about building real-time apps with Node.js, Heroku, and the Force.com Streaming API. It includes a safe harbor statement and introduces the speakers. The presentation discusses the evolution of the web from static to dynamic and database-driven sites, and how new demands require real-time data. It describes using technologies like Node.js, web sockets via Socket.io, and the Force.com Streaming API to build real-time apps. CSS3 and 3D transformations are used for visual effects in the presentation. The speakers are then introduced.
The document discusses Platform as a Service (PaaS) selection criteria in 2017 with a focus on the latest information about Heroku. It provides an overview of Heroku's capabilities including processing over 80 billion items daily for over 500,000 developed apps. Examples are given of deploying a Node.js app to Heroku using Git. Key criteria for selecting a PaaS like Heroku are mentioned such as scalability, ease of deployment, and availability of add-ons. The conclusion encourages attending upcoming Heroku events for more information.
Want to get into Flows but need a jumpstart? Join us as we share Flow best practices we've learned the hard way: for instance, using constants instead of hardcoded values, using an "Error handling" Flow to capture errors, and using subflows, similar to the common guidance of one trigger per object.
Talk given by Ravi Kishore Valeti, Software Engineering LMTS at Salesforce, at GIDS in April 2016
Most enterprises have been thinking about running BDaaS (and some already are) and performing analytics over their Big Data to help make key business decisions. This talk is about what it takes to operationalize BDaaS and the challenges in successfully running large-scale Big Data clusters.
Df14 Building Machine Learning Systems with Apexpbattisson
Slide deck from the Dreamforce 2014 talk "Building Machine Learning Systems with Apex". Includes links to the GitHub code repository and contact details for the speakers.
Data Democracy: Use Lightning Connect & Heroku to Visualize any Data, AnywhereSalesforce Developers
Join us as we demonstrate an Open OData Adapter for Lightning Connect. No schema, no code, no build. Just hit "Heroku Deploy" and the endpoint will be live on Heroku with a custom model as an OData service which can then be consumed by Lightning Connect. This session provides a great introduction for IT organizations who want to develop OData services for their backend systems and accelerate Lightning Connect adoption.
This document provides an overview of Wave App Development by Skip Sauls of Salesforce. It discusses how Wave allows anyone to build analytics apps for various use cases like sales, service, marketing, and custom apps. The architecture of Wave leverages Force.com and its API can be used to build components. The roadmap discusses enhancing Wave with more data sources, advanced analytics, predictive capabilities, and tools to more easily build and share apps.
Finding relevant results faster with ElasticsearchElasticsearch
With today's growing amounts of information, it's critical to be able to efficiently retrieve the most relevant results for a query. Learn how Elasticsearch finds top hits for a query so quickly by building inverted indexes and using the Block-MAX WAND algorithm, as well as how you can tune Elasticsearch to make it even more efficient for your own use cases.
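A tiny sketch of the query side with the official Python client. The host, index name, and field are placeholders.

```python
from elasticsearch import Elasticsearch

# Host, index, and field names are illustrative.
es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="articles",
    query={"match": {"title": "kafka streams"}},
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```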
Doc is Dead! How Walkthroughs Changed Salesforce's Content StrategyGavin Austin
A case study on why the Documentation and User Assistance team at Salesforce changed its content strategy to include more forms of interactive content.
One of the core advantages of the Salesforce Analytics Cloud is the ability to process extremely large data sets without sacrificing performance. Building those data sets is not always easy. Vendor tools do a great job but can be expensive and, in some cases, organizations need complete control over their data transformations. In this session, we will explore the different native options available for loading data into the Analytics Cloud, and review the pros and cons of using Apex. We will demonstrate a simple data loader app built with the Analytics Cloud simple API.
In this talk you will learn:
How to structure your JS-heavy project in Salesforce DX
Learn how to use all the familiar JS tools with Webpack and Lightning
Techniques to Effectively Monitor the Performance of Customers in the CloudSalesforce Engineering
This document discusses techniques for effectively monitoring customer performance in the cloud. It recommends establishing a baseline for normal performance and monitoring metrics and thresholds to detect deviations. Key metrics to track include counts, medians, percentiles, and distributions over time. Dashboards should visualize these metrics and allow comparing performance across different time periods. An example dashboard monitors adoption, errors, and metrics over the last 30 days and compares to the same day last week. The presentation demonstrates an Einstein Analytics dashboard for interactive analysis across devices.
HBase is a healthy, stable, and popular open source distributed database that is celebrating its 10th birthday. It has over 160 contributors and developers, with steady releases being made across multiple active versions. Improvements and the 2.0 release are upcoming, building on strong community involvement and contributions over its history.
This document summarizes Salesforce's use of HBase and Phoenix for storing and querying large volumes of structured and unstructured data at scale. Some key details:
1) Salesforce heavily uses HBase and Phoenix for both customer-facing and internal use cases, including storing login data, user activity, thread dumps, and more.
2) Salesforce operates over 100 HBase clusters of varying sizes to support over 4 billion write requests and 600 million read requests per day, totaling over 80 terabytes of data written and 500 gigabytes read daily.
3) An example use case is a central metrics database collecting data from over 80,000 machines, storing 11.4 trillion metrics and growing, with
The tech talk was given by Kexin Xie, Director of Data Science, and Yacov Salomon, VP of Data Science in June 2017.
Scaling up data science applications: How switching to Spark improved performance and reliability, and reduced cost
Cem Gurkok presented on containers and security. The presentation covered threats to containers like container exploits and tampering of images. It discussed securing the container pipeline through steps like signing, authentication, and vulnerability scans. It also covered monitoring containers and networks, digital forensics techniques, hardening containers and hosts, and vulnerability management.
This document provides an overview of aspect-oriented programming (AOP) and various AOP implementations. It begins with an introduction to AOP concepts like cross-cutting concerns. It then discusses the AOP frameworks AspectJ and Spring AOP, covering their pointcut and advice anatomy. The document also examines how AOP can be used for code coverage, benchmarks, improved compilation, and application monitoring. It analyzes implementations like JaCoCo, JMH, HotswapAgent, and AppDynamics as examples.
This document discusses using XHProf to perform performance tuning of PHP applications. It begins with an introduction of the speaker and their company Pardot. It then provides an overview of XHProf including how to install, configure, and use it to profile PHP applications. The document outlines various performance tips for PHP such as optimizing array operations, managing memory efficiently, and improving database queries. It also walks through some examples of profiling a sample Symfony application that involves getting click data from a database. The examples demonstrate how to optimize queries and object hydration to improve performance.
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteSalesforce Engineering
This document summarizes a presentation about building a SQL interface for Apache Pig using Apache Calcite. It discusses using Calcite's query planning framework to translate SQL queries into Pig Latin scripts for execution on HDFS. The presenters describe their work at Salesforce using Calcite for batch querying across data sources, and outline their process for creating a Pig adapter for Calcite, including implementing Pig-specific operators and rules for translation. Lessons learned include that Calcite provides flexibility but documentation could be improved, and examples from other adapters were helpful for their implementation.
The document discusses implementing a content strategy and outlines some key lessons learned. It notes that implementing a content strategy is like running a long distance and will involve pain, relationships, and focusing on strengths over weaknesses. It advises getting ready for the pain involved, not trying to do it alone, and leveraging strengths rather than weaknesses. The presentation encourages the audience to take action by volunteering or taking the next step.
The tech talk was given by Jim Walsh, Salesforce SVP Infrastructure Engineering in May 2017.
The presentation provides a brief overview of Salesforce Cloud Infrastructure and Challenges.
Koober is an open-source interactive website that uses machine learning models trained on historical taxi and weather data to visualize past taxi demand and predict future demand. It generates datasets by clustering taxi pickup locations and extracting features from the data, then builds models using techniques like gradient-boosted trees and neural networks. The website integrates these predictions with interactive maps to help the taxi industry optimize operations and better meet customer needs based on past trends.
Talk given by Marat Vyshegorodtsev and Sergey Gorbaty. Enterprise Security team at Salesforce, in January 2017.
Discusses a set of open source tools that analyze the Apex/VisualForce code and advise on its quality.
This document discusses microservices and the process of setting up a new microservice. It covers topics such as defining the service scope, getting approvals, source control and packaging, running environments, logging and monitoring, and preparing the service for production use. The key aspects of setting up a new microservice include buy-in from management, external design reviews, source control and deployment automation, provisioning compute and storage resources, and integrating the service with monitoring and on-call systems.
This document discusses using Apache Zookeeper to orchestrate microservice deployments. It describes how Zookeeper can be used to define service topology, enable one-button deployments through a coordinator service called Maestro, and ensure high availability and failure recovery. The Maestro coordinator initiates and manages deployments by monitoring global state in Zookeeper and determining which nodes to deploy next. Maestro agents on each node receive notifications, create execution plans to deploy updates, and publish status to Zookeeper. Different propagation strategies like canary deployments and rollback capabilities provide health mediation during deployments.
Talk given by Gavin Austin, Principal Technical Writer, and Ted Kuster, Lead Technical Writer, at the STC Silicon Valley meetup in February 2016
Customers no longer have the patience to read online help or user guides. To help customers better understand why they should use a variety of features, and renew their subscription-based apps, Salesforce conducted research to determine the content types that engaged customers most. The result—Salesforce changed its content strategy.
In this session, you’ll learn:
What types of interactive content we’re creating at Salesforce
Why Salesforce moved to interactive content over documentation
How a large company changed its content strategy and how customers responded
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
Dev Dives: Automate and orchestrate your processes with UiPath MaestroUiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy now and implement the key findings to improve your business.
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With over 11 years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading mobile app development company in Saudi Arabia, we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop aimed at developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights, with its integrated historic procurement industry archives, serves as a powerful complement — not a competitor — to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value-driven proprietary service offering here.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth, and business model pivoting are critical activities that businesses don't undertake every day. In cases like these, it may benefit your business to engage a temporary external consultant.
An unbiased plan, driven by clear-cut deliverables and market dynamics and free from the influence of your internal office politics, empowers business leaders to make the right choices.
Getting things done within budget and on time is key to growing a business, whether you are a start-up or a big company.
Talk to us & Unlock the competitive advantage
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfAbi john
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
2. Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, and any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
5. A. Why HBase?
B. Interacting with the open source community
C. HBase at Salesforce
6. Size Matters*
New Salesforce customer:
•“How many rows do you have?”
•We will turn folks away if they have too many!
Data Storage is expensive:
•SAN storage
•Relational Database
•Too many rows → too expensive
* In a relational world
7. What if in the future we:
… and have cheaper storage?
… and never need to ask again about the number of rows?
… grow with the data by just adding more machines?
(Disclaimer: no transactions, no joins, no secondary indexes, …)
8. (A quick note about) Relational Databases
• We love them. They are core to our infrastructure.
• SQL and NoSQL/NoACID are complementary.
• (Almost) everything we do is SQL based (see Phoenix – the SQL layer for HBase.)
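A minimal sketch of what "SQL based via Phoenix" looks like in practice: Phoenix ships a JDBC driver, so a query against HBase is ordinary java.sql code. The connection string, table, and column names below are illustrative assumptions, not details from the talk.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixQuerySketch {
    public static void main(String[] args) throws Exception {
        // The Phoenix JDBC URL points at the cluster's ZooKeeper quorum (host and port are placeholders).
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT record_id, changed_at FROM ENTITY_HISTORY WHERE org_id = ? LIMIT 10")) {
            stmt.setString(1, "00D000000000001");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Phoenix upper-cases unquoted identifiers, so read them back in upper case.
                    System.out.println(rs.getString("RECORD_ID") + " " + rs.getTimestamp("CHANGED_AT"));
                }
            }
        }
    }
}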
9. The Search - Requirements
• Consistent
– “Eventually consistent stores are 100% consistent 99% of the time” – Ian Varley
• Scalable
– No “features” impeding horizontal scaling
• Persistent
– Duh...?
• Key lookups
• Range lookups
• Open source (ASL great, GPLv2 OK, GPLv3/AGPL not acceptable)
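The key-lookup and range-lookup requirements above map directly onto the HBase client API's Get and Scan operations. A hedged sketch follows; the table name and row-key scheme are invented for illustration, and withStartRow/withStopRow assume a recent (2.x-era) HBase client.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LookupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("entity_history"))) {

            // Key lookup: fetch exactly one row by its row key.
            Result row = table.get(new Get(Bytes.toBytes("org1|record42")));
            System.out.println("point lookup: " + row);

            // Range lookup: scan every row whose key falls in [startRow, stopRow).
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("org1|"))
                    .withStopRow(Bytes.toBytes("org1|~")); // '~' sorts after the key characters used here
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println("range lookup: " + r);
                }
            }
        }
    }
}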
12. To Fork or not to Fork – that is the question
Fork - pros
• Agility. No waiting for community review. Just get stuff done
• Freedom. Patches that might not be acceptable to the community
Fork - cons
• Lose out on community work
• Patches not useful to other parties
There is no right or wrong. It’s a matter of choice, taste, and requirements.
13. HBase Development @ Salesforce
• No fork of HBase.
• No fork of HBase.
• Internal HBase/HDFS branch for possible emergency fixes
• All fixes are cleaned and contributed back
• We switch to the next open source point release periodically
24. Salesforce is a Database
[Diagram: the classic query-processing pipeline. A SQL query goes to the Query Parser; the parsed query goes to the Query Optimizer (Plan Generator and Plan Cost Estimator), which produces an evaluation plan for the Query Plan Evaluator; the optimizer consults the System Catalog and Database Stats covering tables, columns, and indexes.]
25. Salesforce is a Database
[Diagram: the same pipeline applied to Salesforce. A SOQL query goes to the Query Parser; the parsed query goes to the Query Optimizer (Plan Generator and Plan Cost Estimator), which emits hinted Oracle SQL that Oracle executes; the optimizer consults the System Catalog and Database Stats covering objects, fields, and indexes.]
28. pod = a database instance
•Oracle RAC
•AppServers
•Blob store servers
•Search servers
•Shared SAN storage
•SAN replication for DR
[Diagram: inside a pod, app servers connect over SQL/JDBC to an Oracle RAC cluster of Oracle nodes backed by shared SAN storage at the primary site; SAN replication mirrors the data to the SAN at a secondary site for DR.]
30. Where does HBase Fit?
[Diagram: the SOQL pipeline from the previous slides, extended. The Query Optimizer still emits hinted Oracle SQL for Oracle, but queries can also reach HBase clusters via (1) External Objects or (2) Phoenix SQL; the optimizer consults the System Catalog and Database Stats covering objects, fields, and indexes.]
31. Where does HBase Fit?
•Separate HBase per pod (close to 50 clusters)
•Logically co-located with Oracle
•Small clusters striped across five racks
•Each cluster’s master service on a different rack
•Identical cluster for DR
[Diagram: within a pod, app servers talk to the Oracle cluster and, via Phoenix over SQL/JDBC, to the pod's HBase cluster of HBase nodes, all backed by SAN at the primary site; decentralized HBase replication keeps an identical DR HBase cluster at the secondary site.]
33. 1. Audit Trails (Entity History)
• Identity managed in RDBMS
• Indexed in HBase (Phoenix indexes)
• Historical, immutable data only
• No need to reason about updates, split identities, and transactions
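A hedged sketch of how such an append-only history table and a Phoenix secondary index might be declared, given the bullets above (identity in the RDBMS, immutable history indexed in HBase via Phoenix). The DDL goes over the same JDBC connection as any query; the table, column, and index names are assumptions for illustration, and IMMUTABLE_ROWS=true is the Phoenix table property for write-once data, which keeps index maintenance cheap.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AuditTrailSchemaSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Write-once history rows: no updates, so Phoenix can maintain indexes cheaply.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS ENTITY_HISTORY (" +
                "  org_id CHAR(15) NOT NULL," +
                "  record_id CHAR(15) NOT NULL," +
                "  changed_at TIMESTAMP NOT NULL," +
                "  field_name VARCHAR," +
                "  old_value VARCHAR," +
                "  new_value VARCHAR," +
                "  CONSTRAINT pk PRIMARY KEY (org_id, record_id, changed_at)" +
                ") IMMUTABLE_ROWS=true");
            // Secondary index supporting 'what changed in this org, ordered by time' queries.
            stmt.execute(
                "CREATE INDEX IF NOT EXISTS ENTITY_HISTORY_BY_TIME " +
                "ON ENTITY_HISTORY (org_id, changed_at)");
            conn.commit();
        }
    }
}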
34. 2. Archiving (Data Lifecycle Management)
• Objects (rows) moved to HBase
• Identity managed in HBase after move
• Data immutable in HBase
• No Transactions
35. 3. Live data in HBase (BigObjects)
• Mutable data (possibly)
• Everything managed in HBase
• Still no Transactions, yet
• Platform for other teams to use
36. Merrill Lynch Rationalization: Data Governance, Audit & Archive
• First Salesforce Enterprise Customer
• On-platform archival compelling versus the on-premise solution from Informatica
• Retention requirements for 7 years
Merrill Lynch
“Data audit, governance & lifecycle management is critical for Merrill; for the entire banking & financial industry it has become a benchmark requirement.”
37. Heating, ventilation, and air-conditioning in the EU
• Top 10 Platform Users
• Subject to highly variable data governance and retention requirements
• Significant SAP footprint driving business rules – need to connect that to Salesforce data for archival and data retention needs
• Massive service workforce generates significant data processing challenges
“The Salesforce.com Platform roadmap for Data Archive is critical for future data management needs”
Michael Roehr, CTO, Vaillant
38. BMW Enriches Their Customer Perspective
• Sales Cloud available across all German Dealership Franchises
• All customer data subject to stringent, government-mandated protection, audit & retention
• Correlations with Car Builder app data enable more contextual customer interactions
• Car telemetry, used correctly, helps refine product evolution and customer-needs alignment
“Data-driven customer engagement is a key driver for our enhanced customer experience.”
41. Highly Available, Disaster Recovery
• Five-peer ZooKeeper quorum
• Five quorum JournalNodes (for filesystem edits)
• Five HMasters
• Three NameNodes (yes, three, we made a patch to run more than one standby)
• HBase Replication to identical hot standby pod in a different data center
– In the event of a disaster we fail a complete pod to the secondary site
• Weekly automated, unattended rolling restarts
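For the cross-data-center replication bullet, here is a rough administrative sketch, assuming an HBase 2.x client API; the peer id, ZooKeeper quorum, and table name are placeholders, and this is not the exact Salesforce tooling.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class ReplicationPeerSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Register the standby pod as a replication peer, keyed by its ZooKeeper quorum and hbase znode.
            ReplicationPeerConfig peer = ReplicationPeerConfig.newBuilder()
                    .setClusterKey("dr-zk1,dr-zk2,dr-zk3:2181:/hbase")
                    .build();
            admin.addReplicationPeer("dr_pod", peer);
            // Flag a table's column families for replication (schemas must exist on both clusters).
            admin.enableTableReplication(TableName.valueOf("entity_history"));
        }
    }
}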
43. Monitoring & Management (M&M)
• Nagios alerts
• Trending via OpenTSDB, with a custom UI on top of the time-series data.
• Rolling upgrades
– Eventually scheduled and unattended
• Absolutely no unscheduled downtime.
Not even during a rack failure.
44. A. Why HBase?
B. Interacting with the open source community
C. HBase at Salesforce
#2: Spent time with StumbleUpon, Facebook, many others. This is a great community.
#3: Salesforce is seeing the center of gravity of customer data increasingly shift. Driving this forward across verticals such as banking & finserv requires data audit driven by post-2008 regulatory requirements and Sarbanes-Oxley requirements. As this data is generated in a transactional environment, we use HBase as our historical and immutable storage.
#4: Their use of the Salesforce.com platform to drive their entire business helps keep their dynamic and highly mobile workforce in touch with their data. Given their operating environment in Germany, they are required to deliver a complete data audit and use Field History for this. They are also required to keep all customer data for at least 15 years, which is why Archive is so key for them.
#5: Across Germany we've had a successful deployment in each franchise to establish new baselines in customer interactions with BMW customers, leases, and service interactions. Looking beyond this use case, the capability of marrying together the customer data generated by the BMW Car Builder application with cleansed and anonymized telemetry data is pushing Salesforce to deliver the concepts and tools that allow BMW to absorb the full spectrum of their customer event data stream and take business actions on it. Imagine how I would feel as a prospective customer if I walked into a dealership and they had a more informed knowledge of who I am and my likely preferences. We are using the notion of BigObjects to absorb, store, and act on the data that is behind the Internet of Customers.
#5: Across Germany we've had a successful deployment in each franchise to establish new base lines in customer interactions with BMW customers, leases and service interactions. Looking beyond this usecase the capability of marrying together the customer data generated for the BMW Car Builder application and cleansed and anonymizedtelemetrics data is pushing Salesforce to deliver the concepts and tools to allow BMW to absorb the full spectrum of their customer event data stream, and take business actions on it.Imagine how I would feel as a prospective customer if I walked into a dealership and they have a more informed knowledge of who I am and my likely preferences. We are using the notion of BigObjects to absorb, store and act on the data that is behind the Internet of Customers.