Multi-tenant Apache HBase at Yahoo!
Francis Liu, Sumeet Singh
HBaseCon, June 13, 2013
Introduction
Yahoo! Presentation, Confidential
Sumeet Singh
Head of Products, Cloud Services & Hadoop
Cloud Engineering Group
701 First Avenue
Sunnyvale, CA 94089 USA
§  Manages Cloud Services and Hadoop products teams at Yahoo!
§  Responsible for Product Management, Customer Engagements, Evangelism, and Program Management
§  Pretends to be the Product Manager for Hadoop at Yahoo!
§  Headed Strategy for Cloud Platform Group at Yahoo!
Francis Liu
Principal Software Engineer, Hadoop
Cloud Engineering Group
701 First Avenue
Sunnyvale, CA 94089 USA
§  Leads Apache HBase development at Yahoo!
§  Apache Hive contributor
§  Apache HCatalog committer
§  Developed a workflow management and incremental processing platform
Agenda
1. Apache HBase and Motivation for Multi-tenancy
2. Apache HBase Use Cases at Yahoo!
3. Integration with Other Apache Hadoop Projects
4. Accomplishing Multi-tenancy in Apache HBase
5. Experiences in deploying secure Apache HBase
Multi-tenant Apache Hadoop at Yahoo!
[Chart: Number of Nodes (axis 0 to 45,000) and Raw HDFS Storage in PB (axis 0 to 400) by Year, 2006 to 2012. Series: Nodes, HDFS.]
Annotations on the chart:
§  Yahoo! Commits to Scaling Hadoop for Production Use
§  Research Workloads in Search and Advertising
§  Production (Modeling) with machine learning & WebMap
§  Revenue Systems with Security, Multi-tenancy, and SLAs
§  Increased User-base with partitioned namespaces
§  Open Sourced with Apache
§  Hortonworks Spinoff for Enterprise hardening
§  Current Team with Y! focus
Footprint: 42,000 nodes, 365PB HDFS, 10M daily slot hours | Usage (Apr’13): 411 projects, 805 users, 13.2M jobs
HBase at Y! – Multi-tenancy at the Helm Again
[Chart: Number of Nodes (axis 0 to 3,000) and Data Stored in PB (axis 0 to 5) by Year, 2009 to 2013. Series: Nodes, Data Stored.]
Annotations on the chart:
§  Content Personalization
§  Web Crawl Cache
§  Grid Sandbox
§  Personalization & Targeting
§  Phase I: Private deployments
§  Phase II: Grid Sandbox
§  Phase III: Prod. Multi-tenancy
§  16 new projects (4 on-boarded, 12 upcoming)
Projected Footprint by 2013: 2,400 nodes, 4.6PB of data stored | Projected Usage by 2013: 18 projects
HBase Covers Most of Yahoo!’s Businesses
Yahoo!’s Global Businesses
Search
§  Web Cache
§  Query Analysis
§  Local Listings
§  Analytics
Y! Mail
§  Anti-spam
§  Log Analytics
§  Metadata Mgmt.
Cloud Platforms
§  Performance
§  Monitoring
§  OpenStack
Consumer Platforms
§  CMS
§  Social Data
Online Ads
§  Traffic Protection
§  Ads Data Mgmt.
P13N
§  Content P13N
§  Ad targeting
Mobile
§  Notifications
§  Flickr
Sales
§  eCommerce
The Appeal of Apache HBase to Yahoo!
Yahoo! needed a solution to store mutable data and support random access.
HBase was an obvious choice.
§  Native to the Hadoop ecosystem
§  Large Hadoop developer base
§  Integration with other Hadoop components popular at Yahoo!, such as MapReduce, Pig, Oozie, HCatalog, and Hive
§  Vibrant, active open source community of developers
§  Attractive throughput, write throughput in particular
§  Acceptable latencies and scan performance
§  Support for Yahoo!’s scale
§  Support for bulk uploads
§  Easy on-boarding to the multi-tenant platform
§  Easy application development (lower time to market)
§  Support for dynamically adding columns, TTL, versioning and timestamps
Use Case 1 – Content Personalization (V1)
[Architecture diagram. Components: Y! Property Server, Event Collection, Content Management System, Serving Systems, HDFS, MapReduce. Flow labels: Audience Events; User Activity (User Pipeline); Content; Content, Rules (Content Pipeline); Models; Index; Serving Scores (User/Content). Stores: User Profiles | Content Index.]
Use Case 1 – Content Personalization (V2)
[Architecture diagram. Components: Y! Property Server, Event Collection, Content Management System, Serving Systems, Storm (Short-term Pipeline), HDFS, MapReduce. Flow labels: Audience Events; User Activity (User Pipeline); Content; Content, Rules (Content Pipeline); Models; Index; Serving Scores (User/Content). Stores: User Profiles | Content Index.]
HBase Helps Yahoo! Deliver Personalization
Use Case 2 – Web Crawl Cache
[Architecture diagram. Components: Poller, Fetcher, Ingestor, Extruder, Processing Cluster, End Users. Edge labels: poll; fetch; launch (twice); write; read, write; scan; Random Read; Consumption.]
Use Case 3 – Detecting Abusive Accounts
[Architecture diagram in two parts, Spam Filtering and Abusive Account Detection. Components: Outgoing Mail Server, Incoming Mail Server, Anti-spam Rules Engine, User Feedback, Serving Maps, Machine Learning, Spam Detection Models, Spam Patterns, Categorization Models, Near Real-time Classifier. Outcomes: Good, Offer CAPTCHA, Block; for a Compromised Acct.: Force pwd. change/deactivate acct.]
Use Case Pattern 1 – Persist Metadata/ State
[Diagram labels: write, read]
Rowkey                  | CF – MessageStore     | CF – Attributes
                        | MessageId | Payload   | Colo
MD5 Hash Mailbox Id # 1 | 1         | “...”     | DC1
MD5 Hash Mailbox Id # 2 | 2         | “...”     | DC2
MD5 Hash Mailbox Id # 3 | 3         | “...”     | DC3
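The row layout above can be sketched in a few lines of Python. This is only an illustration under assumptions: it reads "MD5 Hash Mailbox Id" as the MD5 digest of the mailbox ID, and the helper names `mailbox_row_key` and `make_row` are hypothetical. It builds plain dictionaries and does not talk to HBase.

```python
import hashlib


def mailbox_row_key(mailbox_id: str) -> str:
    """Hex MD5 digest of the mailbox ID (assumed reading of the slide's row key)."""
    return hashlib.md5(mailbox_id.encode("utf-8")).hexdigest()


def make_row(mailbox_id: str, message_id: int, payload: str, colo: str) -> dict:
    """Build one row using 'family:qualifier' names taken from the slide."""
    return {
        "rowkey": mailbox_row_key(mailbox_id),
        "MessageStore:MessageId": message_id,
        "MessageStore:Payload": payload,
        "Attributes:Colo": colo,
    }


if __name__ == "__main__":
    # Mailbox IDs here are made-up placeholders.
    for i, colo in enumerate(["DC1", "DC2", "DC3"], start=1):
        print(make_row(f"mailbox-{i}", i, "...", colo))
```

Hashing the mailbox ID spreads sequential IDs across the key space, which is a common reason to hash row keys; the slide does not state the motivation.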
Use Case Pattern 2 – Metrics/ Analytics
[Diagram labels: Collector (three instances), Ingestion, Query Server]
Use Case Pattern 3 – Dimension Store
[Diagram labels: MapReduce, Hive, Pig; Clickstream, Ad Campaign]
Use Case Pattern 4 – Incremental Processing
[Pipeline diagram. Stages: Collection, Off-stage Processing, On-stage Serving. Pipelines: Batch (Long-term) Pipeline, Near Real-time (Short-term) Pipeline. Components: Collector, MapReduce, Storm, Serving Stores, Search Index. Inputs: Files, Events.]
Use Case Patterns Summary
Use Case | Persist Metadata/State | Metrics/Analytics | Dimension Store | Incremental Processing
Search
Web Cache ✔
Query Analysis ✔
Local Business Listings ✔
Analytics ✔
Y! Mail
Anti-spam ✔
Log Analytics ✔
Metadata Management ✔
Cloud Platforms
Performance ✔
Online Monitoring ✔
OpenStack ✔
Consumer Platforms
Content Management ✔
Social Data ✔
Online Ads
Traffic Protection ✔
App Data Management ✔
P13N
Content Personalization ✔
Ad Targeting ✔
Mobile
Notifications ✔
Sales
eCommerce ✔
Integration With Other Hadoop Components
Interface                           | Purpose
TableInputFormat, TableOutputFormat | Read/Write with MR
HBaseStorage                        | Read/Write into existing tables
HBaseStorageHandler                 | Create/Drop/Read/Write
HBase Credential                    | Secure HBase access
Apache Hadoop at Yahoo!
§  Hosted Multi-tenant Service
§  Security
§  Job Queues
§  HDFS Quota
Apache HBase at Yahoo!
§  Hosted Multi-tenant Service
§  Security
§  Isolated Deployment
§  Region Server Group (HBASE-6721)
§  Namespace (HBASE-8015)
Security
§  Authentication
    §  Kerberos (users, processes)
    §  Delegation Token (MapReduce, YARN, etc.)
§  Authorization
    §  HBase ACLs (Read, Write, Create, Admin, Exec)
    §  Grant permissions to User or Group
    §  ACL for Table, Column Family or Column
    §  Only Global Admin can create/drop tables
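The authorization bullets above describe grants at several scopes (global, table, column family, column) to users or groups. A minimal Python model of that scoping might look as follows. It is a sketch for illustration only and is not HBase's access controller: the `Grant` class, the `is_allowed` function, and the rule that a broader grant covers every narrower target are assumptions, as is writing groups with an `@` prefix.

```python
from dataclasses import dataclass
from typing import Optional

# Single-letter action codes for Read, Write, Create, Admin, Exec.
ACTIONS = {"R", "W", "C", "A", "X"}


@dataclass(frozen=True)
class Grant:
    principal: str                   # user name, or "@group" (assumed convention)
    actions: frozenset               # subset of ACTIONS
    table: Optional[str] = None      # None means global scope
    family: Optional[str] = None     # None means the whole table
    qualifier: Optional[str] = None  # None means the whole column family


def covers(grant: Grant, table: str, family: Optional[str], qualifier: Optional[str]) -> bool:
    """A grant covers a target when every scope field the grant sets matches it."""
    if grant.table is not None and grant.table != table:
        return False
    if grant.family is not None and grant.family != family:
        return False
    if grant.qualifier is not None and grant.qualifier != qualifier:
        return False
    return True


def is_allowed(grants, principals, action, table, family=None, qualifier=None) -> bool:
    """True if any grant held by one of the caller's principals permits the action."""
    return any(
        g.principal in principals and action in g.actions and covers(g, table, family, qualifier)
        for g in grants
    )


if __name__ == "__main__":
    grants = [
        Grant("alice", frozenset("RW"), table="t1"),
        Grant("@analysts", frozenset("R"), table="t2", family="cf"),
    ]
    print(is_allowed(grants, {"alice"}, "W", "t1", "cf", "q"))
    print(is_allowed(grants, {"bob", "@analysts"}, "R", "t2", "other"))
```

Note that a column-family grant does not satisfy a request against the whole table in this model, since the request's family is unset and cannot match.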
Security at Yahoo!
§  Used in a trusted environment
§  Prevent misuse vs. malicious use
§  Privacy
§  Auditing
§  Deployed in production for the last 8 months
§  No major issues
Security Issues
§  Unprotected APIs
§  bulkLoad (HBASE-5498)
§  stopRegionServer, regionClose, regionOpen (HBASE-7331)
§  WebUI
§  Compact & Split
§  Logs
§  Client Authentication not honored
§  Use hadoop-policy.xml
§  REST & Thrift interfaces have no authentication
Security Bugs
§  stopMaster (HBASE-7066)
§  Kerberos Replay
§  Ecosystem integration (Oozie, Pig, etc.)
§  Refreshing Proxy Users
§  Delegation Token (HBASE-6671, HBASE-7771, HBASE-7772)
Isolated Deployment
[Deployment diagram. Areas: Compute Cluster, HBase Cluster, Gateway/Launcher. Components: JobTracker, Namenode (one per cluster), TaskTracker + DataNode hosts running M/R Tasks, RegionServer + DataNode hosts, HBase Master, Zookeeper Quorum, HBase Client, MR Client.]
Region Server Groups Overview
§  Member Region Servers
§  Member Tables
§  Resource Isolation
§  Flexibility with configuration
[Diagram:
Group Foo: Region Servers 1…4 (RS1 to RS4), hosting Table1 and Table2
Group Bar: Region Servers 5…8 (RS5 to RS8), hosting Table3 and Table4]
Region Server Groups APIs
§  group_add
§  group_remove
§  group_move_servers
§  group_move_table
§  group_balance
§  create … { … CONFIGURATION => {'hbase.rsgroup.name' => 'my_group'}}
§  Requires Global Admin Privileges
Region Server Groups Implementation
[Component diagram. HMaster contains GroupAdminEndpoint, GroupMasterObserver, and GroupBasedLoadBalancer, which filters by group (foo, bar) in front of the LoadBalancer. GroupInfoManager is backed by a Group Table and a Group ZNode.]
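The "filter by group" idea can be sketched in Python: restrict the candidate servers to the table's group, then hand the reduced list to an ordinary placement policy. This is an illustrative model, not the HBASE-6721 code. The `GROUPS` layout mirrors the Foo/Bar example from the overview slide, and `group_of`, `filter_by_group`, `assign_regions`, and the round-robin policy are all assumptions.

```python
from itertools import cycle

# Group membership modeled on the overview slide (server names are placeholders).
GROUPS = {
    "foo": {"servers": {"rs1", "rs2", "rs3", "rs4"}, "tables": {"Table1", "Table2"}},
    "bar": {"servers": {"rs5", "rs6", "rs7", "rs8"}, "tables": {"Table3", "Table4"}},
}


def group_of(table, groups):
    """Return the name of the group that owns the table."""
    for name, info in groups.items():
        if table in info["tables"]:
            return name
    raise KeyError(f"table {table!r} is not a member of any group")


def filter_by_group(table, live_servers, groups):
    """Keep only live servers that belong to the table's group."""
    members = groups[group_of(table, groups)]["servers"]
    return sorted(s for s in live_servers if s in members)


def assign_regions(table, regions, live_servers, groups):
    """Round-robin the table's regions over the filtered server list."""
    candidates = filter_by_group(table, live_servers, groups)
    if not candidates:
        raise RuntimeError(f"no live servers in the group for {table!r}")
    return dict(zip(regions, cycle(candidates)))


if __name__ == "__main__":
    live = {f"rs{i}" for i in range(1, 9)}
    print(assign_regions("Table1", ["r1", "r2", "r3", "r4", "r5"], live, GROUPS))
```

Because filtering happens before placement, a busy table in one group cannot spill regions onto another group's servers, which is the resource isolation the overview slide lists.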
Namespace
§  Analogous to Database
§  Full Table Name: <table namespace>.<table name>
§  e.g. my_ns.my_table
§  Reserved namespaces
§  Default – tables with no explicit namespace
§  System – tables are guaranteed to be assigned prior to user tables
§  Table Path: /<hbaseRoot>/.data/<namespace>/<tableName>
§  /hbase/data/my_ns/my_ns.my_table
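The naming scheme above can be illustrated with a small Python helper that splits a full table name and builds the directory shown in the example path. This is a sketch under assumptions: `split_table_name` and `table_path` are hypothetical names, the lowercase `default` namespace is assumed, and the data directory is a parameter because the slide's template shows `.data` while its example shows `data`.

```python
import posixpath

DEFAULT_NAMESPACE = "default"  # assumed spelling of the reserved default namespace


def split_table_name(full_name: str):
    """Split '<namespace>.<table>' into its parts; no dot means the default namespace."""
    namespace, sep, table = full_name.partition(".")
    if not sep:
        return DEFAULT_NAMESPACE, full_name
    return namespace, table


def table_path(full_name: str, hbase_root: str = "/hbase", data_dir: str = "data") -> str:
    """Build the table directory, keeping the full table name as the last component."""
    namespace, _ = split_table_name(full_name)
    return posixpath.join(hbase_root, data_dir, namespace, full_name)


if __name__ == "__main__":
    print(split_table_name("my_ns.my_table"))
    print(table_path("my_ns.my_table"))
```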
Namespace + Security + Quota + Group
§  Default Region Server Group
§  Write privilege for table creation/deletion
§  Quota
§  Max Tables
§  Max Regions
§  Per Tenant
Namespace
Group Tables Quota ACL
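A per-namespace quota on tables and regions could be enforced at table creation roughly as follows. This Python sketch is an assumption-laden illustration rather than the HBASE-8015 implementation: `NamespaceQuota`, `NamespaceState`, `QuotaExceeded`, and `create_table` are hypothetical names, and counting regions at creation time only is a simplification.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a request would push a namespace past its quota."""


@dataclass
class NamespaceQuota:
    max_tables: int
    max_regions: int


@dataclass
class NamespaceState:
    quota: NamespaceQuota
    tables: dict = field(default_factory=dict)  # table name -> region count


def create_table(state: NamespaceState, table: str, num_regions: int) -> None:
    """Admit the table only if both the table and region limits still hold."""
    if table in state.tables:
        raise ValueError(f"table {table!r} already exists")
    if len(state.tables) + 1 > state.quota.max_tables:
        raise QuotaExceeded("max tables reached for this namespace")
    if sum(state.tables.values()) + num_regions > state.quota.max_regions:
        raise QuotaExceeded("max regions reached for this namespace")
    state.tables[table] = num_regions


if __name__ == "__main__":
    ns = NamespaceState(NamespaceQuota(max_tables=2, max_regions=10))
    create_table(ns, "my_ns.events", 6)
    create_table(ns, "my_ns.users", 4)
    print(ns.tables)
```

Tying the quota to the namespace rather than to individual users matches the slide's "Per Tenant" bullet if each tenant owns one namespace, which is an assumption here.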
Namespace + Quota
[Component diagram. HMaster contains TableNamespaceManager, NamespaceController, ZKNamespaceManager, MasterCPHost, and RegionCPHost. Backing stores: Namespace Table, Namespace ZNodes.]
Conclusion
§  Security
§  Isolation
§  Isolated Deployment
§  Region Server Groups
§  Resource Allocation
§  Namespace
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Sumeet Singh
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Sumeet Singh
 
Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Hadoop Summit San Jose 2014: Data Discovery on Hadoop Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Hadoop Summit San Jose 2014: Data Discovery on Hadoop
Sumeet Singh
 
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Strata Conference + Hadoop World San Jose 2015: Data Discovery on Hadoop
Sumeet Singh
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Sumeet Singh
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Sumeet Singh
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Sumeet Singh
 
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Sumeet Singh
 
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Sumeet Singh
 

HBaseCon 2013: Multi-tenant Apache HBase at Yahoo!

  • 1. Multi-tenant Apache HBase at Yahoo! Francis Liu, Sumeet Singh HBaseCon, June 13, 2013
  • 2. Introduction
      Francis Liu, Principal Software Engineer, Hadoop, Cloud Engineering Group, 701 First Avenue, Sunnyvale, CA 94089 USA
      §  Leads Apache HBase development at Yahoo!
      §  Apache Hive contributor
      §  Apache HCatalog committer
      §  Developed a workflow management and incremental processing platform
      Sumeet Singh, Head of Products, Cloud Services & Hadoop, Cloud Engineering Group, 701 First Avenue, Sunnyvale, CA 94089 USA
      §  Manages Cloud Services and Hadoop products teams at Yahoo!
      §  Responsible for Product Management, Customer Engagements, Evangelism, and Program Management
      §  Pretends to be the Product Manager for Hadoop at Yahoo!
      §  Headed Strategy for Cloud Platform Group at Yahoo!
      Yahoo! Presentation, Confidential
  • 3. Agenda
      1. Apache HBase and Motivation for Multi-tenancy
      2. Apache HBase Use Cases at Yahoo!
      3. Integration with Other Apache Hadoop Projects
      4. Accomplishing Multi-tenancy in Apache HBase
      5. Experiences in deploying secure Apache HBase
  • 4. Multi-tenant Apache Hadoop at Yahoo!
      [Chart: Number of Nodes and Raw HDFS Storage (in PB) by Year, 2006 to 2012]
      Chart annotations: Yahoo! commits to scaling Hadoop for production use; research workloads in Search and Advertising; production (modeling) with machine learning & WebMap; revenue systems with security, multi-tenancy, and SLAs; increased user base with partitioned namespaces; open sourced with Apache; Hortonworks spinoff for enterprise hardening; current team with Y! focus
      Footprint: 42,000 nodes, 365PB HDFS, 10M daily slot hours | Usage (Apr’13): 411 projects, 805 users, 13.2M jobs
  • 5. HBase at Y! – Multi-tenancy at the Helm Again
      [Chart: Number of Nodes and Data Stored (in PB) by Year, 2009 to 2013]
      Phases: Phase I, private deployments; Phase II, Grid Sandbox; Phase III, production multi-tenancy
      Chart annotations: Content Personalization; Web Crawl Cache; Grid Sandbox; Personalization & Targeting
      Projected Footprint by 2013: 2,400 nodes, 4.6PB of data stored | Projected Usage by 2013: 18 projects, 16 new projects (4 on-boarded, 12 upcoming)
  • 6. HBase Covers Most of Yahoo!’s Businesses
      Yahoo!’s Global Businesses
      §  Search: Web Cache, Query Analysis, Local Listings, Analytics
      §  Y! Mail: Anti-spam, Log Analytics, Metadata Mgmt.
      §  Cloud Platforms: Performance, Monitoring, OpenStack
      §  Consumer Platforms: CMS, Social Data
      §  Online Ads: Traffic Protection, Ads Data Mgmt.
      §  P13N: Content P13N, Ad targeting
      §  Mobile: Notifications, Flickr
      §  Sales: eCommerce
  • 7. The Appeal of Apache HBase to Yahoo!
      Yahoo! needed a solution to store mutable data and support random access. HBase was an obvious choice.
      §  Native to the Hadoop ecosystem
      §  Large Hadoop developer base
      §  Integration with other Hadoop components popular at Yahoo! such as MapReduce, Pig, Oozie, HCatalog, Hive
      §  Vibrant/active open source community of developers
      §  Attractive throughputs, in particular the write throughput
      §  Acceptable latencies and scan performance
      §  Support for Yahoo!’s scale
      §  Support for bulk uploads
      §  Easy on-boarding to the multi-tenant platform
      §  Easy application development (lower time to market)
      §  Support for dynamically adding columns, TTL, versioning and timestamps
  • 8. Use Case 1 – Content Personalization (V1)
      [Diagram labels: Y! Property Server Audience Events User Activity (User Pipeline) Content Content, Rules (Content Pipeline) Models Index Serving Scores (User/ Content) User Profiles | Content Index Event Collection Content Management System Serving Systems HDFS MapReduce]
  • 9. Use Case 1 – Content Personalization (V2)
      [Diagram labels: Y! Property Server Audience Events User Activity (User Pipeline) Content Content, Rules (Content Pipeline) Serving Scores (User/ Content) User Profiles | Content Index Event Collection Content Management System Serving Systems Storm (Short-term Pipeline) Models Index HDFS MapReduce]
  • 10. HBase Helps Yahoo! Deliver Personalization
  • 11. Use Case 2 – Web Crawl Cache
      [Diagram labels: Poller Fetcher Ingestor Extruder Processing Cluster Random Read End Users Consumption poll fetch launch launch write read, write scan]
  • 12. Use Case 3 – Detecting Abusive Accounts
      [Diagram labels: Outgoing Mail Server Anti-spam Rules Engine User Feedback Serving Maps Machine Learning Spam Detection Models Spam Patterns Categorization Models Good, Offer CAPTCHA, Block Near Real-time Classifier Compromised Acct. Force pwd. Change/ deactivate acct. Incoming Mail Server Spam Filtering Abusive Account Detection]
  • 13. Use Case Pattern 1 – Persist Metadata/ State
      [Diagram labels: write, read]
      Rowkey                  | CF – MessageStore: MessageId | CF – MessageStore: Payload | CF – Attributes: Colo
      MD5 Hash Mailbox Id # 1 | 1                            | “...”                      | DC1
      MD5 Hash Mailbox Id # 2 | 2                            | “...”                      | DC2
      MD5 Hash Mailbox Id # 3 | 3                            | “...”                      | DC3
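The row layout on slide 13 can be illustrated with a small sketch. It assumes the row key is simply the hex MD5 digest of the mailbox id (the slide says only "MD5 Hash Mailbox Id", so the exact key layout is an assumption) and models a row as a plain dictionary keyed by column family. The function names `make_rowkey` and `make_row` are hypothetical and are not part of any HBase API.

```python
import hashlib


def make_rowkey(mailbox_id: str) -> str:
    """Return the hex MD5 digest of the mailbox id (assumed row key layout)."""
    return hashlib.md5(mailbox_id.encode("utf-8")).hexdigest()


def make_row(mailbox_id: str, message_id: int, payload: str, colo: str) -> dict:
    """Model one row with the two column families named on the slide."""
    return {
        "rowkey": make_rowkey(mailbox_id),
        # Column family "MessageStore" holds the message itself.
        "MessageStore": {"MessageId": message_id, "Payload": payload},
        # Column family "Attributes" holds metadata such as the colo.
        "Attributes": {"Colo": colo},
    }


# Hypothetical mailbox id; the values mirror the first table row on the slide.
row = make_row("mailbox-1", 1, "...", "DC1")
print(row["rowkey"], row["Attributes"]["Colo"])
```

Hashing the key in this way is commonly used to spread otherwise sequential ids evenly across regions; whether that was the motivation here is not stated on the slide.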
  • 14. Use Case Pattern 2 – Metrics/ Analytics
      [Diagram labels: Collector Collector Collector Ingestion Query Server]
  • 15. Use Case Pattern 3 – Dimension Store
      [Diagram labels: MapReduce Hive Pig Clickstream Ad Campaign]
  • 16. Use Case Pattern 4 – Incremental Processing
      [Diagram labels: Collector MapReduce Storm Serving Stores Batch (Long-term) Pipeline Near Real-time (Short-term) Pipeline Collection Off-stage Processing On-stage Serving Files Events Search Index]
  • 17. Use Case Patterns Summary
      Patterns (columns): Persist Metadata/ State | Metrics/ Analytics | Dimension Store | Incremental Processing
      Search: Web Cache ✔, Query Analysis ✔, Local Business Listings ✔, Analytics ✔
      Y! Mail: Anti-spam ✔, Log Analytics ✔, Metadata Management ✔
      Cloud Platforms: Performance ✔, Online Monitoring ✔, OpenStack ✔
      Consumer Platforms: Content Management ✔, Social Data ✔
      Online Ads: Traffic Protection ✔, App Data Management ✔
      P13N: Content Personalization ✔, Ad Targeting ✔
      Mobile: Notifications ✔
      Sales: eCommerce ✔
  • 18. Integration With Other Hadoop Components
      [Diagram labels: TableInputFormat, TableOutputFormat, HBaseStorageHandler, HBaseStorage, HBase Credential; Read/Write with MR; Read/Write into existing tables; Create/Drop/Read/Write; Secure HBase access]
  • 19. Apache Hadoop at Yahoo!
      §  Hosted Multi-tenant Service
      §  Security
      §  Job Queues
      §  HDFS Quota
  • 20. Apache HBase at Yahoo!
      §  Hosted Multi-tenant Service
      §  Security
      §  Isolated Deployment
      §  Region Server Group (HBASE-6721)
      §  Namespace (HBASE-8015)
  • 21. Security
      §  Authentication
      §  Kerberos (users, processes)
      §  Delegation Token (MapReduce, YARN, etc.)
      §  Authorization
      §  HBase ACLs (Read, Write, Create, Admin, Exec)
      §  Grant permissions to User or Group
      §  ACL for Table, Column Family or Column
      §  Only Global Admin can create/drop tables
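The authorization bullets on slide 21 (five permissions, granted to a user or a group, at table, column family, or column scope) can be illustrated with a toy model. This is a sketch under stated assumptions, not HBase's access controller: the class `AclTable`, its methods, and the rule that a grant at a broader scope also covers narrower scopes are illustrative choices, and groups are marked with an `@` prefix purely as a convention of this example.

```python
from enum import Flag, auto


class Perm(Flag):
    """The five permissions listed on the slide."""
    READ = auto()
    WRITE = auto()
    CREATE = auto()
    ADMIN = auto()
    EXEC = auto()


class AclTable:
    """Toy ACL store: (principal, scope) -> granted permissions."""

    def __init__(self):
        self._grants = {}

    def grant(self, principal, perms, table, family=None, qualifier=None):
        # A scope is (table, family, qualifier); None means "not narrowed".
        key = (principal, (table, family, qualifier))
        self._grants[key] = self._grants.get(key, Perm(0)) | perms

    def allowed(self, user, groups, perm, table, family=None, qualifier=None):
        # Check the user and each of the user's groups (prefixed with "@").
        principals = [user] + ["@" + g for g in groups]
        # A grant on the table also covers its families and columns.
        scopes = [(table, None, None)]
        if family is not None:
            scopes.append((table, family, None))
            if qualifier is not None:
                scopes.append((table, family, qualifier))
        return any(
            perm in self._grants.get((p, s), Perm(0))
            for p in principals
            for s in scopes
        )


# Hypothetical users, group, and table names.
acl = AclTable()
acl.grant("alice", Perm.READ, "t1")            # table-wide read for a user
acl.grant("@ops", Perm.WRITE, "t1", "cf")      # family-level write for a group
print(acl.allowed("alice", [], Perm.READ, "t1", "cf", "q"))
print(acl.allowed("bob", ["ops"], Perm.WRITE, "t1", "other"))
```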
  • 22. Security at Yahoo!
      §  Used in a trusted environment
      §  Prevent misuse vs. malicious use
      §  Privacy
      §  Auditing
      §  Deployed in production for the last 8 months
      §  No major issues
  • 23. Security Issues
      §  Unprotected APIs
      §  bulkLoad (HBASE-5498)
      §  stopRegionServer, regionClose, regionOpen (HBASE-7331)
      §  WebUI
      §  Compact & Split
      §  Logs
      §  Client Authentication not honored
      §  Use hadoop-policy.xml
      §  REST & Thrift interfaces have no authentication
  • 24. Security Bugs
      §  stopMaster (HBASE-7066)
      §  Kerberos Replay
      §  Ecosystem integration (Oozie, Pig, etc.)
      §  Refreshing Proxy Users
      §  Delegation Token (HBASE-6671, HBASE-7771, HBASE-7772)
  • 25. Isolated Deployment
      [Diagram labels: HBase Client HBase Client JobTracker Namenode TaskTracker DataNode Namenode RegionServer DataNode RegionServer DataNode RegionServer DataNode HBase Master Zookeeper Quorum HBase Client MR Client M/R Task TaskTracker DataNode M/R Task TaskTracker DataNode MR Task Compute Cluster HBase Cluster Gateway/Launcher]
  • 26. Region Server Groups Overview
      §  Member Region Servers
      §  Member Tables
      §  Resource Isolation
      §  Flexibility with configuration
      [Diagram: Group Foo contains Region Servers 1…4 (RS1 to RS4), each hosting Table1 and Table2; Group Bar contains Region Servers 5…8 (RS5 to RS8), each hosting Table3 and Table4]
  • 27. Region Server Groups APIs
      §  group_add
      §  group_remove
      §  group_move_servers
      §  group_move_table
      §  group_balance
      §  create … { … CONFIGURATION=>{'hbase.rsgroup.name'=>'my_group'}}
      §  Requires Global Admin Privileges
  • 28. Region Server Groups Implementation
      [Diagram labels: LoadBalancer GroupBasedLoadBalancer GroupAdminEndpoint GroupMasterObserver HMaster FilterBy Group foo bar GroupInfoManager Group Table Group ZNode]
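The "FilterBy Group" step in the diagram on slide 28 suggests that candidate servers are narrowed to a table's group before regions are placed. The sketch below is a toy model of that idea only: it is not the GroupBasedLoadBalancer code, the round-robin placement is a stand-in for whatever balancer is actually wrapped, and the function names are hypothetical. The group membership reuses the Foo/Bar example from slide 26.

```python
from collections import defaultdict
from itertools import cycle

# Group membership taken from the overview slide (Foo: RS1-4, Bar: RS5-8).
GROUP_SERVERS = {
    "foo": ["RS1", "RS2", "RS3", "RS4"],
    "bar": ["RS5", "RS6", "RS7", "RS8"],
}
TABLE_GROUP = {"Table1": "foo", "Table2": "foo", "Table3": "bar", "Table4": "bar"}


def filter_by_group(table, table_group, group_servers):
    """Return only the servers that belong to the table's group."""
    return list(group_servers[table_group[table]])


def assign_regions(regions, table_group, group_servers):
    """Place each (table, region) round-robin on servers of the table's group."""
    pickers = {}  # group name -> endless iterator over that group's servers
    plan = defaultdict(list)
    for table, region in regions:
        group = table_group[table]
        if group not in pickers:
            pickers[group] = cycle(filter_by_group(table, table_group, group_servers))
        plan[next(pickers[group])].append((table, region))
    return dict(plan)


# Four regions per table, purely as sample input.
regions = [(table, r) for table in TABLE_GROUP for r in range(4)]
plan = assign_regions(regions, TABLE_GROUP, GROUP_SERVERS)
for server in sorted(plan):
    print(server, plan[server])
```

Because every placement goes through `filter_by_group`, no region of Table1 or Table2 can land on RS5 to RS8, which is the resource isolation property the overview slide lists.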
  • 29. Namespace
      §  Analogous to Database
      §  Full Table Name: <table namespace>.<table name>
      §  e.g. my_ns.my_table
      §  Reserved namespaces
      §  Default – tables with no explicit namespace
      §  System – tables are guaranteed to be assigned prior to user tables
      §  Table Path: /<hbaseRoot>/.data/<namespace>/<tableName>
      §  /hbase/data/my_ns/my_ns.my_table
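The naming rules on slide 29 can be made concrete with a short sketch. It follows the slide's example path (/hbase/data/my_ns/my_ns.my_table), where the directory under the namespace is the full table name. The helper names, the lower-case identifier `default` for the Default namespace, and the `data_dir` parameter are assumptions made for illustration; the slide's template writes the data directory as `.data` while its example writes `data`, so it is left configurable here.

```python
def split_table_name(full_name: str, default_namespace: str = "default"):
    """Split '<namespace>.<table>' into (namespace, table).

    Names without a '.' fall into the default namespace. Table names that
    themselves contain a '.' are not handled by this sketch.
    """
    namespace, sep, table = full_name.partition(".")
    if not sep:
        return default_namespace, full_name
    return namespace, table


def table_path(hbase_root: str, full_name: str, data_dir: str = "data") -> str:
    """Build the on-disk path following the slide's example layout."""
    namespace, _ = split_table_name(full_name)
    return "/".join([hbase_root.rstrip("/"), data_dir, namespace, full_name])


print(split_table_name("my_ns.my_table"))
print(table_path("/hbase", "my_ns.my_table"))
```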
  • 30. Namespace + Security + Quota + Group
      [Diagram labels: Namespace Group Tables Quota ACL]
  • 31. Namespace + Security + Quota + Group
      §  Default Region Server Group
      §  Write privilege for table creation/deletion
      §  Quota
      §  Max Tables
      §  Max Regions
      §  Per Tenant
      [Diagram labels: Namespace Group Tables Quota ACL]
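The quota bullets on slide 31 (Max Tables and Max Regions, applied per tenant namespace) can be sketched as a simple admission check at table creation time. This is a toy model under stated assumptions, not the NamespaceController implementation: the `Namespace` class, its fields, and the `QuotaExceeded` exception are hypothetical names.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when creating a table would break a namespace quota."""


@dataclass
class Namespace:
    name: str
    max_tables: int
    max_regions: int
    tables: dict = field(default_factory=dict)  # table name -> region count

    def create_table(self, table: str, regions: int = 1) -> None:
        """Admit the table only if both quotas still hold afterwards."""
        if table in self.tables:
            raise ValueError(f"table {table!r} already exists in {self.name}")
        if len(self.tables) + 1 > self.max_tables:
            raise QuotaExceeded(f"{self.name}: max tables ({self.max_tables}) reached")
        if sum(self.tables.values()) + regions > self.max_regions:
            raise QuotaExceeded(f"{self.name}: max regions ({self.max_regions}) exceeded")
        self.tables[table] = regions


# Hypothetical tenant namespace with small limits for demonstration.
ns = Namespace("my_ns", max_tables=2, max_regions=10)
ns.create_table("t1", regions=4)
ns.create_table("t2", regions=6)
try:
    ns.create_table("t3")
except QuotaExceeded as err:
    print("rejected:", err)
```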
  • 32. Namespace + Quota
      [Diagram labels: HMaster TableNamespaceManager Namespace Table Namespace ZNodes Namespace NamespaceController ZKNamespaceManager MasterCPHost RegionCPHost]
  • 33. Conclusion
      §  Security
      §  Isolation
      §  Isolated Deployment
      §  Region Server Groups
      §  Resource Allocation
      §  Namespace