Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hadoop Security with HDP/PHD
Disclaimer
This document may contain product features and technology directions that are under
development or may be under development in the future.
Technical feasibility, market demand, user feedback, and the Apache Software Foundation
community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a
contractual commitment from Hortonworks to deliver these features in any generally available
product.
Product features and technology directions are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Agenda
• Hadoop Security
• Kerberos
• Authorization and Auditing with Ranger
• Gateway Security with Knox
• Encryption
Security today in Hadoop with HDP/PHD
Enterprise Services: Security
Centralized Security Administration
Authentication – Who am I/prove it?
• Kerberos
• API security with Apache Knox
Authorization – What can I do?
• Fine grain access control with Apache Ranger
Audit – What did I do?
• Centralized audit reporting w/ Apache Ranger
Data Protection – Can data be encrypted at rest and over the wire?
• Wire encryption in Hadoop
• Native and partner encryption
Security needs are changing
Administration: central management & consistent security
Authentication: authenticate users and systems
Authorization: provision access to data
Audit: maintain a record of data access
Data Protection: protect data at rest and in motion
• YARN unlocks the data lake
• Multi-tenant: multiple applications for data access
• Different kinds of data
• Changing and complex compliance environment
Fall 2013: largely siloed deployments with single-workload clusters
2014: 65% of clusters host multiple workloads
Typical Flow – Hive Access through Beeline client
Diagram components: Beeline client, HiveServer 2, HDFS (A, B, C)
Typical Flow – Authenticate through Kerberos
Client
• Requests a TGT
• Receives TGT
• Client decrypts the KDC reply with the password hash
• Sends the TGT and receives a Service Ticket
• Use Hive Service Ticket, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN Service Ticket
Diagram components: Beeline client, KDC, HiveServer 2, HDFS (A, B, C)
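The ticket exchange above can be sketched as a toy model. This is a deliberately simplified illustration, not real Kerberos: the `Sealed` class only stands in for encryption, the key derivation is a plain SHA-256 hash, and authenticators, timestamps and ticket lifetimes are left out. All class, function and principal names are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class Sealed:
    """Stand-in for an encrypted message: only the holder of `key` can open it."""
    key: bytes
    payload: dict

    def open(self, key: bytes) -> dict:
        if not hmac.compare_digest(key, self.key):
            raise PermissionError("wrong key")
        return self.payload


def password_key(principal: str, password: str) -> bytes:
    # Toy derivation (assumption); real Kerberos uses per-enctype string-to-key functions.
    return hashlib.sha256(f"{principal}:{password}".encode()).digest()


class ToyKDC:
    def __init__(self):
        self.tgs_key = secrets.token_bytes(32)  # known only to the KDC
        self.user_keys = {}
        self.service_keys = {}

    def add_user(self, principal, password):
        self.user_keys[principal] = password_key(principal, password)

    def add_service(self, principal):
        key = secrets.token_bytes(32)  # plays the role of the service keytab
        self.service_keys[principal] = key
        return key

    def request_tgt(self, principal):
        # The TGT is sealed with the KDC's own key; the reply around it is
        # sealed with the user's password-derived key.
        tgt = Sealed(self.tgs_key, {"client": principal})
        return Sealed(self.user_keys[principal], {"tgt": tgt})

    def request_service_ticket(self, tgt, service):
        client = tgt.open(self.tgs_key)["client"]
        return Sealed(self.service_keys[service], {"client": client, "service": service})


USER = "alice@EXAMPLE.COM"
HIVE = "hive/hs2.example.com@EXAMPLE.COM"

kdc = ToyKDC()
kdc.add_user(USER, "s3cret")
hive_key = kdc.add_service(HIVE)

reply = kdc.request_tgt(USER)                           # requests and receives a TGT
tgt = reply.open(password_key(USER, "s3cret"))["tgt"]   # opens the reply with the password hash
ticket = kdc.request_service_ticket(tgt, HIVE)          # sends the TGT, receives a service ticket
ticket_claims = ticket.open(hive_key)                   # HiveServer 2 opens it with its own key
```

The point of the model is who can read what: the client never sees the KDC's key, and the password itself never travels to the service.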
Typical Flow – Add Authorization through Ranger (XA Secure)
• Client gets service ticket (ST) for Hive
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
Diagram components: Beeline client, KDC, HiveServer 2, Ranger, HDFS (A, B, C)
Typical Flow – Firewall, Route through Knox Gateway
• Original request w/ user id/password
• Knox gets service ticket (ST) for Hive
• Knox runs as proxy user using Hive ST
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
• Client gets query result
Diagram components: Beeline client, Apache Knox, KDC, HiveServer 2, Ranger, HDFS (A, B, C)
Typical Flow – Add Wire and File Encryption
• Original request w/ user id/password
• Knox gets service ticket (ST) for Hive
• Knox runs as proxy user using Hive ST
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
• Client gets query result
• Connections in the diagram are labeled SSL or SASL
Diagram components: Beeline client, Apache Knox, KDC, HiveServer 2, Ranger, HDFS (A, B, C)
Security Features
PHD/HDP Security
Authentication
• Kerberos Support: ✔
• Perimeter Security (for services and REST API): ✔
Authorization
• Fine grained access control: HDFS, HBase, Hive, Storm and Knox
• Role based access control: ✔
• Column level: ✔
• Permission Support: Create, Drop, Index, Lock, User
Auditing
• Resource access auditing: Extensive Auditing
• Policy auditing: ✔
Security Features
HDP/PHD Security w/ Ranger
Data Protection
• Wire Encryption: ✔
• Volume Encryption: TDE
• File/Column Encryption: HDFS TDE & Partners
Reporting
• Global view of policies and audit data: ✔
Manage
• User/Group mapping: ✔
• Global policy manager, Web UI: ✔
• Delegated administration: ✔
Partner Integration
Security Integrations:
● Ranger plugins: centralize authorization/audit of 3rd party s/w in Ranger UI
● Via Custom Log4J appender, can stream audit events to INFA infrastructure
● Knox: Route partner APIs through Knox after validating compatibility
● Provide SSO capability to end users
Authentication w/ Kerberos
Kerberos in the field
Kerberos is no longer “too complex”. Adoption is growing.
● Ambari helps automate and manage Kerberos integration with the cluster
Use: Active Directory or a combined Kerberos/Active Directory setup
● Active Directory is seen most commonly in the field
● Many start with a separate MIT KDC and later grow into the AD KDC
Knox should be considered for API/perimeter security
● Removes the need for Kerberos for end users
● Enables integration with different authentication standards
● Single location to manage security for REST APIs & HTTP based services
● Tip: deploy Knox in the DMZ
Authorization and Auditing
Apache Ranger
Authorization and Audit
Authorization
Fine grain access control
• HDFS – Folder, File
• Hive – Database, Table, Column
• HBase – Table, Column Family, Column
• Storm, Knox and more
Audit
Extensive user access auditing in HDFS, Hive and HBase
• IP Address
• Resource type/resource
• Timestamp
• Access granted or denied
Callouts: control access into the system; flexibility in defining policies
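To make the link between fine-grained policies and audit records concrete, here is a minimal sketch of a column-level check for Hive that writes one audit event per decision. It is a toy model under stated assumptions, not Ranger's policy engine or API: the `Policy` and `AuditEvent` shapes, the wildcard matching and the sample users are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass
class Policy:
    database: str      # each resource field may use * as a wildcard
    table: str
    column: str
    users: set
    permissions: set


@dataclass
class AuditEvent:
    timestamp: str
    user: str
    client_ip: str
    resource_type: str
    resource: str
    permission: str
    granted: bool


def authorize(policies, audit_log, user, client_ip, permission, database, table, column):
    """Grant if any policy matches; record the decision either way."""
    granted = any(
        user in p.users
        and permission in p.permissions
        and fnmatchcase(database, p.database)
        and fnmatchcase(table, p.table)
        and fnmatchcase(column, p.column)
        for p in policies
    )
    audit_log.append(AuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user=user,
        client_ip=client_ip,
        resource_type="hive/column",
        resource=f"{database}/{table}/{column}",
        permission=permission,
        granted=granted,
    ))
    return granted


policies = [
    Policy("sales", "orders", "*", {"alice"}, {"select"}),
    Policy("sales", "customers", "name", {"bob"}, {"select"}),
]
audit_log = []

allowed = authorize(policies, audit_log, "alice", "10.0.0.5", "select", "sales", "orders", "amount")
denied = authorize(policies, audit_log, "bob", "10.0.0.6", "select", "sales", "customers", "ssn")
```

Each audit event carries the fields listed above (IP address, resource type and resource, timestamp, granted or denied), and denied requests are logged as well as granted ones.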
Central Security Administration
Apache Ranger
• Delivers a ‘single pane of glass’ for
the security administrator
• Centralizes administration of
security policy
• Ensures consistent coverage across
the entire Hadoop stack
Setup Authorization Policies
Screenshot callouts: file level access control, flexible definition; control permissions
Monitor through Auditing
Apache Ranger Flow
Authorization and Auditing w/ Ranger
Enterprise Services: Security
Diagram components:
• Enterprise Users
• Ranger Administration Portal
• Ranger Policy Server and Ranger Audit Server
• RDBMS
• Integration API for Legacy Tools & Data Governance
• Hadoop components, each with a Ranger Plugin: HDFS, HBase, Hive Server2
• HDP 2.2 additions, each with a Ranger Plugin: Knox, Storm
• Planned for 2015: TBD (Ranger Plugin*)
Installation Steps
• Install PHD 3.0
• Install Apache Ranger (https://tinyurl.com/mlgs3jy)
– Install Policy Manager
– Install User Sync
– Install Ranger Plugins
• Start Policy Manager
– service ranger-admin start
• Verify – http://<host>:6080/
– Log in as admin/admin
Ranger Plugins
• HDFS
• HIVE
• KNOX
• STORM
• HBASE
Steps to Enable plugins
1. Start the Policy Manager
2. Create the Plugin repository in the Policy Manager
3. Install the Plugin
• Edit the install.properties
• Execute ./enable-<plugin>.sh
4. Restart the plugin service (e.g. HDFS, Hive etc)
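Step 3 involves editing `install.properties` before running the enable script. As a rough sketch, the helper below updates `key=value` lines in a properties file's text. The helper name is hypothetical, and the two keys shown (`POLICY_MGR_URL`, `REPOSITORY_NAME`) plus the host and repository values are assumptions for illustration; check them against the `install.properties` shipped with your plugin version.

```python
def set_properties(text, updates):
    """Return `text` with the given keys set, preserving comments and line order."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Keys that were not already present are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Assumed excerpt of a plugin install.properties (key names are assumptions).
sample = "# Location of Policy Manager URL\nPOLICY_MGR_URL=\nREPOSITORY_NAME=\n"

updated = set_properties(sample, {
    "POLICY_MGR_URL": "http://ranger-host:6080",  # hypothetical host; port taken from the Verify step
    "REPOSITORY_NAME": "hadoopdev",               # hypothetical name; must match the repository created in step 2
})
```

Writing `updated` back to the file and then running the enable script would complete step 3; the restart in step 4 is still required for the plugin to load.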
Ranger Console
• The Repository Manager Tab
• The Policy Manager Tab
• The User/Group Tab
• The Analytics Tab
• The Audit Tab
Repository Manager
• Add New Repository
• Edit Repository
• Delete Repository
Demo
REST API Security through Knox
Securely share Hadoop Cluster
Share Data Lake with everyone - Securely
• Simplifies access: Extends Hadoop’s REST/HTTP services by encapsulating Kerberos within the
cluster.
• Enhances security: Exposes Hadoop’s REST/HTTP services without revealing network details,
providing SSL out of the box.
• Centralized control: Enforces REST API security centrally, routing requests to multiple Hadoop
clusters.
• Enterprise integration: Supports LDAP, Active Directory, SSO, SAML and other authentication
systems.
Apache Knox
Knox can be used with both unsecured Hadoop clusters, and Kerberos secured clusters. In an enterprise
solution that employs Kerberos secured clusters, the Apache Knox Gateway provides an enterprise security
solution that:
• Integrates well with enterprise identity management solutions
• Protects the details of the Hadoop cluster deployment (hosts and ports are hidden from end users)
• Reduces the number of services with which a client needs to interact
Extend Hadoop API reach with Knox
Diagram components:
• Application Tier: App A, App B, App C, App N
• Data Ingest/ETL: Falcon, Oozie, Sqoop, Flume
• Users: Data Operator, Business User, Hadoop Admin
• Admin/Operators: Bastion Node, SSH, RPC call
• REST/HTTP and JDBC/ODBC access: Load Balancer, Knox
• Hadoop Cluster
Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• SSL for non-SSL services
• WebApp vulnerability filter
Hadoop REST API with Knox
Service | Direct URL | Knox URL
WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie
HBase | http://hbasehost:60080 | https://knox-host:8443/hbase
Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive
YARN | http://yarn-host:yarn-port/ws | https://knox-host:8443/resourcemanager
Direct URLs: masters could be on many different hosts
Knox URLs: one host, one port; consistent paths; SSL config at one host
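The mapping in the table can be expressed as a small lookup, which shows why clients only need one host and port. This is a sketch using the simplified paths from the table; the function name and the optional `context` parameter are assumptions. Real Knox deployments usually place a gateway path and topology name in front of the service path, which `context` lets you add.

```python
# Service paths as listed in the table above.
KNOX_PATHS = {
    "WebHDFS": "/webhdfs",
    "WebHCat": "/templeton",
    "Oozie": "/oozie",
    "HBase": "/hbase",
    "Hive": "/hive",
    "YARN": "/resourcemanager",
}


def knox_url(service, gateway_host="knox-host", port=8443, context=""):
    """Build the gateway URL for a service; every service shares one host and port."""
    return f"https://{gateway_host}:{port}{context}{KNOX_PATHS[service]}"


webhdfs_url = knox_url("WebHDFS")
hive_url = knox_url("Hive", context="/gateway/default")  # hypothetical topology name
```

Because only the path varies per service, one SSL certificate and one firewall rule cover every REST endpoint behind the gateway.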
Hadoop REST API Security: Drill-Down
Diagram components:
• REST Client (HTTP)
• Firewalls on both sides of the DMZ
• DMZ: load balancer (LB) in front of Knox Gateway instances (GW)
• Enterprise Identity Provider: LDAP/AD (LDAP)
• Edge Node/Hadoop CLIs (RPC)
• Hadoop Cluster 1: Masters (RM, NN, WebHCat, Oozie, HS2), Slaves (DN, NM), HBase
• Hadoop Cluster 2: Masters (RM, NN, WebHCat, Oozie, HS2), Slaves (DN, NM), HBase
Knox – features in PHD
• Use Ambari for Install/start/stop/configuration
• Knox support for HDFS HA
• Support for YARN REST API
• Support for SSL to Hadoop Cluster Services (WebHDFS, HBase,
Hive & Oozie)
• Integration with Ranger for Knox Service Level Authorization
• Knox Management REST API
Installation
• Installed via Ambari
– This can also be done manually
– Start the embedded LDAP
• There are good examples in the Apache docs, with Groovy scripts
– https://knox.apache.org/books/knox-0-4-0/knox-0-4-0.html
Data Protection
Wire and data at rest encryption
Data Protection
HDP allows you to apply data protection policy at different layers across the Hadoop stack
Layer | What? | How?
Storage and Access | Encrypt data while it is at rest | Partners, HDFS Tech Preview, HBase encryption, OS level encryption
Transmission | Encrypt data as it moves | Supported from HDP 2.1
HDFS Transparent Data Encryption (TDE) in 2.2
• Data encryption at a higher level than OS-level encryption, while remaining native and transparent to Hadoop
• End-to-end: data is both encrypted and decrypted by the client
• Encryption/decryption uses the usual HDFS functions from the client
• No need to change user application code
• No need to store data encryption keys on HDFS itself
• No need for HDFS to handle unencrypted data
• Data is effectively encrypted at rest and, because it is decrypted on the client side, it is also encrypted on the wire while being transmitted
• HDFS file encryption/decryption is transparent to its clients
• Users can read/write files to/from an encryption zone as long as they have permission to access it
• Depends on installing a Key Management Server (KMS)
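The division of labor between HDFS and the Key Management Server can be illustrated with a toy envelope-encryption model. This is a sketch under loud assumptions: `toy_cipher` is an insecure XOR stand-in for a real cipher, and `ToyKMS` and `ToyZone` are hypothetical classes, not Hadoop APIs. It shows a per-file data key that is stored only in wrapped form, with the zone key never leaving the KMS.

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """XOR with a SHA-256 keystream. Illustration only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds encryption zone keys; they never leave this object."""

    def __init__(self):
        self._zone_keys = {}
        self._acl = {}

    def create_key(self, name, allowed_users):
        self._zone_keys[name] = secrets.token_bytes(32)
        self._acl[name] = set(allowed_users)

    def generate_wrapped_key(self, name):
        data_key = secrets.token_bytes(32)
        return toy_cipher(self._zone_keys[name], data_key)

    def unwrap_key(self, name, wrapped_key, user):
        if user not in self._acl[name]:
            raise PermissionError(f"{user} may not use key {name}")
        return toy_cipher(self._zone_keys[name], wrapped_key)


class ToyZone:
    """What storage keeps for a zone: ciphertext and wrapped keys only."""

    def __init__(self, kms, key_name):
        self.kms = kms
        self.key_name = key_name
        self.files = {}  # path -> (wrapped data key, ciphertext)

    def write(self, user, path, data):
        wrapped = self.kms.generate_wrapped_key(self.key_name)
        data_key = self.kms.unwrap_key(self.key_name, wrapped, user)  # happens on the client
        self.files[path] = (wrapped, toy_cipher(data_key, data))

    def read(self, user, path):
        wrapped, ciphertext = self.files[path]
        data_key = self.kms.unwrap_key(self.key_name, wrapped, user)  # happens on the client
        return toy_cipher(data_key, ciphertext)


kms = ToyKMS()
kms.create_key("key1", allowed_users={"alice"})
zone = ToyZone(kms, "key1")

zone.write("alice", "/zone1/card.txt", b"card=4111-1111-1111")
plaintext = zone.read("alice", "/zone1/card.txt")
stored_ciphertext = zone.files["/zone1/card.txt"][1]
```

In this model a user without access to the zone key cannot recover the data even with a copy of everything in `zone.files`, which is the property the bullets above describe.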
HDFS Transparent Data Encryption (TDE) - Steps
• Install and run KMS on top of HDP 2.2
• Change HDFS params via Ambari
• Create encryption key
– hadoop key create key1 -size 256
– hadoop key list -metadata
• Create an encryption zone using the key
– hdfs dfs -mkdir /zone1
– hdfs crypto -createZone -keyName key1 -path /zone1
– hdfs crypto -listZones
• Reference: http://hortonworks.com/kb/hdfs-transparent-data-encryption/
Thank You

More Related Content

PPTX
Hadoop Security Today & Tomorrow with Apache Knox
Vinay Shukla
 
PPTX
Apache Ranger
Rommel Garcia
 
PPTX
Apache Ambari: Past, Present, Future
Hortonworks
 
PPTX
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
PPTX
Securing Hadoop with Apache Ranger
DataWorks Summit
 
PDF
TriHUG October: Apache Ranger
trihug
 
PPT
Hadoop Security Architecture
Owen O'Malley
 
PDF
Running Apache NiFi with Apache Spark : Integration Options
Timothy Spann
 
Hadoop Security Today & Tomorrow with Apache Knox
Vinay Shukla
 
Apache Ranger
Rommel Garcia
 
Apache Ambari: Past, Present, Future
Hortonworks
 
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
Securing Hadoop with Apache Ranger
DataWorks Summit
 
TriHUG October: Apache Ranger
trihug
 
Hadoop Security Architecture
Owen O'Malley
 
Running Apache NiFi with Apache Spark : Integration Options
Timothy Spann
 

What's hot (20)

PPTX
Introduction to helm
Jeeva Chelladhurai
 
PPTX
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
DataWorks Summit
 
PPTX
Open Source Security Tools for Big Data
Rommel Garcia
 
PDF
Docker in real life
Nguyen Van Vuong
 
PPTX
Hdp security overview
Hortonworks
 
PPTX
Hadoop Security Today and Tomorrow
DataWorks Summit
 
PPTX
Getting started with Docker
Ravindu Fernando
 
PPTX
Docker Networking Overview
Sreenivas Makam
 
PDF
Kubernetes Basics
Eueung Mulyana
 
PDF
IPFS: The Permanent Web
Sivachandran Paramsivam
 
PPTX
Virtualization Vs. Containers
actualtechmedia
 
PPTX
Solr Exchange: Introduction to SolrCloud
thelabdude
 
PDF
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
Edureka!
 
PPTX
Tuning kafka pipelines
Sumant Tambe
 
PDF
[OpenStack Days Korea 2016] Track1 - 카카오는 오픈스택 기반으로 어떻게 5000VM을 운영하고 있을까?
OpenStack Korea Community
 
PPTX
Managing 2000 Node Cluster with Ambari
DataWorks Summit
 
PPTX
ElasticSearch Basic Introduction
Mayur Rathod
 
PPTX
Overview of HDFS Transparent Encryption
Cloudera, Inc.
 
ODP
OAuth2 - Introduction
Knoldus Inc.
 
PDF
Helm - Application deployment management for Kubernetes
Alexei Ledenev
 
Introduction to helm
Jeeva Chelladhurai
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
DataWorks Summit
 
Open Source Security Tools for Big Data
Rommel Garcia
 
Docker in real life
Nguyen Van Vuong
 
Hdp security overview
Hortonworks
 
Hadoop Security Today and Tomorrow
DataWorks Summit
 
Getting started with Docker
Ravindu Fernando
 
Docker Networking Overview
Sreenivas Makam
 
Kubernetes Basics
Eueung Mulyana
 
IPFS: The Permanent Web
Sivachandran Paramsivam
 
Virtualization Vs. Containers
actualtechmedia
 
Solr Exchange: Introduction to SolrCloud
thelabdude
 
What is Docker | Docker Tutorial for Beginners | Docker Container | DevOps To...
Edureka!
 
Tuning kafka pipelines
Sumant Tambe
 
[OpenStack Days Korea 2016] Track1 - 카카오는 오픈스택 기반으로 어떻게 5000VM을 운영하고 있을까?
OpenStack Korea Community
 
Managing 2000 Node Cluster with Ambari
DataWorks Summit
 
ElasticSearch Basic Introduction
Mayur Rathod
 
Overview of HDFS Transparent Encryption
Cloudera, Inc.
 
OAuth2 - Introduction
Knoldus Inc.
 
Helm - Application deployment management for Kubernetes
Alexei Ledenev
 
Ad

Viewers also liked (15)

PPTX
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Kevin Minder
 
PPTX
Improvements in Hadoop Security
DataWorks Summit
 
PPTX
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
PPT
Information security in big data -privacy and data mining
harithavijay94
 
PPTX
Built-In Security for the Cloud
DataWorks Summit
 
PDF
Big Data Security with Hadoop
Cloudera, Inc.
 
PPTX
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 
PPTX
Big Data and Security - Where are we now? (2015)
Peter Wood
 
PPTX
An Approach for Multi-Tenancy Through Apache Knox
DataWorks Summit/Hadoop Summit
 
PPTX
Apache Knox setup and hive and hdfs Access using KNOX
Abhishek Mallick
 
PDF
Hadoop & Security - Past, Present, Future
Uwe Printz
 
PDF
OAuth - Open API Authentication
leahculver
 
PDF
Hadoop Internals (2.3.0 or later)
Emilio Coppa
 
PPTX
Hadoop and Data Access Security
Cloudera, Inc.
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Kevin Minder
 
Improvements in Hadoop Security
DataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
Information security in big data -privacy and data mining
harithavijay94
 
Built-In Security for the Cloud
DataWorks Summit
 
Big Data Security with Hadoop
Cloudera, Inc.
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 
Big Data and Security - Where are we now? (2015)
Peter Wood
 
An Approach for Multi-Tenancy Through Apache Knox
DataWorks Summit/Hadoop Summit
 
Apache Knox setup and hive and hdfs Access using KNOX
Abhishek Mallick
 
Hadoop & Security - Past, Present, Future
Uwe Printz
 
OAuth - Open API Authentication
leahculver
 
Hadoop Internals (2.3.0 or later)
Emilio Coppa
 
Hadoop and Data Access Security
Cloudera, Inc.
 
Ad

Similar to Hadoop security (20)

PDF
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
PPTX
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
PPTX
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
PDF
2014 sept 4_hadoop_security
Adam Muise
 
PPTX
Saving the elephant—now, not later
DataWorks Summit
 
PPTX
Improvements in Hadoop Security
Chris Nauroth
 
PDF
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks
 
PDF
An Apache Hive Based Data Warehouse
DataWorks Summit
 
PDF
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Hortonworks
 
PPTX
Open Source Security Tools for Big Data
Great Wide Open
 
PDF
August 2014 HUG : Comprehensive Security for Hadoop
Yahoo Developer Network
 
PDF
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Big Data Spain
 
PDF
BigData Security - A Point of View
Karan Alang
 
PPTX
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
DataWorks Summit
 
PPTX
Apache Hadoop Security - Ranger
Isheeta Sanghi
 
PPTX
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
PDF
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
PPTX
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
PPTX
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
DataWorks Summit
 
PDF
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
huguk
 
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
2014 sept 4_hadoop_security
Adam Muise
 
Saving the elephant—now, not later
DataWorks Summit
 
Improvements in Hadoop Security
Chris Nauroth
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks
 
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Hortonworks
 
Open Source Security Tools for Big Data
Great Wide Open
 
August 2014 HUG : Comprehensive Security for Hadoop
Yahoo Developer Network
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Big Data Spain
 
BigData Security - A Point of View
Karan Alang
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
DataWorks Summit
 
Apache Hadoop Security - Ranger
Isheeta Sanghi
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
Security and Data Governance using Apache Ranger and Apache Atlas
DataWorks Summit/Hadoop Summit
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
DataWorks Summit
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
huguk
 

More from Shivaji Dutta (8)

PPTX
Life in lock down - A Data Driven Story
Shivaji Dutta
 
PPTX
Deep learning an Introduction with Competitive Landscape
Shivaji Dutta
 
PDF
Aurius
Shivaji Dutta
 
PPTX
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
PPTX
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
PPTX
Ambari blueprints-overview
Shivaji Dutta
 
PPTX
Machine Learning With Spark
Shivaji Dutta
 
PPTX
Apache Slider
Shivaji Dutta
 
Life in lock down - A Data Driven Story
Shivaji Dutta
 
Deep learning an Introduction with Competitive Landscape
Shivaji Dutta
 
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Ambari blueprints-overview
Shivaji Dutta
 
Machine Learning With Spark
Shivaji Dutta
 
Apache Slider
Shivaji Dutta
 

Recently uploaded (20)

PDF
WatchTraderHub - Watch Dealer software with inventory management and multi-ch...
WatchDealer Pavel
 
PPTX
Odoo Integration Services by Candidroot Solutions
CandidRoot Solutions Private Limited
 
PDF
lesson-2-rules-of-netiquette.pdf.bshhsjdj
jasmenrojas249
 
PDF
New Download FL Studio Crack Full Version [Latest 2025]
imang66g
 
PDF
Protecting the Digital World Cyber Securit
dnthakkar16
 
DOCX
Can You Build Dashboards Using Open Source Visualization Tool.docx
Varsha Nayak
 
PPTX
Can You Build Dashboards Using Open Source Visualization Tool.pptx
Varsha Nayak
 
PDF
Key Features to Look for in Arizona App Development Services
Net-Craft.com
 
PPTX
The-Dawn-of-AI-Reshaping-Our-World.pptxx
parthbhanushali307
 
PDF
New Download MiniTool Partition Wizard Crack Latest Version 2025
imang66g
 
PPTX
ConcordeApp: Engineering Global Impact & Unlocking Billions in Event ROI with AI
chastechaste14
 
PDF
Bandai Playdia The Book - David Glotz
BluePanther6
 
PDF
ShowUs: Pharo Stream Deck (ESUG 2025, Gdansk)
ESUG
 
PPTX
Maximizing Revenue with Marketo Measure: A Deep Dive into Multi-Touch Attribu...
bbedford2
 
PPTX
PFAS Reporting Requirements 2026 Are You Submission Ready Certivo.pptx
Certivo Inc
 
PDF
Adobe Illustrator Crack Full Download (Latest Version 2025) Pre-Activated
imang66g
 
PPTX
Presentation about variables and constant.pptx
kr2589474
 
PPTX
Visualising Data with Scatterplots in IBM SPSS Statistics.pptx
Version 1 Analytics
 
PDF
advancepresentationskillshdhdhhdhdhdhhfhf
jasmenrojas249
 
PPTX
Presentation about Database and Database Administrator
abhishekchauhan86963
 
WatchTraderHub - Watch Dealer software with inventory management and multi-ch...
WatchDealer Pavel
 
Odoo Integration Services by Candidroot Solutions
CandidRoot Solutions Private Limited
 
lesson-2-rules-of-netiquette.pdf.bshhsjdj
jasmenrojas249
 
New Download FL Studio Crack Full Version [Latest 2025]
imang66g
 
Protecting the Digital World Cyber Securit
dnthakkar16
 
Can You Build Dashboards Using Open Source Visualization Tool.docx
Varsha Nayak
 
Can You Build Dashboards Using Open Source Visualization Tool.pptx
Varsha Nayak
 
Key Features to Look for in Arizona App Development Services
Net-Craft.com
 
The-Dawn-of-AI-Reshaping-Our-World.pptxx
parthbhanushali307
 
New Download MiniTool Partition Wizard Crack Latest Version 2025
imang66g
 
ConcordeApp: Engineering Global Impact & Unlocking Billions in Event ROI with AI
chastechaste14
 
Bandai Playdia The Book - David Glotz
BluePanther6
 
ShowUs: Pharo Stream Deck (ESUG 2025, Gdansk)
ESUG
 
Maximizing Revenue with Marketo Measure: A Deep Dive into Multi-Touch Attribu...
bbedford2
 
PFAS Reporting Requirements 2026 Are You Submission Ready Certivo.pptx
Certivo Inc
 
Adobe Illustrator Crack Full Download (Latest Version 2025) Pre-Activated
imang66g
 
Presentation about variables and constant.pptx
kr2589474
 
Visualising Data with Scatterplots in IBM SPSS Statistics.pptx
Version 1 Analytics
 
advancepresentationskillshdhdhhdhdhdhhfhf
jasmenrojas249
 
Presentation about Database and Database Administrator
abhishekchauhan86963
 

Hadoop security

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Hadoop Security with HDP/PHD
  • 2. Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Disclaimer This document may contain product features and technology directions that are under development or may be under development in the future. Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all effect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
  • 3. Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Agenda • Hadoop Security • Kerberos • Authorization and Auditing with Ranger • Gateway Security with Knox • Encryption
  • 4. Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved • Wire encryption in Hadoop • Native and partner encryption • Centralized audit reporting w/ Apache Ranger • Fine grain access control with Apache Ranger Security today in Hadoop with HDP/PHD Authorization What can I do? Audit What did I do? Data Protection Can data be encrypted at rest and over the wire? • Kerberos • API security with Apache Knox Authentication Who am I/prove it? HDPPHD Centralized Security Administration EnterpriseServices:Security
  • 5. Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Security needs are changing Administration Centrally management & consistent security Authentication Authenticate users and systems Authorization Provision access to data Audit Maintain a record of data access Data Protection Protect data at rest and in motion Security needs are changing • YARN unlocks the data lake • Multi-tenant: Multiple applications for data access • Different kinds of data • Changing and complex compliance environment 2014 65% of clusters host multiple workloads Fall 2013 Largely silo’d deployments with single workload clusters
  • 6. Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved HDFS Typical Flow – Hive Access through Beeline client HiveServer 2 A B C Beeline Client
  • 7. Page7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved HDFS Typical Flow – Authenticate through Kerberos HiveServer 2 A B C KDC Use Hive Service T,icket submit query Hive gets Namenode (NN) service ticket Hive creates map reduce using NN Service Ticket Client • Requests a TGT • Receives TGT • Client dcrypts it with the password hash • Sends the TGT and receives a Service Ticket Beeline Client
  • 8. Page8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved HDFS Typical Flow – Add Authorization through Ranger(XA Secure) HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Hive creates map reduce using NN ST Ranger Client gets service ticket for Hive Beeline Client
  • 9. Page9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved HDFS Typical Flow – Firewall, Route through Knox Gateway HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Hive creates map reduce using NN ST Ranger Knox gets service ticket for Hive Knox runs as proxy user using Hive ST Original request w/user id/password Client gets query result Beeline Client Apache Knox
  • 10. Page10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved HDFS Typical Flow – Add Wire and File Encryption HiveServer 2 A B C KDC Use Hive ST, submit query Hive gets Namenode (NN) service ticket Hive creates map reduce using NN ST Ranger Knox gets service ticket for Hive Knox runs as proxy user using Hive ST Original request w/user id/password Client gets query result SSL Beeline Client SSL SASL SSL SSL Apache Knox
  • 11. Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Security Features PHD/HDP Security Authentication Kerberos Support ✔ Perimeter Security – For services and rest API ✔ Authorizations Fine grained access control HDFS, Hbase and Hive, Storm and Knox Role base access control ✔ Column level ✔ Permission Support Create, Drop, Index, lock, user Auditing Resource access auditing Extensive Auditing Policy auditing ✔
  • 12. Security Features (HDP/PHD Security w/ Ranger). Data Protection: wire encryption ✔; volume encryption: TDE; file/column encryption: HDFS TDE & partners. Reporting: global view of policies and audit data ✔. Manage: user/group mapping ✔; global policy manager, web UI ✔; delegated administration ✔.
  • 13. Partner Integration. Security integrations: ● Ranger plugins: centralize authorization/audit of third-party software in the Ranger UI ● A custom Log4J appender can stream audit events to INFA infrastructure ● Knox: route partner APIs through Knox after validating compatibility ● Provide SSO capability to end users
  • 14. Authentication w/ Kerberos
  • 15. Kerberos in the field. Kerberos is no longer “too complex”; adoption is growing. ● Ambari helps automate and manage Kerberos integration with the cluster. Use Active Directory or a combined Kerberos/Active Directory setup: ● Active Directory is seen most commonly in the field ● Many start with a separate MIT KDC and later grow into the AD KDC. Knox should be considered for API/perimeter security: ● Removes the need for Kerberos for end users ● Enables integration with different authentication standards ● Single location to manage security for REST APIs and HTTP-based services ● Tip: place Knox in the DMZ
  • 16. Authorization and Auditing Apache Ranger
  • 17. Authorization and Audit. Authorization (control access into the system, flexibility in defining policies): fine-grained access control • HDFS – Folder, File • Hive – Database, Table, Column • HBase – Table, Column Family, Column • Storm, Knox and more. Audit: extensive user access auditing in HDFS, Hive and HBase • IP address • Resource type/resource • Timestamp • Access granted or denied
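To make the pairing of authorization and audit concrete, here is a toy model in plain Python. It is not Ranger's API or policy format: the `Policy` and `AuditEvent` classes, the slash-separated resource strings, and the wildcard matching are all assumptions made for illustration. It shows a policy scoped to a resource pattern (database/table/column), and an audit record carrying the fields listed on the slide: IP address, resource, timestamp, and whether access was granted or denied.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    resource: str           # hypothetical pattern, e.g. "hive/sales/orders/*"
    groups: frozenset       # groups the policy applies to
    permissions: frozenset  # e.g. {"select"}


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    user: str
    client_ip: str
    resource: str
    permission: str
    granted: bool


def check_access(policies, user, user_groups, client_ip, resource, permission):
    """Return (granted, audit_event); every decision is audited, allow or deny."""
    granted = any(
        fnmatchcase(resource, p.resource)
        and permission in p.permissions
        and bool(p.groups & set(user_groups))
        for p in policies
    )
    event = AuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user=user,
        client_ip=client_ip,
        resource=resource,
        permission=permission,
        granted=granted,
    )
    return granted, event


if __name__ == "__main__":
    policies = [Policy("hive/sales/orders/*", frozenset({"analysts"}), frozenset({"select"}))]
    print(check_access(policies, "alice", ["analysts"], "10.0.0.5",
                       "hive/sales/orders/amount", "select"))
```

The point of the design is that the same decision function produces both the answer and the audit record, so denied requests are logged as reliably as granted ones.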
  • 18. Central Security Administration: Apache Ranger • Delivers a ‘single pane of glass’ for the security administrator • Centralizes administration of security policy • Ensures consistent coverage across the entire Hadoop stack
  • 19. Set Up Authorization Policies [Screenshot callouts: file-level access control, flexible definition; control permissions]
  • 20. Monitor through Auditing
  • 21. Apache Ranger Flow
  • 22. Authorization and Auditing w/ Ranger (Enterprise Services: Security) [Architecture diagram labels: Enterprise Users; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; RDBMS; Integration API; Legacy Tools & Data Governance; Hadoop components with a Ranger Plugin each: HDFS, HBase, Hive Server2, Knox, Storm. Legend: HDP 2.2 Additions; Planned for 2015; TBD (Ranger Plugin*)]
  • 23. Installation Steps • Install PHD 3.0 • Install Apache Ranger (https://tinyurl.com/mlgs3jy) – Install Policy Manager – Install User Sync – Install Ranger Plugins • Start Policy Manager – service ranger-admin start • Verify – open http://<host>:6080/ and log in with admin/admin
  • 24. Ranger Plugins • HDFS • Hive • Knox • Storm • HBase. Steps to enable plugins: 1. Start the Policy Manager 2. Create the plugin repository in the Policy Manager 3. Install the plugin • Edit install.properties • Execute ./enable-<plugin>.sh 4. Restart the service the plugin was installed into (e.g. HDFS, Hive)
  • 25. Ranger Console • The Repository Manager Tab • The Policy Manager Tab • The User/Group Tab • The Analytics Tab • The Audit Tab
  • 26. Repository Manager • Add New Repository • Edit Repository • Delete Repository
  • 27. Demo
  • 28. REST API Security through Knox: Securely share Hadoop Cluster
  • 29. Share Data Lake with everyone - Securely • Simplifies access: extends Hadoop’s REST/HTTP services by encapsulating Kerberos within the cluster. • Enhances security: exposes Hadoop’s REST/HTTP services without revealing network details, providing SSL out of the box. • Centralized control: enforces REST API security centrally, routing requests to multiple Hadoop clusters. • Enterprise integration: supports LDAP, Active Directory, SSO, SAML and other authentication systems.
  • 30. Apache Knox. Knox can be used with both unsecured Hadoop clusters and Kerberos-secured clusters. In an enterprise deployment that uses Kerberos-secured clusters, the Apache Knox Gateway provides a security layer that: • Integrates well with enterprise identity management solutions • Protects the details of the Hadoop cluster deployment (hosts and ports are hidden from end users) • Reduces the number of services with which a client needs to interact
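From the client's point of view, this means a REST call carries only a user id and password over HTTPS, and Knox handles Kerberos inside the cluster. The sketch below builds (but does not send) such a request for a WebHDFS directory listing. It assumes a topology configured for HTTP Basic authentication; the gateway base URL, user, and password are hypothetical, and the helper name is made up for this example.

```python
import base64
from urllib.request import Request


def knox_webhdfs_request(knox_base, hdfs_path, user, password, op="LISTSTATUS"):
    """Build an HTTPS request for a WebHDFS operation routed through Knox.

    knox_base is the gateway URL up to (not including) the service name,
    e.g. "https://knox-host:8443/gateway/default" (hypothetical).
    """
    url = f"{knox_base.rstrip('/')}/webhdfs/v1{hdfs_path}?op={op}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # The client sends Basic credentials only; it holds no Kerberos ticket.
    return Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    req = knox_webhdfs_request("https://knox-host:8443/gateway/default",
                               "/tmp", "guest", "guest-password")
    print(req.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` with an SSL context that trusts the gateway's certificate.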
  • 31. Extend Hadoop API reach with Knox [Diagram labels: Load Balancer; Knox; Hadoop Cluster; Application Tier: App A, App B, App C, App N; Data Ingest/ETL; Admin/Operators via Bastion Node (SSH); RPC Call; Falcon, Oozie, Sqoop, Flume; users: Data Operator, Business User, Hadoop Admin; protocols: JDBC/ODBC, REST/HTTP]
  • 32. HDFS Typical Flow – Add Wire and File Encryption. Flow: original request with user id/password; Knox gets a service ticket (ST) for Hive; Knox runs as proxy user using the Hive ST; use the Hive ST to submit the query; Hive gets a NameNode (NN) service ticket; Hive creates the MapReduce job using the NN ST; client gets the query result. Links in the diagram are labeled SSL or SASL. [Diagram labels: Beeline Client; Apache Knox; KDC; HiveServer 2; Ranger; HDFS blocks A, B, C]
  • 33. Why Knox? Simplified Access: • Kerberos encapsulation • Extends API reach • Single access point • Multi-cluster support • Single SSL certificate. Centralized Control: • Central REST API auditing • Service-level authorization • Alternative to SSH “edge node”. Enterprise Integration: • LDAP integration • Active Directory integration • SSO integration • Apache Shiro extensibility • Custom extensibility. Enhanced Security: • Protect network details • SSL for non-SSL services • WebApp vulnerability filter
  • 34. Hadoop REST API with Knox
      Service | Direct URL | Knox URL
      WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
      WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
      Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie
      HBase | http://hbasehost:60080 | https://knox-host:8443/hbase
      Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive
      YARN | http://yarn-host:yarn-port/ws | https://knox-host:8443/resourcemanager
      Direct URLs: masters could be on many different hosts. Knox URLs: one host, one port; consistent paths; SSL config at one host.
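The mapping in this table can be sketched as a small rewrite function: strip the service-specific host, port, and path prefix from the direct URL, then re-anchor the remainder under the single Knox host. The code below follows the slide's simplified paths. Real Knox deployments usually add a gateway and topology segment (such as `/gateway/default`) before the service name, which is why the sketch takes an optional `gateway_prefix`; that parameter, the function name, and the lookup table layout are assumptions for illustration.

```python
from urllib.parse import urlsplit

# service -> (path prefix on the direct URL, path on the Knox URL), per the slide
KNOX_PATHS = {
    "WebHDFS": ("/webhdfs", "/webhdfs"),
    "WebHCat": ("/templeton", "/templeton"),
    "Oozie": ("/oozie", "/oozie"),
    "HBase": ("", "/hbase"),
    "Hive": ("/cliservice", "/hive"),
    "YARN": ("/ws", "/resourcemanager"),
}


def to_knox_url(service, direct_url, knox_host="knox-host", knox_port=8443,
                gateway_prefix=""):
    """Rewrite a direct service URL into its Knox-fronted equivalent."""
    direct_prefix, knox_path = KNOX_PATHS[service]
    parts = urlsplit(direct_url)
    if not parts.path.startswith(direct_prefix):
        raise ValueError(f"{direct_url!r} does not look like a {service} URL")
    remainder = parts.path[len(direct_prefix):]
    url = f"https://{knox_host}:{knox_port}{gateway_prefix}{knox_path}{remainder}"
    if parts.query:
        url += "?" + parts.query
    return url


if __name__ == "__main__":
    print(to_knox_url("WebHDFS",
                      "http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"))
```

Whatever host a master service runs on, the client-facing URL keeps the same host, port, and scheme, which is the "one host, one port" property the slide highlights.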
  • 35. Hadoop REST API Security: Drill-Down [Diagram labels: REST Client; Firewall; LB; DMZ with Knox Gateway instances (GW, GW); Firewall; Enterprise Identity Provider (LDAP/AD), reached over LDAP; Edge Node/Hadoop CLIs, using RPC; HTTP links to Hadoop Cluster 1 and Hadoop Cluster 2, each showing Masters and Slaves with RM, NN, WebHCat, Oozie, DN, NM, HS2, HBase]
  • 36. Knox features in PHD • Use Ambari for install/start/stop/configuration • Knox support for HDFS HA • Support for YARN REST API • Support for SSL to Hadoop cluster services (WebHDFS, HBase, Hive & Oozie) • Integration with Ranger for Knox service-level authorization • Knox Management REST API
  • 37. Installation • Installed via Ambari – This can also be done manually – Start the embedded LDAP • There are good examples in the Apache docs with Groovy scripts – https://knox.apache.org/books/knox-0-4-0/knox-0-4-0.html
  • 38. Data Protection: Wire and data-at-rest encryption
  • 39. Data Protection. HDP allows you to apply data protection policy at different layers across the Hadoop stack.
      Layer | What? | How?
      Storage and Access | Encrypt data while it is at rest | Partners, HDFS Tech Preview, HBase encryption, OS-level encryption
      Transmission | Encrypt data as it moves | Supported from HDP 2.1
  • 40. HDFS Transparent Data Encryption (TDE) in 2.2 • Encrypts data at a higher level than OS-level encryption while remaining native and transparent to Hadoop • End-to-end: data is both encrypted and decrypted by the clients • Encryption/decryption uses the usual HDFS functions from the client • No need to change user application code • No need to store data encryption keys on HDFS itself • HDFS never needs access to unencrypted data • Data is encrypted at rest, and because it is decrypted on the client side, it is also encrypted on the wire while being transmitted • HDFS file encryption/decryption is transparent to its clients • Users can read/write files to/from an encryption zone as long as they have permission to access it • Depends on installing a Key Management Server (KMS)
  • 42. HDFS Transparent Data Encryption (TDE) - Steps • Install and run KMS on top of HDP 2.2 • Change HDFS params via Ambari • Create an encryption key • hadoop key create key1 -size 256 • hadoop key list -metadata • Create an encryption zone using the key • hdfs dfs -mkdir /zone1 • hdfs crypto -createZone -keyName key1 -path /zone1 • hdfs crypto -listZones – http://hortonworks.com/kb/hdfs-transparent-data-encryption/
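The command sequence for creating a key and an encryption zone can be captured as data, which makes the ordering explicit and easy to script. The sketch below only assembles the argument lists and renders them as shell lines; it does not run anything. The function names are hypothetical, `key1` and `/zone1` are the slide's example values, and the flags follow the `hadoop key` and `hdfs crypto` command-line tools.

```python
import shlex


def tde_commands(key_name="key1", zone_path="/zone1", key_size=256):
    """Return the TDE setup steps as argv lists, in the order they must run."""
    return [
        # 1. Create the key in the KMS and confirm it exists.
        ["hadoop", "key", "create", key_name, "-size", str(key_size)],
        ["hadoop", "key", "list", "-metadata"],
        # 2. The zone directory must exist (and be empty) before it becomes a zone.
        ["hdfs", "dfs", "-mkdir", zone_path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 3. Confirm the zone was registered.
        ["hdfs", "crypto", "-listZones"],
    ]


def render(commands):
    """Render argv lists as shell-safe command lines."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in render(tde_commands()):
        print(line)
```

On a cluster node with KMS configured, each argv list could be handed to `subprocess.run(cmd, check=True)` so that a failed step stops the sequence before the zone is created against a missing key.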
  • 43. Thank You