Multi-Cluster Live Synchronization
with Kerberos Federated Hadoop
張雅芳 Mammi Chang
@ 2015 Taiwan HadoopCon
Who am I?
• Mammi Chang 張雅芳
• Sr. Engineer, SPN, Trend Micro
• SPN Hadoop Cluster Administrator
• DevOps on Hadoop ecosystem and AWS
• 2014 HadoopCon Speaker
This is a story of a move …
[Diagram: Original Data Center and New Data Center; TMH6 and TMH7 clusters; the service; Data Sync between the clusters]
[Diagram: Production and Staging clusters of TMH6 and TMH7 across the Original Data Center and the New Data Center, with Data Sync between them]
Data Synchronization
“Data synchronization is the process of establishing consistency among data from a source to a target data storage and vice versa and the continuous harmonization of the data over time.”
- From Wikipedia, “Data synchronization”
One-way file synchronization
• Updated files are copied from source to destination
Two-way file synchronization
• Updated files are copied in both directions
• Dropbox, SafeSync, etc.
Linux One-Way File Synchronization
$ cp fileA fileB
$ scp ./directory/my_file mammi@198.167.0.3:/home/mammi/
$ rsync -avP /source/data /destination/
Hadoop One-Way File Synchronization
$ hadoop fs -cp /user/mammi/file1 /user/mammi/dir/
$ hadoop distcp hdfs://cluster1/file hdfs://cluster2/file
#TrendInsight
Hadoop Data Synchronization
DistCp with the same Hadoop version is trivial.
Cluster1 (Hadoop 2.6) to Cluster2 (Hadoop 2.6):
$ hadoop distcp hdfs://cluster1_nn:8020/test hdfs://cluster2_nn:8020/test
Cluster2 (Hadoop 2.6) to Cluster1 (Hadoop 2.6):
$ hadoop distcp hdfs://cluster2_nn:8020/test hdfs://cluster1_nn:8020/test
DistCp with different Hadoop versions is a little bit tricky.
Oops …
[root@tw-spnhadoop1 hadooppet]# hadoop distcp hdfs://cluster1/test hdfs://krb-1.spn.lab.trendnet.org:8020/test
15/01/22 15:11:44 INFO tools.DistCp: srcPaths=[hdfs://cluster1/test]
15/01/22 15:11:44 INFO tools.DistCp: destPath=hdfs://krb-1.spn.lab.trendnet.org:8020/test
15/01/22 15:11:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 381 for hdfs on ha-hdfs:cluster1
15/01/22 15:11:45 INFO security.TokenCache: Got dt for hdfs://cluster1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster1, Ident: (HDFS_DELEGATION_TOKEN token 381 for hdfs)
15/01/22 15:11:46 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
15/01/22 15:11:46 INFO security.UserGroupInformation: Initiating logout for hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM
15/01/22 15:11:46 INFO security.UserGroupInformation: Initiating re-login for hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM
15/01/22 15:11:50 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
15/01/22 15:11:50 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
15/01/22 15:11:53 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw-spnhadoop1.spn.tw.trendnet.org@ISPN.TRENDMICRO.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
Cluster1 (Apache Hadoop 2.0) to Cluster2 (Apache Hadoop 2.6):
$ hadoop distcp hftp://cluster1_nn:50070/test hdfs://cluster2_nn:8020/test
HftpFileSystem is a read-only FileSystem, so DistCp must be run on the destination cluster.
TMH6 Cluster1 (CDH based) and TMH7 Cluster2 (Apache based)
$ hadoop distcp ????://TMH6_NN:????/test hdfs://TMH7_NN:8020/test
TMH6 Cluster1 (CDH based) and TMH7 Cluster2 (Apache based)
$ hadoop distcp hftp://TMH6_NN:50070/test hdfs://TMH7_NN:8020/test
This only supports data sync from TMH6 to TMH7.
DistCp with different Hadoop versions is a little bit tricky; plus Kerberos security, it is annoying!!
TMH6 Cluster1 and TMH7 Cluster2
$ hadoop distcp ????://TMH6_NN:XXXX/test ????://TMH7_NN:XXXX/test
DistCp Data Copy Matrix: HDP1/HDP2 to HDP2
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_system-admin-guide/content/distcp-table.html
WebHDFS is an HTTP REST API that supports the complete FileSystem interface for HDFS.
DistCp Data Copy Matrix: TMH6/TMH7 to TMH6/TMH7
[Table: copy matrix between TMH6 and TMH7, insecure and secure, by protocol (hdfs, hftp, webhdfs)]
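TMH6 Cluster1 and TMH7 Cluster2
$ hadoop distcp webhdfs://TMH6_NN:50070/test webhdfs://TMH7_NN:50070/test
(WebHDFS goes through the NameNode HTTP port, 50070 by default in Hadoop 2.x, not the RPC port 8020.)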
Hadoop Security with Kerberos
“Kerberos is a computer network authentication protocol which works on the basis of 'tickets' to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner.”
- From Wikipedia, “Kerberos_(Protocol)”
Kerberos Negotiation
REALM – CLUSTER.DOMAIN.COM
KDC: Key Distribution Center
TGT: Ticket-Granting Ticket
Participants: Client, KDC, Hadoop Servers
Msg1 (Client to KDC): client login KDC
Msg2 (KDC to Client): client TGT
Msg3 (Client to KDC): Authenticator, TGT
Msg4 (KDC to Client): client/server ticket
Msg5 (Client to Hadoop Servers): Authenticator, ticket
Msg6 (Hadoop Servers to Client): time auth
Kerberos Cross-Realm Authentication
REALM – CLUSTER1.DOMAIN.COM and REALM – CLUSTER2.DOMAIN.COM, each with its own KDC
Participants: Client, Hadoop Servers, and the two KDCs
Msg1 : client login KDC
Msg2 : client TGT
Msg3 : Authenticator, TGT
Msg4 : client/server ticket
Msg5 : Authenticator, ticket
Msg6 : time auth
Kerberos Federation for Hadoop
Kerberos Setting
• Set a different REALM in each cluster's KDC
• Add both clusters' Kerberos information to the configs
• Add the federated Kerberos principals to both KDC DBs
• Restart Kerberos services
Hadoop Setting
• Add Hadoop configurations
• Make sure the nodes of both clusters can recognize each other
• Restart necessary Hadoop services
Multi-Cluster Kerberos Federation
Every cluster (Cluster1, Cluster2, …, Cluster N) repeats the same steps:
• Set a different REALM in each cluster's KDC
• Add all other clusters' Kerberos information to the configuration
• Add all federated Kerberos principals to the KDC DB
• Add Hadoop configurations
• Make sure all cluster nodes can recognize each other
• Restart necessary services
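To get a feel for how quickly the principal list grows with the number of clusters, the sketch below enumerates the cross-realm krbtgt principals for every ordered pair of realms. It assumes two-way trust between all pairs and follows the naming used in the backup slides; the function name and the realm names passed to it are placeholders for illustration only.

```shell
#!/usr/bin/env bash
# Print the cross-realm krbtgt principals for two-way trust between every pair
# of realms given as arguments. The realm names are illustrative placeholders.
federation_principals() {
  local a b
  for a in "$@"; do
    for b in "$@"; do
      # A realm's own krbtgt already exists; only list cross-realm pairs.
      if [ "$a" != "$b" ]; then
        printf 'krbtgt/%s@%s\n' "$a" "$b"
      fi
    done
  done
}

# Example: federation_principals CLUSTER1.DOMAIN.COM CLUSTER2.DOMAIN.COM
```

With N realms this prints N×(N-1) principals, and each one has to be added to the KDCs on both sides of its pair, which is part of why the multi-cluster case becomes tedious to do by hand.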
DistCp with different Hadoop versions plus Kerberos federation is annoying!!
In a cross-DC multi-cluster setup it is not easy.
Done!!
DistCp with different Hadoop versions plus Kerberos federation across cross-DC multi-clusters is not easy at all.
[Diagram: Production and Staging clusters of TMH6 and TMH7 across the Original Data Center and the New Data Center, joined by a two-way Kerberos federation link, with Data Sync between them]
More Than Functionality …
Issues
• Computing resource
• Zero-downtime
• Schedule limitation
• Network bandwidth
Computing Resource
• Principle
– Do not impact production services while many DistCp jobs are running
• Strategy
– Run DistCp in the Staging environment instead of the Production environment
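[Diagram: Production and Staging clusters of TMH6 and TMH7 across the Original Data Center and the New Data Center, with the two-way Kerberos federation link, Data Sync, and Data flow]
$ hadoop distcp webhdfs://TMH6_PROD_NN:50070/test webhdfs://TMH7_PROD_NN:50070/test
(WebHDFS goes through the NameNode HTTP port, 50070 by default in Hadoop 2.x, not the RPC port 8020.)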
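Besides choosing where the job runs, DistCp itself can be throttled. The sketch below only builds and prints a command line that uses DistCp's `-m` (maximum simultaneous copies) and `-bandwidth` (MB per second per map) options; the function name, the MAX_MAPS and MB_PER_MAP variables, and their defaults are assumptions for illustration, not values from the talk.

```shell
#!/usr/bin/env bash
# Build a DistCp command line that caps concurrency and per-map bandwidth.
# MAX_MAPS and MB_PER_MAP are hypothetical knobs; the defaults are not tuned values.
build_distcp_cmd() {
  local src="$1" dst="$2"
  local maps="${MAX_MAPS:-10}" bandwidth="${MB_PER_MAP:-20}"
  # -m limits simultaneous copy tasks; -bandwidth limits MB/s for each map task.
  printf 'hadoop distcp -m %s -bandwidth %s %s %s\n' "$maps" "$bandwidth" "$src" "$dst"
}

# Example: print the command first, then run it once it looks right.
#   build_distcp_cmd "$SRC_URI" "$DST_URI"
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without Hadoop installed.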
Zero-downtime
• Principle
– No downtime in the Production environment
• Strategy
– Change the KDC REALM in Staging only
– Rolling-restart services
Schedule Limitation
• Principle
– Provide the minimum dataset that fulfills the production services' requirements
• Strategy
– Divide the dataset into cold data and hot data
– All necessary hot data must be ready before the service moves to the new DC
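One way the hot/cold split could be scripted is sketched below. The slides do not say how hot data was identified, so treating "modified on or after a cutoff date" as hot is purely an assumption, as are the function name and the two-column input format (date, then path).

```shell
#!/usr/bin/env bash
# Split "YYYY-MM-DD <path>" lines from stdin into a hot list (on or after the
# cutoff date) and a cold list. Using recency as the hot/cold criterion is an
# assumption for illustration. Paths containing spaces are not handled.
split_hot_cold() {
  local cutoff="$1" hot_file="$2" cold_file="$3"
  : > "$hot_file"
  : > "$cold_file"
  # ISO dates sort lexicographically, so a plain string comparison is enough.
  awk -v cutoff="$cutoff" -v hot="$hot_file" -v cold="$cold_file" \
    '{ if ($1 >= cutoff) print $2 > hot; else print $2 > cold }'
}

# Example: split_hot_cold 2015-09-01 hot.txt cold.txt < listing.txt
```

The hot list would then be copied before the service moves, and the cold list afterwards. The input listing could, for instance, be derived from the date and path columns of a recursive `hadoop fs -ls` listing.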
Lessons Learned
Automation is vital!!!
• Automated CI tests for such complex and repeated tasks
– save you a lot of time
– prevent plenty of human errors
Customization is necessary
• Home-made DistCp runner script with error handling
• Set permissions according to the real case
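The talk does not show the home-made runner script, so the following is only a minimal sketch of what its error handling might look like: a generic retry wrapper around any command. The function name and the RETRY_DELAY variable are hypothetical.

```shell
#!/usr/bin/env bash
# Run a command, retrying on failure up to a maximum number of attempts.
# RETRY_DELAY (seconds to wait between attempts) is a hypothetical knob, default 30.
run_with_retry() {
  local max_attempts="$1"; shift
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Example: run_with_retry 3 hadoop distcp "$SRC_URI" "$DST_URI"
```

A real runner would probably also log to a file and alert on the final failure; those parts are left out here.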
Just try it
• A survey is important, but sometimes it cannot fully solve your problem
Thank you
Questions?
Backups
Kerberos Cross-Realm Federation
• Set a different REALM in each cluster's KDC
• Add both clusters' Kerberos information to the configs
• Add the federated Kerberos principals to both KDC DBs
• Add Hadoop configurations
• Make sure the nodes of both clusters can recognize each other
• Restart necessary services
Set a different REALM in each cluster's KDC
Cluster1 krb5.conf
[realms]
  CLUSTER1.DOMAIN.COM = {
    kdc = cluster1_kdc_master:88
    kdc = cluster1_kdc_slave:88
    admin_server = cluster1_kdc_master:749
  }
[domain_realm]
  cluster1.domain.com = CLUSTER1.DOMAIN.COM
  .cluster1.domain.com = CLUSTER1.DOMAIN.COM
Cluster2 krb5.conf
[realms]
  CLUSTER2.DOMAIN.COM = {
    kdc = cluster2_kdc_master:88
    kdc = cluster2_kdc_slave:88
    admin_server = cluster2_kdc_master:749
  }
[domain_realm]
  cluster2.domain.com = CLUSTER2.DOMAIN.COM
  .cluster2.domain.com = CLUSTER2.DOMAIN.COM
Add both clusters' Kerberos information to krb5.conf
Both Cluster1 and Cluster2 krb5.conf
[realms]
  CLUSTER1.DOMAIN.COM = {
    kdc = cluster1_kdc_master:88
    kdc = cluster1_kdc_slave:88
    admin_server = cluster1_kdc_master:749
  }
  CLUSTER2.DOMAIN.COM = {
    kdc = cluster2_kdc_master:88
    kdc = cluster2_kdc_slave:88
    admin_server = cluster2_kdc_master:749
  }
[domain_realm]
  cluster1.domain.com = CLUSTER1.DOMAIN.COM
  .cluster1.domain.com = CLUSTER1.DOMAIN.COM
  cluster2.domain.com = CLUSTER2.DOMAIN.COM
  .cluster2.domain.com = CLUSTER2.DOMAIN.COM
Add the federated Kerberos principals to both KDC DBs
kadmin.local: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM
WARNING: no policy specified for krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM; defaulting to no policy
Enter password for principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM": // 123456
Re-enter password for principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM": // 123456
Principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM" created.
kadmin.local: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM
WARNING: no policy specified for krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM; defaulting to no policy
Enter password for principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM": // 654321
Re-enter password for principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM": // 654321
Principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM" created.
Use the same password for a given principal in both KDCs so that the encryption keys match.
Add Hadoop Configuration
core-site.xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](^.*@CLUSTER.DOMAIN.COM$)s/^(.*)@CLUSTER.DOMAIN.COM$/$1/g
    RULE:[2:$1@$0](^.*@CLUSTER.DOMAIN.COM$)s/^(.*)@CLUSTER.DOMAIN.COM$/$1/g
    DEFAULT
  </value>
</property>
hdfs-site.xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
Verify the rule settings
$ hadoop org.apache.hadoop.security.HadoopKerberosName mapred/machine.cluster.domain.com@CLUSTER.DOMAIN.COM
Name: mapred/machine.cluster.domain.com@CLUSTER.DOMAIN.COM to mapred
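To make the effect of these rules easier to reason about before touching a cluster, the sketch below mimics them in plain shell: it takes the first component of a one- or two-component principal and strips the realm when the realm is in a trusted list. It is not Hadoop's implementation, and the function name and the idea of passing one trusted realm per federated cluster are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Mimic the auth_to_local rules above: build "$1@$0" from the principal, then
# strip the realm if it is one of the trusted realms passed as extra arguments.
# This is an illustration only, not Hadoop's rule engine.
map_principal() {
  local principal="$1"; shift
  local realm="${principal##*@}"   # $0 in the rule: the realm
  local name="${principal%@*}"     # everything before the realm
  local primary="${name%%/*}"      # $1 in the rule: the first component
  local trusted
  # The [1:...] and [2:...] rules only cover one- and two-component principals.
  case "$name" in
    */*/*) return 1 ;;
  esac
  for trusted in "$@"; do
    if [ "$realm" = "$trusted" ]; then
      printf '%s\n' "$primary"
      return 0
    fi
  done
  # No rule matched; a real cluster would continue with DEFAULT handling.
  return 1
}

# Example:
#   map_principal mapred/machine.cluster1.domain.com@CLUSTER1.DOMAIN.COM \
#     CLUSTER1.DOMAIN.COM CLUSTER2.DOMAIN.COM
```

The HadoopKerberosName command shown above remains the authoritative check, since it evaluates the rules actually loaded from core-site.xml.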
Make sure the nodes of both clusters can recognize each other
• /etc/hosts on both cluster1 and cluster2 nodes
10.1.145.1 machine1.cluster1.domain.com
10.1.145.2 machine2.cluster1.domain.com
10.1.145.3 machine3.cluster1.domain.com
10.1.144.1 machine1.cluster2.domain.com
10.1.144.2 machine2.cluster2.domain.com
10.1.144.3 machine3.cluster2.domain.com
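A quick way to spot gaps in such a file is sketched below: it reports which of the expected hostnames do not appear in a hosts-format file. The function name is hypothetical, and the check only looks at the file itself; it does not test DNS or reverse lookups.

```shell
#!/usr/bin/env bash
# Report hostnames that are missing from a hosts-format file.
# Usage: missing_hosts <hosts-file> <hostname>...
missing_hosts() {
  local hosts_file="$1"; shift
  local host
  for host in "$@"; do
    # Look for the hostname as a whole field after the address column,
    # skipping comment lines.
    if ! awk -v h="$host" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
      printf '%s\n' "$host"
    fi
  done
}

# Example: missing_hosts /etc/hosts machine1.cluster1.domain.com machine1.cluster2.domain.com
```

Running it on every node of both clusters with the full host list gives an empty output when the file is complete.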
Restart necessary services
• KDC server
– service krb5kdc restart
– service kadmin restart
• Namenodes, Datanodes
– service hadoop-hdfs-namenode restart
– service hadoop-hdfs-datanode restart
Ad

More Related Content

What's hot (20)

CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
DataWorks Summit/Hadoop Summit
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
DataWorks Summit
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
tcurdt
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Yahoo!デベロッパーネットワーク
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
Alex Moundalexis
 
ha_module5
ha_module5ha_module5
ha_module5
Gurmukh Singh
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
Dave Holland
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
Peter Clapham
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure Data
Yan Wang
 
YARN
YARNYARN
YARN
Alex Moundalexis
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Biju Nair
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
Douglas Bernardini
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
Visualizing Kafka Security
Visualizing Kafka SecurityVisualizing Kafka Security
Visualizing Kafka Security
DataWorks Summit
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
DataWorks Summit
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
Kai Sasaki
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
Gwen (Chen) Shapira
 
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreHBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
Cloudera, Inc.
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
DataWorks Summit
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
tcurdt
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
Alex Moundalexis
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
Dave Holland
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure Data
Yan Wang
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Biju Nair
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
Douglas Bernardini
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
Visualizing Kafka Security
Visualizing Kafka SecurityVisualizing Kafka Security
Visualizing Kafka Security
DataWorks Summit
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
DataWorks Summit
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
Kai Sasaki
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
Gwen (Chen) Shapira
 
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreHBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
Cloudera, Inc.
 

Viewers also liked (20)

The Angela Apartments @ Malate, The Heritage
The Angela  Apartments @ Malate, The HeritageThe Angela  Apartments @ Malate, The Heritage
The Angela Apartments @ Malate, The Heritage
Evangeline Yia
 
Soldagem 2009 2-emi
Soldagem 2009 2-emiSoldagem 2009 2-emi
Soldagem 2009 2-emi
Tadeu Granato
 
Prdc2012
Prdc2012Prdc2012
Prdc2012
Yusuke Shimizu
 
14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)
14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)
14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)
Swiss Big Data User Group
 
Cluster Housing in a Cultural Park on the Coeur d'Alene Reservation
Cluster Housing in a Cultural Park on the Coeur d'Alene ReservationCluster Housing in a Cultural Park on the Coeur d'Alene Reservation
Cluster Housing in a Cultural Park on the Coeur d'Alene Reservation
Joshua Arnold
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
Chicago Hadoop Users Group
 
Cloudera security and enterprise license by Athemaster(繁中)
Cloudera security and enterprise license by Athemaster(繁中)Cloudera security and enterprise license by Athemaster(繁中)
Cloudera security and enterprise license by Athemaster(繁中)
Athemaster Co., Ltd.
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
Edureka!
 
DevOps Overview
DevOps OverviewDevOps Overview
DevOps Overview
Omri Spector
 
Dev ops
Dev opsDev ops
Dev ops
Shoaib Shaukat
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache Ambari
Hortonworks
 
Cluster Housing
Cluster HousingCluster Housing
Cluster Housing
adam david
 
Designing Puppet: Roles/Profiles Pattern
Designing Puppet: Roles/Profiles PatternDesigning Puppet: Roles/Profiles Pattern
Designing Puppet: Roles/Profiles Pattern
Puppet
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
saba khan
 
Housing Presentation
Housing Presentation Housing Presentation
Housing Presentation
Dhanya Pravin
 
Aranya Low Cost Housing
Aranya Low Cost HousingAranya Low Cost Housing
Aranya Low Cost Housing
Ankita Kolamkar
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
Jewel Refran
 
Kudu Cloudera Meetup Paris
Kudu Cloudera Meetup ParisKudu Cloudera Meetup Paris
Kudu Cloudera Meetup Paris
نهاد مبارك
 
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
Sonatype
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
Vitthal Gogate
 
The Angela Apartments @ Malate, The Heritage
The Angela  Apartments @ Malate, The HeritageThe Angela  Apartments @ Malate, The Heritage
The Angela Apartments @ Malate, The Heritage
Evangeline Yia
 
14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)
14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)
14.05.2012 Social Media Monitoring with Hadoop (Nils Kübler, MeMo News)
Swiss Big Data User Group
 
Cluster Housing in a Cultural Park on the Coeur d'Alene Reservation
Cluster Housing in a Cultural Park on the Coeur d'Alene ReservationCluster Housing in a Cultural Park on the Coeur d'Alene Reservation
Cluster Housing in a Cultural Park on the Coeur d'Alene Reservation
Joshua Arnold
 
Cloudera security and enterprise license by Athemaster(繁中)
Cloudera security and enterprise license by Athemaster(繁中)Cloudera security and enterprise license by Athemaster(繁中)
Cloudera security and enterprise license by Athemaster(繁中)
Athemaster Co., Ltd.
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
Edureka!
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache Ambari
Hortonworks
 
Cluster Housing
Cluster HousingCluster Housing
Cluster Housing
adam david
 
Designing Puppet: Roles/Profiles Pattern
Designing Puppet: Roles/Profiles PatternDesigning Puppet: Roles/Profiles Pattern
Designing Puppet: Roles/Profiles Pattern
Puppet
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
saba khan
 
Housing Presentation
Housing Presentation Housing Presentation
Housing Presentation
Dhanya Pravin
 
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
Sonatype
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
Vitthal Gogate
 
Ad

Similar to HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop (20)

To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…
Sergey Dzyuban
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
DataWorks Summit
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Big-Data-as-a-Service (BDaaS) Meetup
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next Frontier
Kellyn Pot'Vin-Gorman
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
Vishal Biyani
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Kashif Khan
 
SQL Saturday San Diego
SQL Saturday San DiegoSQL Saturday San Diego
SQL Saturday San Diego
Kellyn Pot'Vin-Gorman
 
E2E PVS Technical Overview Stephane Thirion
E2E PVS Technical Overview Stephane ThirionE2E PVS Technical Overview Stephane Thirion
E2E PVS Technical Overview Stephane Thirion
sthirion
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
DataWorks Summit
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld
 
Couchbase Singapore Meetup #2: Why Developing with Couchbase is easy !!
Couchbase Singapore Meetup #2:  Why Developing with Couchbase is easy !! Couchbase Singapore Meetup #2:  Why Developing with Couchbase is easy !!
Couchbase Singapore Meetup #2: Why Developing with Couchbase is easy !!
Karthik Babu Sekar
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
Ted Jung
 
KubeCon EU 2016: Leveraging ephemeral namespaces in a CI/CD pipeline
KubeCon EU 2016: Leveraging ephemeral namespaces in a CI/CD pipelineKubeCon EU 2016: Leveraging ephemeral namespaces in a CI/CD pipeline
KubeCon EU 2016: Leveraging ephemeral namespaces in a CI/CD pipeline
KubeAcademy
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Fwdays
 
Copy Data Management for the DBA
Copy Data Management for the DBACopy Data Management for the DBA
Copy Data Management for the DBA
Kellyn Pot'Vin-Gorman
 
Couchbase Chennai Meetup: Developing with Couchbase- made easy
Couchbase Chennai Meetup:  Developing with Couchbase- made easyCouchbase Chennai Meetup:  Developing with Couchbase- made easy
Couchbase Chennai Meetup: Developing with Couchbase- made easy
Karthik Babu Sekar
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Docker, Inc.
 
TIAD 2016 : Application delivery in a container world
TIAD 2016 : Application delivery in a container worldTIAD 2016 : Application delivery in a container world
TIAD 2016 : Application delivery in a container world
The Incredible Automation Day
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPS
ACA IT-Solutions
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-IT
Stijn Wijndaele
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…
Sergey Dzyuban
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
DataWorks Summit
 
Denver SQL Saturday The Next Frontier
Denver SQL Saturday The Next FrontierDenver SQL Saturday The Next Frontier
Denver SQL Saturday The Next Frontier
Kellyn Pot'Vin-Gorman
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
Vishal Biyani
 
E2E PVS Technical Overview Stephane Thirion
E2E PVS Technical Overview Stephane ThirionE2E PVS Technical Overview Stephane Thirion
E2E PVS Technical Overview Stephane Thirion
sthirion
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
DataWorks Summit
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld
 
Couchbase Singapore Meetup #2: Why Developing with Couchbase is easy !!

HadoopCon2015 Multi-Cluster Live Synchronization with Kerberos Federated Hadoop

  • 1. Multi-Cluster Live Synchronization with Kerberos Federated Hadoop 張雅芳 Mammi Chang @ 2015 Taiwan HadoopCon
  • 2. Who am I ? • Mammi Chang 張雅芳 • Sr. Engineer, SPN, Trend Micro • SPN Hadoop Cluster Administrator • DevOps on Hadoop ecosystem and AWS • 2014 HadoopCon Speaker
  • 6. Data Synchronization: Data synchronization is the process of establishing consistency among data from a source to a target data storage and vice versa and the continuous harmonization of the data over time. - From Wikipedia “Data synchronization”
  • 7. One-way file synchronization: Updated files are copied from source to destination. Two-way file synchronization: Updated files are copied in both directions (Dropbox, SafeSync, etc.)
  • 8. Linux One-Way File Synchronization $ cp fileA fileB $ scp ./directory/my_file mammi@198.167.0.3:/home/mammi/ $ rsync -avP /source/data /destination/
  • 9. Hadoop One-Way File Synchronization $ hadoop fs -cp /user/mammi/file1 /user/mammi/dir/ $ hadoop distcp hdfs://cluster1/file hdfs://cluster2/file
  • 11. DistCp with the same Hadoop version is trivial.
  • 12. Hadoop - 2.6 Cluster1 Hadoop – 2.6 Cluster2 $ hadoop distcp hdfs://cluster1_nn:8020/test hdfs://cluster2_nn:8020/test
  • 13. Hadoop - 2.6 Cluster1 Hadoop – 2.6 Cluster2 $ hadoop distcp hdfs://cluster2_nn:8020/test hdfs://cluster1_nn:8020/test
  • 14. DistCp with the same Hadoop version is trivial. DistCp with different Hadoop versions is a little bit tricky.
  • 15. Oops … [root@tw-spnhadoop1 hadooppet]# hadoop distcp hdfs://cluster1/test hdfs://krb-1.spn.lab.trendnet.org:8020/test 15/01/22 15:11:44 INFO tools.DistCp: srcPaths=[hdfs://cluster1/test] 15/01/22 15:11:44 INFO tools.DistCp: destPath=hdfs://krb-1.spn.lab.trendnet.org:8020/test 15/01/22 15:11:45 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 381 for hdfs on ha-hdfs:cluster1 15/01/22 15:11:45 INFO security.TokenCache: Got dt for hdfs://cluster1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster1, Ident: (HDFS_DELEGATION_TOKEN token 381 for hdfs) 15/01/22 15:11:46 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw- [email protected] (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch 15/01/22 15:11:46 INFO security.UserGroupInformation: Initiating logout for hdfs/[email protected] 15/01/22 15:11:46 INFO security.UserGroupInformation: Initiating re-login for hdfs/[email protected] 15/01/22 15:11:50 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw- [email protected] (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch 15/01/22 15:11:50 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before. 15/01/22 15:11:53 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs/tw- [email protected] (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(null): org.apache.hadoop.ipc.RPC$VersionMismatch
  • 16. Apache Hadoop – 2.0 Cluster1 Apache Hadoop – 2.6 Cluster2 $ hadoop distcp hftp://cluster1_nn:50070/test hdfs://cluster2_nn:8020/test HftpFileSystem is a read-only FileSystem, so DistCp must be run on the destination cluster
  • 17. TMH6 Cluster1 TMH7 Cluster2 $ hadoop distcp ????://TMH6_NN:????/test hdfs://TMH7_NN:8020/test CDH Based Apache Based
  • 18. TMH6 Cluster1 TMH7 Cluster2 $ hadoop distcp hftp://TMH6_NN:50070/test hdfs://TMH7_NN:8020/test CDH Based Apache Based Only supports data sync from TMH6 to TMH7
  • 19. DistCp with different Hadoop versions is a little bit tricky; adding Kerberos security makes it annoying!!
  • 20. TMH6 Cluster1 TMH7 Cluster2 $ hadoop distcp ????://TMH6_NN:XXXX/test ????://TMH7_NN:XXXX/test
  • 21. DistCp Data Copy Matrix: HDP1/HDP2 to HDP2 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_system-admin-guide/content/distcp-table.html WebHDFS is an HTTP REST API that supports the complete FileSystem interface for HDFS
  • 22. DistCp Data Copy Matrix: TMH6/TMH7 to TMH6/TMH7 [table: source and destination TMH6, TMH7; insecure, secure; protocols hdfs, hftp, webhdfs]
  • 23. TMH6 Cluster1 TMH7 Cluster2 $ hadoop distcp webhdfs://TMH6_NN:8020/test webhdfs://TMH7_NN:8020/test
  • 24. Hadoop Security with Kerberos Kerberos is a computer network authentication protocol which works on the basis of 'tickets' to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner - From Wikipedia “Kerberos_(Protocol)”
  • 25. Kerberos Negotiation REALM – CLUSTER.DOMAIN.COM KDC (Key Distribution Center) TGT (Ticket-Granting Ticket) Parties: KDC, Client, Hadoop Servers Msg1 : client login KDC Msg2 : client TGT Msg3 : Authenticator, TGT Msg4 : client/server ticket Msg5 : Authenticator, ticket Msg6 : time auth
  • 26. Kerberos Cross-Realm Authentication REALM – CLUSTER1.DOMAIN.COM REALM – CLUSTER2.DOMAIN.COM Parties: KDC (one per realm), Client, Hadoop Servers Msg1 : client login KDC Msg2 : client TGT Msg3 : Authenticator, TGT Msg4 : client/server ticket Msg5 : Authenticator, ticket Msg6 : time auth
  • 27. Kerberos Federation for Hadoop Kerberos Setting • Set a different REALM in each cluster’s KDC • Add both clusters’ Kerberos information to configs • Add federated Kerberos principals to both KDC DBs • Restart Kerberos services Hadoop Setting • Add Hadoop configurations • Make sure both clusters’ nodes can recognize each other • Restart necessary Hadoop services
  • 28. Multi-Cluster Kerberos Federation Cluster1 •Set a different REALM in each cluster’s KDC •Add all other clusters’ Kerberos information to configuration •Add all federated Kerberos principals to KDC DB •Add Hadoop configurations •Make sure all cluster nodes can recognize each other •Restart necessary services Cluster2 •Set a different REALM in each cluster’s KDC •Add all other clusters’ Kerberos information to configuration •Add all federated Kerberos principals to KDC DB •Add Hadoop configurations •Make sure all cluster nodes can recognize each other •Restart necessary services … •… Cluster N •Set a different REALM in each cluster’s KDC •Add all other clusters’ Kerberos information to configuration •Add all federated Kerberos principals to KDC DB •Add Hadoop configurations •Make sure all cluster nodes can recognize each other •Restart necessary services
  • 29. DistCp with different Hadoop versions plus Kerberos federation is annoying!! In cross-DC multi-cluster setups it is not easy. Done!!
  • 30. DistCp with different Hadoop versions plus Kerberos federation in cross-DC multi-clusters is not easy at all.
  • 31. Production TMH6 TMH7 TMH6 TMH7 Original Data Center New Data Center Production Staging Staging Two-way Kerberos federation link Data Sync Data Sync Data Sync
  • 33. Issues • Computing resource • Zero-downtime • Schedule limitation • Network bandwidth
  • 34. Computing Resource • Principle – Avoid any impact on production services while many DistCp jobs are running • Strategy – Run DistCp on the Staging Env. instead of the Production Env.
  • 35. Production TMH6 TMH7 TMH6 TMH7 Original Data Center New Data Center Production Staging Staging Two-way Kerberos federation link $ hadoop distcp webhdfs://TMH6_PROD_NN:8020/test webhdfs://TMH7_PROD_NN:8020/test Data Sync Data flow
  • 36. Production TMH6 TMH7 TMH6 TMH7 Original Data Center New Data Center Production Staging Staging Two-way Kerberos federation link $ hadoop distcp webhdfs://TMH6_PROD_NN:8020/test webhdfs://TMH7_PROD_NN:8020/test Data Sync Data flow
  • 37. Zero-downtime • Principle – Do not have Production Env. downtime • Strategy – Change KDC REALM in Staging only – Rolling restart services
  • 38. Schedule Limitation • Principle – Provide the minimum dataset that fulfills production service requirements • Strategy – Divide the dataset into cold data and hot data – All necessary hot data must be ready before services move to the new DC
  • 40. Automation is vital !!! • Automated CI tests on such complex and repeated tasks – save your life time – prevent plenty of human errors
  • 41. Customization is necessary • Homemade DistCp run script with error handling • Set permissions according to the real use case
  • 42. Just try it • Survey is important but sometimes it cannot totally solve your problem
  • 45. Kerberos Cross-Realm Federation • Set a different REALM in each cluster’s KDC • Add both clusters’ Kerberos information to configs • Add federated Kerberos principals to both KDC DBs • Add Hadoop configurations • Make sure both clusters’ nodes can recognize each other • Restart necessary services
  • 46. Set a different REALM in each cluster’s KDC Cluster1 krb5.conf [realms] CLUSTER1.DOMAIN.COM = { kdc = cluster1_kdc_master:88 kdc = cluster1_kdc_slave:88 admin_server = cluster1_kdc_master:749 } [domain_realm] cluster1.domain.com = CLUSTER1.DOMAIN.COM .cluster1.domain.com = CLUSTER1.DOMAIN.COM Cluster2 krb5.conf [realms] CLUSTER2.DOMAIN.COM = { kdc = cluster2_kdc_master:88 kdc = cluster2_kdc_slave:88 admin_server = cluster2_kdc_master:749 } [domain_realm] cluster2.domain.com = CLUSTER2.DOMAIN.COM .cluster2.domain.com = CLUSTER2.DOMAIN.COM
  • 47. Add both clusters’ Kerberos information to krb5.conf Both Cluster1 and Cluster2 krb5.conf [realms] CLUSTER1.DOMAIN.COM = { kdc = cluster1_kdc_master:88 kdc = cluster1_kdc_slave:88 admin_server = cluster1_kdc_master:749 } CLUSTER2.DOMAIN.COM = { kdc = cluster2_kdc_master:88 kdc = cluster2_kdc_slave:88 admin_server = cluster2_kdc_master:749 } [domain_realm] cluster1.domain.com = CLUSTER1.DOMAIN.COM .cluster1.domain.com = CLUSTER1.DOMAIN.COM cluster2.domain.com = CLUSTER2.DOMAIN.COM .cluster2.domain.com = CLUSTER2.DOMAIN.COM
  • 48. Add federated Kerberos principals to both KDC DBs $ kadmin.local: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM WARNING: no policy specified for krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM; defaulting to no policy Enter password for principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM": // 123456 Re-enter password for principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM": // 123456 Principal "krbtgt/CLUSTER1.DOMAIN.COM@CLUSTER2.DOMAIN.COM" created. $ kadmin.local: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM WARNING: no policy specified for krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM; defaulting to no policy Enter password for principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM": // 654321 Re-enter password for principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM": // 654321 Principal "krbtgt/CLUSTER2.DOMAIN.COM@CLUSTER1.DOMAIN.COM" created. Use the same password for a given principal in both KDCs so that the encryption keys match.
  • 50. Make sure both cluster nodes can recognize each other • /etc/hosts for both cluster1 and cluster2 nodes 10.1.145.1 machine1.cluster1.domain.com 10.1.145.2 machine2.cluster1.domain.com 10.1.145.3 machine3.cluster1.domain.com 10.1.144.1 machine1.cluster2.domain.com 10.1.144.2 machine2.cluster2.domain.com 10.1.144.3 machine3.cluster2.domain.com
  • 51. Restart necessary services • KDC server – service krb5kdc restart – service kadmin restart • Namenodes, Datanodes – service hadoop-hdfs-namenode restart – service hadoop-hdfs-datanode restart
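
Slide 41 mentions a homemade DistCp script with error handling, but the script itself is not shown in the deck. The following is only a minimal sketch of what such a wrapper might look like, assuming a simple retry-on-failure policy. The names `sync_path`, `DISTCP_CMD`, `MAX_RETRIES` and `RETRY_SLEEP` are hypothetical and do not come from the talk; `-update` is the standard DistCp option that copies only files that differ at the destination.

```shell
#!/usr/bin/env bash
# Sketch of a retrying DistCp wrapper. All variable and function names
# below are hypothetical; they are not taken from the original slides.

# Command to run. Override it (for example with "echo") for a dry run.
DISTCP_CMD="${DISTCP_CMD:-hadoop distcp}"
# Maximum number of attempts per path, and seconds to wait between them.
MAX_RETRIES="${MAX_RETRIES:-3}"
RETRY_SLEEP="${RETRY_SLEEP:-60}"

# sync_path SRC_URI DST_URI
# Runs DistCp from SRC_URI to DST_URI and retries on a non-zero exit code.
# Returns 0 on the first successful run, 1 once all attempts have failed.
sync_path() {
  local src="$1" dst="$2" attempt=1
  while [ "$attempt" -le "$MAX_RETRIES" ]; do
    echo "attempt ${attempt}: ${DISTCP_CMD} -update ${src} ${dst}" >&2
    # DISTCP_CMD is left unquoted on purpose so "hadoop distcp" splits
    # into a command name and its first argument.
    if $DISTCP_CMD -update "$src" "$dst"; then
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$MAX_RETRIES" ]; then
      sleep "$RETRY_SLEEP"
    fi
  done
  echo "giving up on ${src} after ${MAX_RETRIES} attempts" >&2
  return 1
}
```

With the URIs from slide 35, a call could look like `sync_path webhdfs://TMH6_PROD_NN:8020/test webhdfs://TMH7_PROD_NN:8020/test`, run from a staging node as the deck recommends. A real script would likely also log failures and handle permissions, which this sketch leaves out.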