SlideShare a Scribd company logo
Running secured Spark job in
Kubernetes compute cluster and
integrating with Kerberized HDFS
Joy Chakraborty
June 21, 2018
Email: joychak1@[yahoo/gmail].com]]
• To run Spark job on Elastic
Compute platform (Kubernetes)
accessing data stored in HDFS
Essentially what we will be doing -
3
Who am I ???
I learn and apply ….
- Design and write software for living
- for last 18 years …
4
Disclaimer
We will be just scratching the Surface !!!
Agenda
5
Kubernetes as Elastic Compute1
HDFS as secured distributed storage2
Configuring Spark to run in Kubernetes & accessing HDFS3
Demo - Build & setup the Kubernetes/Spark/HDFS environment4
6
Why Kubernetes?
7
Compute Requirements
8
Compute Requirements
• Support Elasticity
• Flexibility and variability in work load
• Process massive amount of data in parallel
• Support Reliability & Multitenancy
• Ease in Accessibility without compromising
Security
9
Distributed
Containerized
10
• Container Management
• Scheduling
• Resource Management
Distributed Containerized System
11
• Container Management
• Scheduling
• Resource Management
Kubernetes
Distributed Containerized System
12
what?
13
Kubernetes
An open-source system for automating deployment, scaling,
and management of containerized applications.
• Portable
• Extensible: modular, plug-able, compos-able
• Self-healing: auto-placement, auto-restart, auto-
replication, auto-scaling
• Latest Release: 1.8.13
• Community driven completely
• Written in GoLang
14
Kubernetes High Level Architecture
15
Kubernetes High Level Architecture
• The Kubernetes Master is a collection
of three processes that run on a single
node in your cluster, which is
designated as the master node.
• kube-apiserver
• kube-controller-manager
• kube-scheduler
• Each individual non-master node in
your cluster runs two processes –
• kubelet, which communicates with
the Kubernetes Master
• kube-proxy, a network proxy which
reflects Kubernetes networking
services on each node.
Kube-apiserver
Kube-Controller-manager
16
Kubernetes - basic
components
17
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
18
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
19
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
• RBAC
20
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
• RBAC
*** Custom Resource
21
Kubernetes Pod
• Pod is the basic building block of Kubernetes–
• the smallest and simplest unit in the Kubernetes object model
that you create or deploy
• represents a running process on your cluster.
• encapsulates an application container (or, in some cases,
multiple containers), storage resources, a unique network IP,
and options that govern how the container(s) should run
• Docker is the most common container runtime used in a Pod,
but Pods support other container runtimes as well
22
How to Interact with Kubernetes
23
How to Interact with Kubernetes
• REST API
• Curl or browser
• Command Line Interface
• Kubectl
• Kube Config
• Created during cluster
creation
• Programmatic
• GoLang, Python, Scala
24
How Secured
(kerberized) HDFS
works?
25
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Secured/Kerberized HDFS
26
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
2. Client-app requests Ticket
3. KDC sends TGT
1. Service
Principles/Keys
6. Sends Service Ticket and requests for Authentication
5. User Authenticated
using Service Principle/key
Retrieves
User roles/permissions
Secured/Kerberized (keytab) HDFS
27
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
2. Client-app requests Ticket
3. KDC sends TGT
1. Service
Principles/Keys
6. Sends Service Ticket and requests for Delegation Token
7. User Authenticated
using Service Principle/key
Retrieves
User roles/permissions
Secured/Kerberized (delegation token) HDFS
8. Name node sends delegation token
28
Spark + Kubernetes +
HDFS
29
Spark Running in Kubernetes
Kubernetes ClusterKube Master
30
Spark Running in Kubernetes
Kubernetes Cluster
Spark-Submit
(--master k8s://…)
Kube Master
31
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Spark-Submit
Spark
Master
Kube Master
32
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
33
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
34
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Kube Secret
Keytab
35
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Kube Secret
Keytab
36
Lets build the
Kubernetes/Spark/HDFS
environment !!!
37
What to use
• To install Kubernetes
• vagrant-kubeadm (https://github.com/c9s/vagrant-kubeadm)
• Run local Docker Hub (https://docs.docker.com/registry/deploying )
• Run Docker registry container and map localhost to
some ip (10.0.2.2)
• Build Spark Docker image from Spark download Dockerfile
under Kubernetes directory (spark-2.3.0-bin-hadoop2.7/ kubernetes)
• Publish the image to Docker hub
• Run Spark-submit
38
Demo !!!
Email: joychak1@[yahoo/gmail].com]
Thank You
39
Ad

More Related Content

What's hot (20)

Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
DataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
DataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?
DataWorks Summit
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Databricks
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
DataWorks Summit
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
DataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Dinesh Chitlangia
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
DataWorks Summit
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Spark Summit
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
DataWorks Summit/Hadoop Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
DataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
DataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?
DataWorks Summit
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Databricks
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
DataWorks Summit
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
DataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Dinesh Chitlangia
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
DataWorks Summit
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Spark Summit
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 

Similar to Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS (20)

01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
VMUG IT
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
Chris McEniry
 
Container orchestration k8s azure kubernetes services
Container orchestration  k8s azure kubernetes servicesContainer orchestration  k8s azure kubernetes services
Container orchestration k8s azure kubernetes services
Rajesh Kolla
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Bitnami
 
Nodeless and serverless kubernetes
Nodeless and serverless kubernetesNodeless and serverless kubernetes
Nodeless and serverless kubernetes
Nills Franssens
 
Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev
Haufe-Lexware GmbH & Co KG
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
Martin Danielsson
 
Kubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanKubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-Hassan
Syed Murtaza Hassan
 
Fabio rapposelli pks-vmug
Fabio rapposelli   pks-vmugFabio rapposelli   pks-vmug
Fabio rapposelli pks-vmug
VMUG IT
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices world
Karol Chrapek
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
DataWorks Summit
 
Kubermatic.pdf
Kubermatic.pdfKubermatic.pdf
Kubermatic.pdf
LibbySchulze
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
LibbySchulze
 
Pro2516 10 things about oracle and k8s.pptx-final
Pro2516   10 things about oracle and k8s.pptx-finalPro2516   10 things about oracle and k8s.pptx-final
Pro2516 10 things about oracle and k8s.pptx-final
Michel Schildmeijer
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
EDB
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
RedisConf18 - Redis Enterprise on Cloud  Native  Platforms RedisConf18 - Redis Enterprise on Cloud  Native  Platforms
RedisConf18 - Redis Enterprise on Cloud Native Platforms
Redis Labs
 
Introduction to Virtual Kubelet
Introduction to Virtual KubeletIntroduction to Virtual Kubelet
Introduction to Virtual Kubelet
Mitchell Pronschinske
 
Big data and Kubernetes
Big data and KubernetesBig data and Kubernetes
Big data and Kubernetes
Anirudh Ramanathan
 
Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.
Jooho Lee
 
Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)
Krishna-Kumar
 
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
VMUG IT
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
Chris McEniry
 
Container orchestration k8s azure kubernetes services
Container orchestration  k8s azure kubernetes servicesContainer orchestration  k8s azure kubernetes services
Container orchestration k8s azure kubernetes services
Rajesh Kolla
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Bitnami
 
Nodeless and serverless kubernetes
Nodeless and serverless kubernetesNodeless and serverless kubernetes
Nodeless and serverless kubernetes
Nills Franssens
 
Kubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanKubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-Hassan
Syed Murtaza Hassan
 
Fabio rapposelli pks-vmug
Fabio rapposelli   pks-vmugFabio rapposelli   pks-vmug
Fabio rapposelli pks-vmug
VMUG IT
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices world
Karol Chrapek
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
DataWorks Summit
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
LibbySchulze
 
Pro2516 10 things about oracle and k8s.pptx-final
Pro2516   10 things about oracle and k8s.pptx-finalPro2516   10 things about oracle and k8s.pptx-final
Pro2516 10 things about oracle and k8s.pptx-final
Michel Schildmeijer
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
EDB
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
RedisConf18 - Redis Enterprise on Cloud  Native  Platforms RedisConf18 - Redis Enterprise on Cloud  Native  Platforms
RedisConf18 - Redis Enterprise on Cloud Native Platforms
Redis Labs
 
Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.
Jooho Lee
 
Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)
Krishna-Kumar
 
Ad

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
The Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI IntegrationThe Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
Build With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdfBuild With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdf
Google Developer Group - Harare
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
Agentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community MeetupAgentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community Meetup
Manoj Batra (1600 + Connections)
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...
BookNet Canada
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
Q1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor PresentationQ1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor Presentation
Dropbox
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
The Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI IntegrationThe Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...Transcript: Canadian book publishing: Insights from the latest salary survey ...
Transcript: Canadian book publishing: Insights from the latest salary survey ...
BookNet Canada
 
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptxReimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
Reimagine How You and Your Team Work with Microsoft 365 Copilot.pptx
John Moore
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
Q1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor PresentationQ1 2025 Dropbox Earnings and Investor Presentation
Q1 2025 Dropbox Earnings and Investor Presentation
Dropbox
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 

Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS

  • 1. Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS Joy Chakraborty June 21, 2018 Email: joychak1@[yahoo/gmail].com]]
  • 2. • To run Spark job on Elastic Compute platform (Kubernetes) accessing data stored in HDFS Essentially what we will be doing -
  • 3. 3 Who am I ??? I learn and apply …. - Design and write software for living - for last 18 years …
  • 4. 4 Disclaimer We will be just scratching the Surface !!!
  • 5. Agenda 5 Kubernetes as Elastic Compute1 HDFS as secured distributed storage2 Configuring Spark to run in Kubernetes & accessing HDFS3 Demo - Build & setup the Kubernetes/Spark/HDFS environment4
  • 8. 8 Compute Requirements • Support Elasticity • Flexibility and variability in work load • Process massive amount of data in parallel • Support Reliability & Multitenancy • Ease in Accessibility without compromising Security
  • 10. 10 • Container Management • Scheduling • Resource Management Distributed Containerized System
  • 11. 11 • Container Management • Scheduling • Resource Management Kubernetes Distributed Containerized System
  • 13. 13 Kubernetes An open-source system for automating deployment, scaling, and management of containerized applications. • Portable • Extensible: modular, plug-able, compos-able • Self-healing: auto-placement, auto-restart, auto- replication, auto-scaling • Latest Release: 1.8.13 • Community driven completely • Written in GoLang
  • 14. 14 Kubernetes High Level Architecture
  • 15. 15 Kubernetes High Level Architecture • The Kubernetes Master is a collection of three processes that run on a single node in your cluster, which is designated as the master node. • kube-apiserver • kube-controller-manager • kube-scheduler • Each individual non-master node in your cluster runs two processes – • kubelet, which communicates with the Kubernetes Master • kube-proxy, a network proxy which reflects Kubernetes networking services on each node. Kube-apiserver Kube-Controller-manager
  • 17. 17 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume
  • 18. 18 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job
  • 19. 19 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job • RBAC
  • 20. 20 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job • RBAC *** Custom Resource
  • 21. 21 Kubernetes Pod • Pod is the basic building block of Kubernetes– • the smallest and simplest unit in the Kubernetes object model that you create or deploy • represents a running process on your cluster. • encapsulates an application container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run • Docker is the most common container runtime used in a Pod, but Pods support other container runtimes as well
  • 22. 22 How to Interact with Kubernetes
  • 23. 23 How to Interact with Kubernetes • REST API • Curl or browser • Command Line Interface • Kubectl • Kube Config • Created during cluster creation • Programmatic • GoLang, Python, Scala
  • 25. 25 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node Secured/Kerberized HDFS
  • 26. 26 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node 2. Client-app requests Ticket 3. KDC sends TGT 1. Service Principles/Keys 6. Sends Service Ticket and requests for Authentication 5. User Authenticated using Service Principle/key Retrieves User roles/permissions Secured/Kerberized (keytab) HDFS
  • 27. 27 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node 2. Client-app requests Ticket 3. KDC sends TGT 1. Service Principles/Keys 6. Sends Service Ticket and requests for Delegation Token 7. User Authenticated using Service Principle/key Retrieves User roles/permissions Secured/Kerberized (delegation token) HDFS 8. Name node sends delegation token
  • 29. 29 Spark Running in Kubernetes Kubernetes ClusterKube Master
  • 30. 30 Spark Running in Kubernetes Kubernetes Cluster Spark-Submit (--master k8s://…) Kube Master
  • 31. 31 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Spark-Submit Spark Master Kube Master
  • 32. 32 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master
  • 33. 33 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node
  • 34. 34 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node Kube Secret Keytab
  • 35. 35 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node Kube Secret Keytab
  • 37. 37 What to use • To install Kubernetes • vagrant-kubeadm (https://github.com/c9s/vagrant-kubeadm) • Run local Docker Hub (https://docs.docker.com/registry/deploying ) • Run Docker registry container and map localhost to some ip (10.0.2.2) • Build Spark Docker image from Spark download Dockerfile under Kubernetes directory (spark-2.3.0-bin-hadoop2.7/ kubernetes) • Publish the image to Docker hub • Run Spark-submit

Editor's Notes

  • #12: Artificial intelligence is intelligence exhibited by machines. In computer science, the field of AI defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal