Deploying a Spark Big Data Computing Environment with Kubernetes


This document summarizes using Kubernetes to deploy a Spark big data computing environment. It discusses why Kubernetes is preferable to other solutions like Cloudera for managing Spark. The architecture of running Spark on Kubernetes is shown, with the Spark master and worker controllers. Performance is compared between Spark on Kubernetes and standalone Spark using the SparkPI and WordCount examples. Support for Spark 2.3.0 on Kubernetes is now official.


Getter, May 10
Deploying a Spark Big Data Computing
Environment with Kubernetes

Who am I?
● Getter (楊曜佑)
○ inwinstack RD(Ready to Die) engineer
○ OpenStack integration & Operation
○ K8S Beginner

Why use K8S?

User: We need a Big Data solution!!
Okay….

About Big Data Solution
● Famous management tool -- Cloudera
○ Too big
○ Too difficult
○ User does not want it (Most Important)
● Famous container management tool -- K8S
○ Small
○ Simple
○ Users want it

Why use Spark?

Basic Hadoop MapReduce Components
● YARN
○ NodeManager
○ ResourceManager
● HDFS
○ NameNode
○ DataNode

Basic Spark Components
● Master
● Slave
● Storage

Spark on K8S Architecture
● https://github.com/kubernetes/examples/tree/master/staging/spark
○ spark-master-controller
○ spark-master-service
○ spark-worker-controller
○ spark-ui-proxy-controller
○ spark-ui-proxy-service

Spark on K8S Architecture
● Only one master
● Use nodeAffinity to keep workers off the master's node
● Use podAntiAffinity to ensure each node runs only one worker
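
The two placement rules above can be sketched as a pod-spec `affinity` stanza. This is a minimal illustration, not the manifest used in the talk: the node label `spark-role=master` and the pod label `component=spark-worker` are assumptions, and the master's node would have to be labeled by hand first. The script prints JSON, which `kubectl` accepts alongside YAML.

```python
import json


def worker_affinity(master_node_label=("spark-role", "master"),
                    worker_pod_label=("component", "spark-worker")):
    """Build an `affinity` stanza for Spark worker pods.

    Both label pairs are hypothetical; adjust them to the labels
    actually present in the cluster.
    """
    node_key, node_value = master_node_label
    pod_key, pod_value = worker_pod_label
    return {
        # Rule 1: never schedule a worker on a node labeled as the master's node.
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": node_key,
                        "operator": "NotIn",
                        "values": [node_value],
                    }]
                }]
            }
        },
        # Rule 2: never co-locate two worker pods on the same hostname.
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {"matchLabels": {pod_key: pod_value}},
                "topologyKey": "kubernetes.io/hostname",
            }]
        },
    }


if __name__ == "__main__":
    # Paste the output under `spec.template.spec` of the worker controller.
    print(json.dumps({"affinity": worker_affinity()}, indent=2))
```

With the required (hard) anti-affinity rule, a worker replica stays Pending when no free node is left, so the replica count should not exceed the number of eligible nodes.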

About storage
● HDFS
● Persistent Volumes
○ iSCSI
○ NFS
○ CephFS
○ RBD
○ Etc...
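
As one illustration of the Persistent Volume option, the sketch below builds a statically provisioned NFS PersistentVolume and a matching claim. The server address, export path, size, and object name are all placeholders chosen for the example, not values from the talk.

```python
import json


def nfs_volume_manifests(server, path, size="10Gi", name="spark-data"):
    """Return (PersistentVolume, PersistentVolumeClaim) manifests as dicts.

    All argument values are supplied by the caller; nothing here is
    taken from the original deployment.
    """
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # NFS allows many pods on different nodes to mount read-write.
            "accessModes": ["ReadWriteMany"],
            "nfs": {"server": server, "path": path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Empty class plus volumeName binds to the PV above
            # instead of triggering dynamic provisioning.
            "storageClassName": "",
            "volumeName": name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


if __name__ == "__main__":
    # Hypothetical NFS server and export path.
    for manifest in nfs_volume_manifests("nfs.example.internal", "/exports/spark"):
        print(json.dumps(manifest, indent=2))
```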

Environment
● 3 nodes
● K8S version v1.9.0
○ kubespray
○ calico
● Spark version 2.2.0

Simple performance comparison
● https://codait.github.io/spark-bench/ -- SparkPi
○ slices: 10000
■ Spark on K8S
■ Spark standalone
● Spark-example -- WordCount
○ Input file: 3G
■ Spark on K8S with NFS
■ Spark standalone with NFS
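
For readers unfamiliar with the two workloads, here is a plain-Python sketch of what each one computes. It does not use Spark and is not the benchmark code: SparkPi is a Monte Carlo estimate of pi split across "slices" (partitions), and WordCount tallies word frequencies. The sample counts and seed are arbitrary choices for the illustration.

```python
import random
from collections import Counter


def estimate_pi(slices=10, samples_per_slice=10_000, seed=0):
    """Monte Carlo estimate of pi, the same idea SparkPi distributes.

    Each slice stands in for one partition of work; here the slices
    simply run one after another in a single process.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(slices):
        for _ in range(samples_per_slice):
            x = rng.uniform(-1.0, 1.0)
            y = rng.uniform(-1.0, 1.0)
            if x * x + y * y <= 1.0:  # point falls inside the unit circle
                inside += 1
    # Circle area / square area = pi / 4.
    return 4.0 * inside / (slices * samples_per_slice)


def count_words(lines):
    """Tally whitespace-separated words, as WordCount does per input line."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


if __name__ == "__main__":
    print("pi is roughly", estimate_pi())
    print(count_words(["spark on k8s", "spark standalone"]))
```

SparkPi is CPU-bound and touches no storage, while WordCount on a 3G file is dominated by reading input, which is why the second test pairs each setup with NFS.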

Official support for Spark 2.3.0 on K8S

How it works
$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
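
When the same submission has to be scripted, the command can be assembled from its parts. The helper below is a hypothetical convenience wrapper, not part of Spark; it only reproduces the flags shown on the slide, and the host, port, and image passed in the usage example are placeholders.

```python
import shlex


def spark_submit_args(api_host, api_port, image, app_jar,
                      main_class="org.apache.spark.examples.SparkPi",
                      name="spark-pi", executors=5):
    """Return the spark-submit argument list for a Kubernetes cluster-mode run."""
    return [
        "bin/spark-submit",
        # The k8s:// prefix tells Spark to talk to the Kubernetes API server.
        "--master", f"k8s://https://{api_host}:{api_port}",
        "--deploy-mode", "cluster",
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        # local:// means the jar already exists inside the container image.
        app_jar,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the real API server and image.
    args = spark_submit_args("k8s.example.internal", 6443,
                             "example/spark:2.3.0",
                             "local:///path/to/examples.jar")
    print(" ".join(shlex.quote(a) for a in args))
```

Returning a list rather than a single string lets the caller pass it directly to `subprocess.run` without shell quoting issues.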

Currently experimental...
● Client mode is not currently supported.
● Future Work
○ PySpark
○ R
○ Dynamic Executor Scaling
○ Local File Dependency Management
○ Spark Application Management
○ Job Queues and Resource Management

www.inwinstack.com
Thank You!
迎棧科技股份有限公司
