Getter May. 10
Deploying a Spark Big Data Computing Environment with Kubernetes
Who am I?
● Getter (楊曜佑)
○ inwinstack RD(Ready to Die) engineer
○ OpenStack integration & Operation
○ K8S Beginner
Why use K8S?
User: We need a Big Data solution!!
Okay….
About Big Data Solution
● Famous management tool -- Cloudera
○ Too big
○ Too difficult
○ User does not want it (Most Important)
● Famous container management tool -- K8S
○ Small
○ Simple
○ User wants it
Why use Spark?
Basic Hadoop MapReduce Components
● YARN
○ NodeManager
○ ResourceManager
● HDFS
○ NameNode
○ DataNode
Basic Spark Components
● Master
● Slave
● Storage
Spark on K8S Architecture
● https://github.com/kubernetes/examples/tree/master/staging/spark
○ spark-master-controller
○ spark-master-service
○ spark-worker-controller
○ spark-ui-proxy-controller
○ spark-ui-proxy-service
Spark on K8S Architecture
● Only one master
● Use nodeAffinity to keep workers off the node that runs the master
● Use podAntiAffinity to ensure each node runs only one worker
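The two scheduling rules above could be expressed roughly as follows. This is a minimal sketch under stated assumptions, not the manifest used in the talk: the node label `spark-role=master`, the pod label `component: spark-worker`, and the file name `spark-worker-affinity.yaml` are all hypothetical. The fragment is meant to sit under `spec.template.spec` of the worker controller.

```shell
#!/bin/sh
# Write an affinity fragment for the Spark worker pod template.
# Assumptions (not from the slides): the master's node carries the label
# spark-role=master, and worker pods carry the label component=spark-worker.
cat > spark-worker-affinity.yaml <<'EOF'
affinity:
  nodeAffinity:
    # Keep workers off any node labeled as the master's node.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: spark-role
          operator: NotIn
          values: ["master"]
  podAntiAffinity:
    # Never place two worker pods on the same host.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: component
          operator: In
          values: ["spark-worker"]
      topologyKey: kubernetes.io/hostname
EOF

# Next steps, left as comments because they need a live cluster:
#   kubectl label node <master-node-name> spark-role=master
#   (merge the fragment into the worker manifest, then kubectl apply -f it)
```

Because both rules are `required...`, a worker stays Pending when no eligible node is free; the `preferred...` variants would relax that at the cost of the one-worker-per-node guarantee.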
About storage
● HDFS
● Persistent Volumes
○ iSCSI
○ NFS
○ CephFS
○ RBD
○ Etc...
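As an illustration of the Persistent Volumes option with NFS, the sketch below writes a statically provisioned volume and a matching claim. Every name, the 10Gi size, and the export path are assumptions for illustration only; `<nfs-server-host>` is a placeholder to fill in.

```shell
#!/bin/sh
# Write a PersistentVolume backed by NFS plus a claim that can bind to it.
# Hypothetical values: spark-data-nfs, spark-data, 10Gi, /exports/spark.
cat > spark-nfs-storage.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spark-data-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany            # master and all workers mount the same share
  nfs:
    server: <nfs-server-host>
    path: /exports/spark
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-data
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""       # bind to a pre-created volume, skip dynamic provisioning
  resources:
    requests:
      storage: 10Gi
EOF

# With a cluster available, after replacing the placeholder:
#   kubectl apply -f spark-nfs-storage.yaml
```

The master and worker pods would then reference the claim `spark-data` in a `volumes` entry and mount it at the same path, so that every executor sees the same input files.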
Environment
● 3 nodes
● K8S version v1.9.0
○ kubespray
○ calico
● Spark version 2.2.0
Simple performance comparison
● https://codait.github.io/spark-bench/ -- SparkPI
○ slices: 10000
■ Spark on K8S
■ Spark standalone
● Spark-example -- WordCount
○ Input file: 3 GB
■ Spark on K8S with NFS
■ Spark standalone with NFS
Official support for K8S in Spark 2.3.0
How it works
$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
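Two small helpers may make the command above easier to use. This is a sketch, not part of the talk: the function names are hypothetical, `kubectl` is assumed to be configured for the target cluster, and the `spark-role=driver` label is assumed to be what Spark puts on the driver pod it creates.

```shell
#!/bin/sh
# Build the --master value that spark-submit expects for Kubernetes.
# Arguments: API server host, API server port.
k8s_master_url() {
  printf 'k8s://https://%s:%s\n' "$1" "$2"
}

# Follow the log of the first driver pod found in the current namespace.
# Assumes driver pods carry the label spark-role=driver.
follow_driver_log() {
  driver_pod=$(kubectl get pods -l spark-role=driver \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl logs -f "$driver_pod"
}

# Example usage (host and port are placeholders):
#   kubectl cluster-info                 # shows the API server URL
#   k8s_master_url 192.0.2.10 6443       # value for --master
#   follow_driver_log                    # after spark-submit has started the job
```

In cluster mode the driver itself runs as a pod, so its log is where the SparkPi result appears rather than in the terminal that ran `spark-submit`.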
Currently experimental...
● Client mode is not currently supported.
● Future Work
○ PySpark
○ R
○ Dynamic Executor Scaling
○ Local File Dependency Management
○ Spark Application Management
○ Job Queues and Resource Management
www.inwinstack.com
Thank You!
迎棧科技股份有限公司