How to build a tool for
operating Flink on Kubernetes
Andrea Medeghini
Software Engineer / Contractor
Which products are available?
Ververica dA Platform:
● Automated deployments
● Easy management of jobs
● Monitoring
● Logging (ELK)
● Trial version available
Are there free alternatives?
There are a few projects on GitHub, however…
● They mainly focus on deployment
● They do not provide a complete solution
● They might not work for all use cases
Can we use Helm?
It’s good, but it doesn’t help with…
● Jobs and Savepoints
● Monitoring / Alerting
● Automatic Scaling
# helm install --name my-flink-cluster charts/flink
Wait for next Flink release?
Better integration with Kubernetes is coming:
● Reactive container mode
https://issues.apache.org/jira/browse/FLINK-10407
● Active Kubernetes integration
https://issues.apache.org/jira/browse/FLINK-9953
Shall we build our own tool?
It’s going to be challenging! Because…
● Flink is a distributed engine
● Flink is a stateful engine
● Jobs need to be packaged and uploaded
● Jobs need to be monitored to detect failures
● Resources need to be adjusted according to workload
Do we need an open source tool?
Everybody likes open source tools…
… how do we build one?
Overview of a Flink Cluster
● One or more JobManagers (typically one)
● One or more TaskManagers (typically many)
● One or more jobs packaged as JAR files
● Storage for savepoints
Exploiting Kubernetes API
It’s all REST!
There are client libraries…
… for many languages, not only Go!
See Kubernetes Documentation:
https://kubernetes.io/docs/reference/
Control Kubernetes Programmatically
// Describe the JobManager as a StatefulSet with a single replica
val jobmanagerStatefulSet = V1StatefulSet()
    .metadata(jobmanagerMetadata)
    .spec(
        V1StatefulSetSpec()
            .replicas(1)
            .template(
                V1PodTemplateSpec().spec(jobmanagerPodSpec).metadata(jobmanagerMetadata)
            )
            .updateStrategy(updateStrategy)
            .serviceName("jobmanager")
            .selector(jobmanagerSelector)
            .addVolumeClaimTemplatesItem(persistentVolumeClaim)
    )

// Ask the Kubernetes API server to create the resource in the given namespace
api.createNamespacedStatefulSet(namespace, jobmanagerStatefulSet, null, null, null)
What resources do we need?
● StatefulSet for JobManager (1 replica)
● StatefulSet for TaskManager (N replicas)
● Services for JobManager (headless, NodePort, ...)
● PersistentVolumeClaims
● …
What configuration do we need?
● Set JOB_MANAGER_RPC_ADDRESS to JobManager service
● Set TASK_MANAGER_NUMBER_OF_TASK_SLOTS to 1
● Set memory limits of container higher than max heap
● Set CPU limits to sensible value
● Configure pod affinity to spread workload
● Expose relevant ports (usually only internally)
● Add sensible labels to identify resources
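As a rough illustration of the first three settings, here is a minimal Kotlin sketch. The `TaskManagerSettings` type, the helper names, and the 25% memory headroom are assumptions made for this example; they do not come from the toolbox itself.

```kotlin
// Hypothetical settings holder, used only for this illustration.
data class TaskManagerSettings(val jobManagerService: String, val maxHeapMiB: Int)

// Environment variables passed to the TaskManager container.
fun taskManagerEnv(settings: TaskManagerSettings): Map<String, String> = mapOf(
    "JOB_MANAGER_RPC_ADDRESS" to settings.jobManagerService,
    "TASK_MANAGER_NUMBER_OF_TASK_SLOTS" to "1"
)

// The container limit must exceed the max heap to leave room for off-heap memory.
// The default headroom of 25% is an assumption, not a recommendation from the talk.
fun containerMemoryLimitMiB(settings: TaskManagerSettings, headroomPercent: Int = 25): Int =
    settings.maxHeapMiB + settings.maxHeapMiB * headroomPercent / 100

fun main() {
    val settings = TaskManagerSettings("flink-jobmanager-test", 1024)
    println(taskManagerEnv(settings))
    println("memory limit: ${containerMemoryLimitMiB(settings)}Mi")
}
```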
Run exec against the Job Manager
How does it work?
● Kubernetes Client for managing clusters
● Exec for executing commands in the containers
Easy to implement but...
● It depends on commands installed in the container
● It seems too expensive in terms of resources (we need to run a process inside the container for each operation)
● It doesn’t enforce any protocol (stdin/stdout)
Flink Monitoring API to the rescue!
Flink has a pretty useful REST API:
● Endpoints for managing jobs
● Endpoints for managing savepoints
● Endpoints for monitoring the cluster
Is there a client library? I am afraid not…
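Even without a client library, the API can be called with a plain HTTP client. The sketch below uses the JDK's `java.net.http` package (Java 11 or later) to read the `/v1/jobs` endpoint. The function names are made up for this example, and port 8081 is assumed to be the JobManager REST port.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Build the URL of the jobs endpoint; host and port are supplied by the caller.
fun jobsUrl(host: String, port: Int): String = "http://$host:$port/v1/jobs"

// Fetch the raw JSON returned by the JobManager.
fun fetchJobs(host: String, port: Int): String {
    val request = HttpRequest.newBuilder(URI.create(jobsUrl(host, port))).GET().build()
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    check(response.statusCode() == 200) { "Unexpected status ${response.statusCode()}" }
    return response.body()
}

fun main(args: Array<String>) {
    println(jobsUrl("localhost", 8081))
    // Only contact a cluster when a host is passed on the command line.
    if (args.isNotEmpty()) println(fetchJobs(args[0], 8081))
}
```

This works for a quick check, but every endpoint and response type would have to be written by hand.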
Create client using OpenAPI
I manually crafted an OpenAPI specification file…
… It’s tedious but the generated client works fine!
See Swagger Documentation:
https://swagger.io/docs/specification/about/
Swagger Editor and Code generator
/v1/jobs:
  get:
    operationId: getJobs
    summary: Returns an overview over all jobs and their current state
    responses:
      '200':
        description: |-
          200 response
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/JobIdsWithStatusOverview'
...
See full specification on GitHub:
https://github.com/nextbreakpoint/flink-client/blob/master/flink-openapi.yaml
Combine all in one application
We can combine the APIs:
● Kubernetes Client for managing clusters
● Flink Client for managing jobs
What are the limitations?
● Where does the client live?
● Still no monitoring or automatic scaling
● NodePort or Port Forward required for each JobManager (for each Flink Cluster)
● Port Forward doesn’t work well with file upload (there is a problem with timeout in the Kubernetes Client for Java)
Run controller inside Kubernetes
What are the benefits?
● It can easily access internal resources
● It runs with its own service account
● It can monitor the clusters
● It can rescale the clusters
Better than before but...
● One port forward is still required
● Authorization is required for API
● It doesn’t follow best practices!
We need a Kubernetes Operator!
Everybody thinks we need Go, but…
… an Operator is like a pattern…
… and we can use any programming language!
Operator SDK for Go:
https://github.com/operator-framework/operator-sdk
Operator Pattern:
https://coreos.com/blog/introducing-operators.html
Custom Resource Definition
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: flinkclusters.beta.nextbreakpoint.com
spec:
  group: beta.nextbreakpoint.com
  versions:
    - name: v1
      served: true
      storage: true
  scope: Namespaced
  names:
    plural: flinkclusters
    singular: flinkcluster
    kind: FlinkCluster
    shortNames:
      - fc
/apis/beta.nextbreakpoint.com/v1/namespaces/*/flinkclusters
# kubectl create -f flink-crd.yaml
# kubectl get crd
NAME AGE
flinkclusters.beta.nextbreakpoint.com 1d
…
Custom Objects
apiVersion: "beta.nextbreakpoint.com/v1"
kind: FlinkCluster
metadata:
  name: test
spec:
  clusterName: test
  environment: test
  pullSecrets: regcred
  pullPolicy: Always
  flinkImage: nextbreakpoint/flink:1.7.2-1
  sidecarImage: flink-workshop-jobs:2
  sidecarServiceAccount: flink-operator
  sidecarClassName: com.nextbreakpoint.flink.jobs.TestJob
  sidecarJarPath: /com.nextbreakpoint.flinkworkshop-1.0.0.jar
  sidecarParallelism: 1
  sidecarArguments:
    - --BUCKET_BASE_PATH
    - file:///var/tmp
# kubectl create -f cluster.yaml
# kubectl get flinkclusters
NAME AGE
test 4s
The Operator Loop
1. Receive updates of Custom Objects
2. Receive updates of StatefulSets, Services, PVCs, …
3. Compare desired state to actual state
4. Adjust current state to match desired state
5. Repeat from 1
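Steps 3 and 4 can be sketched as a pure function that compares two snapshots and returns the actions needed to converge. This is a simplified illustration, not the operator's actual code: the `Action` type is hypothetical, and the state is reduced to the number of TaskManager replicas per cluster name.

```kotlin
// Hypothetical actions the operator could take; names are for illustration only.
sealed class Action {
    data class Create(val name: String, val replicas: Int) : Action()
    data class Scale(val name: String, val replicas: Int) : Action()
    data class Delete(val name: String) : Action()
}

// Compare desired state (from Custom Objects) with actual state (from StatefulSets)
// and return the actions required to make the actual state match the desired one.
fun reconcile(desired: Map<String, Int>, actual: Map<String, Int>): List<Action> {
    val actions = mutableListOf<Action>()
    for ((name, replicas) in desired) {
        val current = actual[name]
        when {
            current == null -> actions += Action.Create(name, replicas)
            current != replicas -> actions += Action.Scale(name, replicas)
        }
    }
    // Clusters that exist but are no longer declared must be removed.
    for (name in actual.keys - desired.keys) actions += Action.Delete(name)
    return actions
}

fun main() {
    val desired = mapOf("a" to 2, "b" to 1)
    val actual = mapOf("a" to 1, "c" to 3)
    reconcile(desired, actual).forEach(::println)
}
```

Keeping the comparison free of side effects makes it easy to test; the loop around it only has to collect the snapshots and apply the returned actions.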
Run a Flink Operator
What are the benefits?
● It follows Kubernetes best practices
● It runs with its own service account
● We only need to create cluster objects
Operator meets Controller
They can operate together:
● Use operator with CD pipeline
● Use controller for manual ops
● Use controller for monitoring
● Use controller for alerting
● Use controller for scaling
Time for a demo!
A preview of Flink K8S Toolbox:
● Easy installation
● Easy deployments
● Jobs management
● Cluster metrics
● Cluster scaling
Monitoring and Scaling
We can use Flink API for:
● Watching jobs status and alerting when something is broken
● Observing cluster metrics and scaling cluster when required
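A scaling decision could look like the sketch below. The metric (slot utilization between 0 and 1), the thresholds, and the replica bounds are all assumptions chosen for illustration; a real strategy would depend on the workload.

```kotlin
// Decide how many TaskManagers the cluster should have.
// Thresholds (0.8 and 0.3) and bounds are hypothetical values.
fun desiredTaskManagers(current: Int, slotUtilization: Double, min: Int = 1, max: Int = 10): Int {
    val target = when {
        slotUtilization > 0.8 -> current + 1 // cluster is nearly full: scale up
        slotUtilization < 0.3 -> current - 1 // cluster is mostly idle: scale down
        else -> current                      // within the comfortable range
    }
    return target.coerceIn(min, max)
}

fun main() {
    println(desiredTaskManagers(current = 2, slotUtilization = 0.9))
    println(desiredTaskManagers(current = 2, slotUtilization = 0.1))
}
```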
Checkpoints/Savepoints
We can use Flink API for:
● Monitoring checkpoints
● Managing savepoints
● Retrieving last savepoint
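For example, a savepoint can be triggered by posting to the job's `savepoints` endpoint. The sketch below only builds the request with the JDK's `java.net.http` package (Java 11 or later). The helper names are made up for this example, and the target directory is inserted without JSON escaping, so it is assumed to contain no quotes or backslashes.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// JSON body of the request; "cancel-job" is false so the job keeps running.
// No escaping is applied: the directory is assumed to be a plain URI.
fun savepointBody(targetDirectory: String): String =
    """{"target-directory":"$targetDirectory","cancel-job":false}"""

// Build (but do not send) the request that triggers a savepoint for a job.
fun savepointRequest(baseUrl: String, jobId: String, targetDirectory: String): HttpRequest =
    HttpRequest.newBuilder(URI.create("$baseUrl/v1/jobs/$jobId/savepoints"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(savepointBody(targetDirectory)))
        .build()

fun main() {
    // "abc" is a placeholder job id used only for this illustration.
    val request = savepointRequest("http://localhost:8081", "abc", "file:///var/tmp")
    println("${request.method()} ${request.uri()}")
    println(savepointBody("file:///var/tmp"))
}
```

The operation is asynchronous: the response carries a request id, and the savepoint status has to be polled separately.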
Continuous Delivery
We can use tools like Flux:
● Push changes into Git repo
● Changes are automatically applied to resources
Flux (I haven’t actually tried it)
https://github.com/weaveworks/flux
Nice features to have...
● Pluggable alerting strategy
● Pluggable scaling strategy
● Web console
● Secure access
● Support for HA mode
● …
It’s all free!
Flink Kubernetes Toolbox:
https://github.com/nextbreakpoint/flink-k8s-toolbox
Related projects:
https://github.com/nextbreakpoint/flink-client
https://github.com/nextbreakpoint/flink-workshop
https://github.com/nextbreakpoint/kubernetes-playground
Fine.
Where to follow:
@AndreaMedeghini
nextbreakpoint.com
Ad

More Related Content

What's hot (17)

Brief intro to K8s controller and operator
Brief intro to K8s controller and operator Brief intro to K8s controller and operator
Brief intro to K8s controller and operator
Shang Xiang Fan
 
Ci with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgiumCi with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgium
Chris Adkin
 
Eclipse 2011 Hot Topics
Eclipse 2011 Hot TopicsEclipse 2011 Hot Topics
Eclipse 2011 Hot Topics
Lars Vogel
 
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 PipelineDelivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
Slawa Giterman
 
Import golang; struct microservice
Import golang; struct microserviceImport golang; struct microservice
Import golang; struct microservice
Giulio De Donato
 
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
ZongXian Shen
 
Jenkins, pipeline and docker
Jenkins, pipeline and docker Jenkins, pipeline and docker
Jenkins, pipeline and docker
AgileDenver
 
Ci for-android-apps
Ci for-android-appsCi for-android-apps
Ci for-android-apps
Anthony Dahanne
 
(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines
Steffen Gebert
 
Using Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and JenkinsUsing Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and Jenkins
Micael Gallego
 
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins PipelinesAn Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
Steffen Gebert
 
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDEAn Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
KubeAcademy
 
Introduction to the Android NDK
Introduction to the Android NDKIntroduction to the Android NDK
Introduction to the Android NDK
Sebastian Mauer
 
Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2
Michal Ziarnik
 
Flink on Kubernetes operator
Flink on Kubernetes operatorFlink on Kubernetes operator
Flink on Kubernetes operator
Eui Heo
 
Pipeline based deployments on Jenkins
Pipeline based deployments  on JenkinsPipeline based deployments  on Jenkins
Pipeline based deployments on Jenkins
Knoldus Inc.
 
Android Native Development Kit
Android Native Development KitAndroid Native Development Kit
Android Native Development Kit
Peter R. Egli
 
Brief intro to K8s controller and operator
Brief intro to K8s controller and operator Brief intro to K8s controller and operator
Brief intro to K8s controller and operator
Shang Xiang Fan
 
Ci with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgiumCi with jenkins docker and mssql belgium
Ci with jenkins docker and mssql belgium
Chris Adkin
 
Eclipse 2011 Hot Topics
Eclipse 2011 Hot TopicsEclipse 2011 Hot Topics
Eclipse 2011 Hot Topics
Lars Vogel
 
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 PipelineDelivery Pipeline as Code: using Jenkins 2.0 Pipeline
Delivery Pipeline as Code: using Jenkins 2.0 Pipeline
Slawa Giterman
 
Import golang; struct microservice
Import golang; struct microserviceImport golang; struct microservice
Import golang; struct microservice
Giulio De Donato
 
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
(CISC 2013) Real-Time Record and Replay on Android for Malware Analysis
ZongXian Shen
 
Jenkins, pipeline and docker
Jenkins, pipeline and docker Jenkins, pipeline and docker
Jenkins, pipeline and docker
AgileDenver
 
(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines(Declarative) Jenkins Pipelines
(Declarative) Jenkins Pipelines
Steffen Gebert
 
Using Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and JenkinsUsing Docker to build and test in your laptop and Jenkins
Using Docker to build and test in your laptop and Jenkins
Micael Gallego
 
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins PipelinesAn Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
An Open-Source Chef Cookbook CI/CD Implementation Using Jenkins Pipelines
Steffen Gebert
 
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDEAn Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
An Introduction to Eclipse Che - Next-Gen Eclipse Java IDE
KubeAcademy
 
Introduction to the Android NDK
Introduction to the Android NDKIntroduction to the Android NDK
Introduction to the Android NDK
Sebastian Mauer
 
Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2Pipeline as code - new feature in Jenkins 2
Pipeline as code - new feature in Jenkins 2
Michal Ziarnik
 
Flink on Kubernetes operator
Flink on Kubernetes operatorFlink on Kubernetes operator
Flink on Kubernetes operator
Eui Heo
 
Pipeline based deployments on Jenkins
Pipeline based deployments  on JenkinsPipeline based deployments  on Jenkins
Pipeline based deployments on Jenkins
Knoldus Inc.
 
Android Native Development Kit
Android Native Development KitAndroid Native Development Kit
Android Native Development Kit
Peter R. Egli
 

Similar to How to build a tool for operating Flink on Kubernetes (20)

Odo improving the developer experience on OpenShift - hack & sangria
Odo   improving the developer experience on OpenShift - hack & sangriaOdo   improving the developer experience on OpenShift - hack & sangria
Odo improving the developer experience on OpenShift - hack & sangria
Jorge Morales
 
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang WangVirtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Flink Forward
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research Platform
Bob Killen
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
Stanislav Pogrebnyak
 
Introduction to Tekton
Introduction to TektonIntroduction to Tekton
Introduction to Tekton
Victor Iglesias
 
Rejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform GainRejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform Gain
Łukasz Piątkowski
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShift
Yaniv cohen
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
corehard_by
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
Ambassador Labs
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Gabriel Carro
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward
 
Introducing Koki Short
Introducing Koki ShortIntroducing Koki Short
Introducing Koki Short
Sidhartha Mani
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
CI/CD Across Multiple Environments
CI/CD Across Multiple EnvironmentsCI/CD Across Multiple Environments
CI/CD Across Multiple Environments
Karl Isenberg
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
William Stewart
 
Kubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetupKubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetup
Mist.io
 
When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?
Niklas Heidloff
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes Workshop
Bob Killen
 
The State of the Veil Framework
The State of the Veil FrameworkThe State of the Veil Framework
The State of the Veil Framework
VeilFramework
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
kloia
 
Odo improving the developer experience on OpenShift - hack & sangria
Odo   improving the developer experience on OpenShift - hack & sangriaOdo   improving the developer experience on OpenShift - hack & sangria
Odo improving the developer experience on OpenShift - hack & sangria
Jorge Morales
 
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang WangVirtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Virtual Flink Forward 2020: Integrate Flink with Kubernetes natively - Yang Wang
Flink Forward
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research Platform
Bob Killen
 
Rejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform GainRejekts 24 EU No GitOps Pain, No Platform Gain
Rejekts 24 EU No GitOps Pain, No Platform Gain
Łukasz Piątkowski
 
Devops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShiftDevops with Python by Yaniv Cohen DevopShift
Devops with Python by Yaniv Cohen DevopShift
Yaniv cohen
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
corehard_by
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
Ambassador Labs
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
Gabriel Carro
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward
 
Introducing Koki Short
Introducing Koki ShortIntroducing Koki Short
Introducing Koki Short
Sidhartha Mani
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
CI/CD Across Multiple Environments
CI/CD Across Multiple EnvironmentsCI/CD Across Multiple Environments
CI/CD Across Multiple Environments
Karl Isenberg
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
William Stewart
 
Kubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetupKubernetes and CoreOS @ Athens Docker meetup
Kubernetes and CoreOS @ Athens Docker meetup
Mist.io
 
When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?When to use Serverless? When to use Kubernetes?
When to use Serverless? When to use Kubernetes?
Niklas Heidloff
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes Workshop
Bob Killen
 
The State of the Veil Framework
The State of the Veil FrameworkThe State of the Veil Framework
The State of the Veil Framework
VeilFramework
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
kloia
 
Ad

Recently uploaded (20)

Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Mastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core PillarsMastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core Pillars
Marcel David
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Salesforce Aged Complex Org Revitalization Process .pdf
Salesforce Aged Complex Org Revitalization Process .pdfSalesforce Aged Complex Org Revitalization Process .pdf
Salesforce Aged Complex Org Revitalization Process .pdf
SRINIVASARAO PUSULURI
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Adobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install IllustratorAdobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install Illustrator
usmanhidray
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Mastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core PillarsMastering OOP: Understanding the Four Core Pillars
Mastering OOP: Understanding the Four Core Pillars
Marcel David
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 

How to build a tool for operating Flink on Kubernetes

  • 1. How to build a tool for operating Flink on Kubernetes
    Andrea Medeghini
    Software Engineer / Contractor
  • 2. Which products are available?
    Ververica dA Platform:
    ● Automated deployments
    ● Easy management of jobs
    ● Monitoring
    ● Logging (ELK)
    ● Trial version available
  • 3. Are there free alternatives?
    There are a few projects on GitHub, however…
    ● They mainly focus on deployment
    ● They do not provide a complete solution
    ● They might not work for all use cases
  • 4. Can we use Helm?
    It’s good, but it doesn’t help with…
    ● Jobs and Savepoints
    ● Monitoring / Alerting
    ● Automatic Scaling
    # helm install --name my-flink-cluster charts/flink
  • 5. Wait for the next Flink release?
    Better integration with Kubernetes is coming:
    ● Reactive container mode
      https://issues.apache.org/jira/browse/FLINK-10407
    ● Active Kubernetes integration
      https://issues.apache.org/jira/browse/FLINK-9953
  • 6. Shall we build our own tool?
    It’s going to be challenging! Because…
    ● Flink is a distributed engine
    ● Flink is a stateful engine
    ● Jobs need to be packaged and uploaded
    ● Jobs need to be monitored to detect failures
    ● Resources need to be adjusted according to workload
  • 7. Do we need an open source tool?
    Everybody likes open source tools…
    … how do we build one?
  • 8. Overview of a Flink Cluster
    ● One or more JobManagers (typically one)
    ● One or more TaskManagers (typically many)
    ● One or more jobs packaged as JAR files
    ● Storage for savepoints
  • 9. Exploiting Kubernetes API
    It’s all REST!
    There are client libraries…
    … for many languages, not only Go!
    See Kubernetes Documentation:
    https://kubernetes.io/docs/reference/
  • 10. Control Kubernetes Programmatically
    val jobmanagerStatefulSet = V1StatefulSet()
        .metadata(jobmanagerMetadata)
        .spec(
            V1StatefulSetSpec()
                .replicas(1)
                .template(
                    V1PodTemplateSpec().spec(jobmanagerPodSpec).metadata(jobmanagerMetadata)
                )
                .updateStrategy(updateStrategy)
                .serviceName("jobmanager")
                .selector(jobmanagerSelector)
                .addVolumeClaimTemplatesItem(persistentVolumeClaim)
        )
    api.createNamespacedStatefulSet(namespace, jobmanagerStatefulSet, null, null, null)
  • 11. What resources do we need?
    ● StatefulSet for JobManager (1 replica)
    ● StatefulSet for TaskManager (N replicas)
    ● Services for JobManager (headless, NodePort, ...)
    ● PersistentVolumeClaims
    ● …
  • 12. What configuration do we need?
    ● Set JOB_MANAGER_RPC_ADDRESS to the JobManager service
    ● Set TASK_MANAGER_NUMBER_OF_TASK_SLOTS to 1
    ● Set the container memory limit higher than the max heap
    ● Set CPU limits to a sensible value
    ● Configure pod affinity to spread the workload
    ● Expose relevant ports (usually only internally)
    ● Add sensible labels to identify resources
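
The checklist above can be sketched as a small function that derives the settings for one TaskManager container. This is only an illustration: the `TaskManagerSettings` type, the `headroomMiB` margin and the label keys are assumptions made for the example and are not taken from the toolbox. Only the two environment variable names come from the slide.

```kotlin
// Hypothetical holder for the settings listed on the slide; not part of any library.
data class TaskManagerSettings(
    val env: Map<String, String>,
    val memoryLimitMiB: Int,
    val labels: Map<String, String>
)

// Derives the settings for one TaskManager container.
// headroomMiB is an assumed safety margin that keeps the container
// memory limit above the JVM max heap.
fun taskManagerSettings(
    clusterName: String,
    jobManagerService: String,
    maxHeapMiB: Int,
    headroomMiB: Int = 256
): TaskManagerSettings {
    require(maxHeapMiB > 0) { "maxHeapMiB must be positive" }
    require(headroomMiB > 0) { "headroomMiB must be positive" }
    return TaskManagerSettings(
        env = mapOf(
            "JOB_MANAGER_RPC_ADDRESS" to jobManagerService,
            "TASK_MANAGER_NUMBER_OF_TASK_SLOTS" to "1"
        ),
        memoryLimitMiB = maxHeapMiB + headroomMiB,
        // Label keys are illustrative; any consistent scheme works.
        labels = mapOf(
            "cluster" to clusterName,
            "component" to "flink",
            "role" to "taskmanager"
        )
    )
}
```

The resulting values would then be copied into the pod specification, for example as environment variables and resource limits of the container.
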
  • 13. Run exec against the Job Manager
    How does it work?
    ● Kubernetes Client for managing clusters
    ● Exec for executing commands in the containers
  • 14. Easy to implement but...
    ● It depends on commands installed in the container
    ● It seems too expensive in terms of resources (we need to run a process inside the container for each operation)
    ● It doesn’t enforce any protocol (stdin/stdout)
  • 15. Flink Monitoring API to the rescue!
    Flink has a pretty useful REST API:
    ● Endpoints for managing jobs
    ● Endpoints for managing savepoints
    ● Endpoints for monitoring the cluster
    Is there a client library? I am afraid not…
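
As a rough illustration of what calling the REST API without a client library looks like, the sketch below builds two requests with the JDK HTTP client (Java 11 or later): one listing jobs and one triggering a savepoint. The function names and the base URL are assumptions made for this example. The JSON body is assembled by hand and assumes the target directory contains no characters that need JSON escaping.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// GET /jobs returns the ids and statuses of all jobs known to the JobManager.
fun listJobsRequest(baseUrl: String): HttpRequest =
    HttpRequest.newBuilder(URI.create("$baseUrl/jobs"))
        .GET()
        .build()

// POST /jobs/:jobid/savepoints asks Flink to trigger a savepoint for the job.
// The body is built by hand; targetDirectory is assumed to need no JSON escaping.
fun triggerSavepointRequest(
    baseUrl: String,
    jobId: String,
    targetDirectory: String
): HttpRequest {
    val body = """{"target-directory":"$targetDirectory","cancel-job":false}"""
    return HttpRequest.newBuilder(URI.create("$baseUrl/jobs/$jobId/savepoints"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, where the base URL points at the JobManager REST port (8081 by default).
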
  • 16. Create client using OpenAPI
    I manually crafted an OpenAPI specification file…
    … it’s tedious, but the generated client works fine!
    See Swagger Documentation:
    https://swagger.io/docs/specification/about/
  • 17. Swagger Editor and Code generator
    /v1/jobs:
      get:
        operationId: getJobs
        summary: Returns an overview over all jobs and their current state
        responses:
          '200':
            description: |-
              200 response
            content:
              application/json:
                schema:
                  $ref: '#/components/schemas/JobIdsWithStatusOverview'
    ...
    See full specification on GitHub:
    https://github.com/nextbreakpoint/flink-client/blob/master/flink-openapi.yaml
  • 18. Combine all in one application
    We can combine the APIs:
    ● Kubernetes Client for managing clusters
    ● Flink Client for managing jobs
  • 19. What are the limitations?
    ● Where does the client live?
    ● Still no monitoring or automatic scaling
    ● NodePort or Port Forward required for each JobManager (for each Flink Cluster)
    ● Port Forward doesn’t work well with file upload (there is a problem with timeout in the Kubernetes Client for Java)
  • 20. Run controller inside Kubernetes
    What are the benefits?
    ● It can easily access internal resources
    ● It runs with its own service account
    ● It can monitor the clusters
    ● It can rescale the clusters
  • 21. Better than before but...
    ● One port forward is still required
    ● Authorization is required for the API
    ● It doesn’t follow best practices!
  • 22. We need a Kubernetes Operator!
    Everybody thinks we need Go, but…
    … an Operator is like a pattern…
    … and we can use any programming language!
    Operator SDK for Go:
    https://github.com/operator-framework/operator-sdk
    Operator Pattern:
    https://coreos.com/blog/introducing-operators.html
  • 23. Custom Resource Definition
    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
      name: flinkclusters.beta.nextbreakpoint.com
    spec:
      group: beta.nextbreakpoint.com
      versions:
        - name: v1
          served: true
          storage: true
      scope: Namespaced
      names:
        plural: flinkclusters
        singular: flinkcluster
        kind: FlinkCluster
        shortNames:
          - fc
    /apis/beta.nextbreakpoint.com/v1/namespaces/*/flinkclusters
    # kubectl create -f flink-crd.yaml
    # kubectl get crd
    NAME                                    AGE
    flinkclusters.beta.nextbreakpoint.com   1d
    …
  • 24. Custom Objects
    apiVersion: "beta.nextbreakpoint.com/v1"
    kind: FlinkCluster
    metadata:
      name: test
    spec:
      clusterName: test
      environment: test
      pullSecrets: regcred
      pullPolicy: Always
      flinkImage: nextbreakpoint/flink:1.7.2-1
      sidecarImage: flink-workshop-jobs:2
      sidecarServiceAccount: flink-operator
      sidecarClassName: com.nextbreakpoint.flink.jobs.TestJob
      sidecarJarPath: /com.nextbreakpoint.flinkworkshop-1.0.0.jar
      sidecarParallelism: 1
      sidecarArguments:
        - --BUCKET_BASE_PATH
        - file:///var/tmp
    # kubectl create -f cluster.yaml
    # kubectl get flinkclusters
    NAME   AGE
    test   4s
  • 25. The Operator Loop
    1. Receive updates of Custom Objects
    2. Receive updates of StatefulSets, Services, PVCs, …
    3. Compare desired state to actual state
    4. Adjust current state to match desired state
    5. Repeat from 1
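
Steps 3 and 4 of the loop can be sketched as a pure function that compares desired and actual state and returns the actions needed to converge. This is a simplified illustration, not the toolbox's implementation: `ClusterState` and `Action` are hypothetical types, and a cluster is reduced to its TaskManager replica count.

```kotlin
// Hypothetical, simplified view of a cluster: only the TaskManager replica count.
data class ClusterState(val taskManagers: Int)

// Hypothetical actions the operator could take to converge on the desired state.
sealed class Action {
    data class Create(val name: String, val taskManagers: Int) : Action()
    data class Scale(val name: String, val from: Int, val to: Int) : Action()
    data class Delete(val name: String) : Action()
}

// desired: state declared by the custom objects, keyed by cluster name.
// actual: state observed from StatefulSets, Services, PVCs, keyed by cluster name.
fun reconcile(
    desired: Map<String, ClusterState>,
    actual: Map<String, ClusterState>
): List<Action> {
    val actions = mutableListOf<Action>()
    for ((name, want) in desired) {
        val have = actual[name]
        when {
            // Declared but not running yet: create it.
            have == null -> actions += Action.Create(name, want.taskManagers)
            // Running with the wrong size: rescale it.
            have.taskManagers != want.taskManagers ->
                actions += Action.Scale(name, have.taskManagers, want.taskManagers)
        }
    }
    // Running but no longer declared: delete it.
    for (name in actual.keys - desired.keys) {
        actions += Action.Delete(name)
    }
    return actions
}
```

In a real operator the two maps would be refreshed from watch events (steps 1 and 2), and each returned action would be translated into Kubernetes API calls before the loop repeats.
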
  • 26. Run a Flink Operator
    What are the benefits?
    ● It follows Kubernetes best practices
    ● It runs with its own service account
    ● We only need to create cluster objects
  • 27. Operator meets Controller
    They can operate together:
    ● Use operator with CD pipeline
    ● Use controller for manual ops
    ● Use controller for monitoring
    ● Use controller for alerting
    ● Use controller for scaling
  • 28. Time for a demo!
    A preview of Flink K8S Toolbox:
    ● Easy installation
    ● Easy deployments
    ● Jobs management
    ● Cluster metrics
    ● Cluster scaling
  • 29. Monitoring and Scaling
    We can use the Flink API for:
    ● Watching job status and alerting when something is broken
    ● Observing cluster metrics and scaling the cluster when required
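
The two uses above can be sketched as a pair of decision functions fed with data already fetched from the Flink API. Everything here is an assumption made for illustration: the set of statuses treated as broken, the `busySlotRatio` metric, and the 0.8 / 0.3 thresholds are placeholders for whatever alerting and scaling strategy is actually plugged in.

```kotlin
// Job statuses treated as "broken" in this sketch (an assumed policy).
val alertingStatuses = setOf("FAILED", "FAILING", "RESTARTING")

// Returns the ids of jobs that should raise an alert, sorted for stable output.
// statuses maps job id to the status string reported by Flink.
fun jobsNeedingAlert(statuses: Map<String, String>): List<String> =
    statuses.filterValues { it in alertingStatuses }.keys.sorted()

// Returns the TaskManager count to aim for, moving one replica at a time.
// busySlotRatio is the assumed fraction of task slots in use (0.0 to 1.0);
// the thresholds and the min/max bounds are hypothetical defaults.
fun targetTaskManagers(
    current: Int,
    busySlotRatio: Double,
    min: Int = 1,
    max: Int = 10
): Int {
    val wanted = when {
        busySlotRatio > 0.8 -> current + 1
        busySlotRatio < 0.3 -> current - 1
        else -> current
    }
    return wanted.coerceIn(min, max)
}
```

Keeping the decisions separate from the API calls makes it easy to swap in a different alerting or scaling strategy later.
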
  • 30. Checkpoints/Savepoints
    We can use the Flink API for:
    ● Monitoring checkpoints
    ● Managing savepoints
    ● Retrieving the last savepoint
  • 31. Continuous Delivery
    We can use tools like Flux:
    ● Push changes into Git repo
    ● Changes are automatically applied to resources
    Flux (I haven’t actually tried it):
    https://github.com/weaveworks/flux
  • 32. Nice features to have...
    ● Pluggable alerting strategy
    ● Pluggable scaling strategy
    ● Web console
    ● Secure access
    ● Support for HA mode
    ● …
  • 33. It’s all free!
    Flink Kubernetes Toolbox:
    https://github.com/nextbreakpoint/flink-k8s-toolbox
    Related projects:
    https://github.com/nextbreakpoint/flink-client
    https://github.com/nextbreakpoint/flink-workshop
    https://github.com/nextbreakpoint/kubernetes-playground