Enabling ceph-mgr to control Ceph services via Kubernetes
28 August 2018
Travis Nielsen
Rook
tnielsen@redhat.com
John Spray
Ceph Mgr
jspray@redhat.com
2
Ceph operations today
● RPM packages (all daemons on a server are the same version)
● Physical services configured by external orchestrator:
  – Ansible, Salt, etc.
● Logical entities configured via Ceph itself (pools, filesystems, auth):
  – CLI, mgr module interface, restful module
  – Separate workflow from the physical deployment
● Plus some external monitoring to make sure your services stay up
3
Pain points
● All those elements combine to create a high surface area between users and the software.
● Lots of human decision making, opportunities for mistakes
● In practice, deployments are often kept relatively static after initial decision making is done.
Can new container environments enable something better?
4
The solution: container orchestration
● Kubernetes implements the basic operations that we need for the management of cluster services
  – Deploy builds (in container format)
  – Detect devices, start container in specific location (OSD)
  – Schedule/place groups of services (MDS, RGW)
● If we were writing a Ceph management server/agent today, it would look much like Kubernetes: so let’s just use Kubernetes
● Kubernetes gives us the primitives
● We still need the business logic and UI
5
Why Kubernetes?
● Widely adopted (Red Hat OpenShift, Google Kubernetes Engine, Amazon EKS, etc.)
● CLI/REST driven (extensible API)
● Lightweight design
Rook
7
Rook
● Simplified, container-native way of consuming Ceph
● Built for Kubernetes, extending the Kubernetes API
● CNCF sandbox project (proposed for incubation)
http://rook.io/
http://github.com/rook/rook
8
Rook components
● Docker image: Ceph and Rook binaries in one artifact
  – In Rook 0.9, these will be decoupled
● The Agent handles mounting volumes
  – Hides complexity of client version, kernel version variations
● The Operator watches objects in K8s, manipulates Ceph in response
  – Create a “Filesystem” object, Rook operator does corresponding “ceph fs new”
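The operator behaviour described on this slide can be sketched as a function that compares declared objects with what already exists and emits the Ceph commands needed to close the gap. This is only an illustration in Python, not Rook's code (Rook itself is written in Go). The object layout, the pool naming scheme, the default `pg_num`, and the helper names `commands_for_filesystem` and `reconcile` are all assumptions made for the example; only the `ceph osd pool create` and `ceph fs new` commands come from the slides.

```python
from typing import Dict, List


def commands_for_filesystem(obj: Dict, pg_num: int = 8) -> List[List[str]]:
    """Translate a hypothetical "Filesystem" object into Ceph CLI invocations.

    The object layout and the pool names are illustrative assumptions.
    """
    name = obj["metadata"]["name"]
    metadata_pool = f"{name}-metadata"
    data_pool = f"{name}-data"
    return [
        ["ceph", "osd", "pool", "create", metadata_pool, str(pg_num)],
        ["ceph", "osd", "pool", "create", data_pool, str(pg_num)],
        ["ceph", "fs", "new", name, metadata_pool, data_pool],
    ]


def reconcile(desired: List[Dict], existing: set) -> List[List[str]]:
    """Return the commands needed so that every desired filesystem exists."""
    commands = []
    for obj in desired:
        # Filesystems that already exist need no action.
        if obj["metadata"]["name"] not in existing:
            commands.extend(commands_for_filesystem(obj))
    return commands
```

Calling `reconcile([{"metadata": {"name": "myfs"}}], set())` yields two pool-creation commands followed by `ceph fs new myfs myfs-metadata myfs-data`; calling it again with `{"myfs"}` as the existing set yields nothing, which is the idempotence an operator loop relies on.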
9
Rook example
$ kubectl create -f operator.yaml
$ kubectl create -f cluster.yaml
$ kubectl -n rook-ceph get pod
NAME                                   READY   STATUS
rook-ceph-mgr-a-9c44495df-jpfvb        1/1     Running
rook-ceph-mon0-zz8l2                   1/1     Running
rook-ceph-mon1-rltcp                   1/1     Running
rook-ceph-mon2-lxl9x                   1/1     Running
rook-ceph-osd-id-0-76f5696669-d9gwj    1/1     Running
rook-ceph-osd-id-1-5d8477d8f4-kq7n5    1/1     Running
rook-ceph-osd-prepare-minikube-tj69w   0/1     Completed

apiVersion: ceph.rook.io/v1beta1
kind: Cluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mon:
    count: 3
  network:
    hostNetwork: false
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      storeType: bluestore
10
Rook user interface
● Rook objects are created via the extensible Kubernetes API service (Custom Resource Definitions)
● kubectl + yaml files
  – This style is consistent with the Kubernetes ecosystem
● Point and click is desirable for many users (& vendors)
  – Deleting a pool should require a confirmation button!
Combining Rook with ceph-mgr
12
“Just give me the storage”
● Rook’s simplified model is suitable for people who do not want to pay any attention to how Ceph is configured: they just want to see a volume attached to their container.
● People buying hardware (or paying for cloud) often care a lot about how the storage cluster is configured.
● Lifecycle: over time users care more and more about optimizing resource usage.
13
What is ceph-mgr?
● Component of RADOS: a sibling of the mon and OSD daemons. C++ code using the same auth/networking stack.
● Mandatory component: includes key functionality
● Host to Python modules that do monitoring/management
● Relatively simple in itself: the fun parts are the Python modules.
14
Dashboard module
● Mimic (13.2.x) release includes an extended management web UI based on openATTIC
● Would like Kubernetes integration, so that we can create containers from the dashboard too:
  – The “Create Filesystem” button starts MDS cluster
  – A “Create OSD” button that starts OSDs
→ Call out to Rook from ceph-mgr (and to other orchestrators too)
15
Three ways to consume containerized Ceph
[Diagram: entry points into containerized Ceph]
– Rook user: kubectl, yaml files
– Ceph-mgr dashboard: point+click
– Rook toolbox: Ceph CLI (all Ceph command line tools)
Also shown: Rook operator, Rook agent, K8s, Ceph
Demo
Rook + Mimic Dashboard
17
Rook automation vs Ceph-Mgr
Both Rook and ceph-mgr are managing the state of the cluster
● ceph-mgr creates pools and Filesystem object
● Rook creates the MDS container
  – Pools and the file system are skipped by Rook
● Rook settings can change modes as needed:
  – Full management: Pure Rook
  – Partial management: Shared mgmt with the dashboard
18
Why not build Rook-like functionality into mgr?
1. Upgrades!
  – An external component needs to orchestrate the Ceph upgrade, while other Ceph services may be offline (aka “who manages the manager?”)
2. Commonality between simplified pure-Rook systems and fully-featured containerized Ceph clusters.
3. Rook’s client mounting/volume management
19
What Kubernetes doesn’t do for us
● Install itself
● Configure the underlying network
● Bootstrap Rook
→ External setup tools will continue to have a role in the non-Ceph-specific tasks
Ceph orchestrator modules
21
Orchestrator modules
● Would like to drive tasks like creating OSDs from the dashboard
● Ceph users use various orchestrators:
  – Ansible (ceph-ansible)
  – SaltStack (DeepSea)
  – Kubernetes (Rook)
● Abstraction layer: the orchestrator interface
22
Orchestrator module Interface
Subclass of MgrModule: specialized ceph-mgr modules that implement a set of service orchestration primitives:
● Get device inventory
● Create OSDs
● Start/stop/scale stateless services (MDS, RGW, etc.)
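The three primitives listed above could be expressed as an abstract interface along the following lines. This is a sketch under stated assumptions, not the actual ceph-mgr orchestrator API: the class and method names (`Orchestrator`, `get_inventory`, `create_osds`, `scale_service`) are hypothetical, and the sketch does not subclass `MgrModule` because that class is only importable inside a running ceph-mgr. The in-memory backend exists only to show how a concrete orchestrator would plug in.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Orchestrator(ABC):
    """Hypothetical orchestration primitives; names are illustrative only."""

    @abstractmethod
    def get_inventory(self) -> Dict[str, List[str]]:
        """Return a mapping of host name to available device paths."""

    @abstractmethod
    def create_osds(self, host: str, devices: List[str]) -> None:
        """Request OSDs on the given devices of a host."""

    @abstractmethod
    def scale_service(self, service_type: str, name: str, count: int) -> None:
        """Set the number of stateless daemons (e.g. "mds", "rgw")."""


class InMemoryOrchestrator(Orchestrator):
    """Toy backend used only to exercise the interface."""

    def __init__(self, inventory: Dict[str, List[str]]):
        self._inventory = {h: list(d) for h, d in inventory.items()}
        self.osds = []       # (host, device) pairs that were requested
        self.services = {}   # (service_type, name) -> daemon count

    def get_inventory(self) -> Dict[str, List[str]]:
        # Return a copy so callers cannot mutate internal state.
        return {h: list(d) for h, d in self._inventory.items()}

    def create_osds(self, host: str, devices: List[str]) -> None:
        for dev in devices:
            self._inventory[host].remove(dev)  # device is now consumed
            self.osds.append((host, dev))

    def scale_service(self, service_type: str, name: str, count: int) -> None:
        self.services[(service_type, name)] = count
```

A dashboard written against `Orchestrator` would not need to know whether Ansible, DeepSea, or Rook sits behind it, which is the point of the abstraction layer.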
23
rook module
● Implement Orchestrator interface using Kubernetes API client, mapping operations to Rook’s structures:
  – Device inventory → read ConfigMaps populated by Rook
  – OSD creation → add entries to cluster→nodes→devices
  – MDS creation → create FilesystemSpec entities
  – RGW creation → create ObjectStoreSpec entities
● Some extra code to implement clean completions/progress events, e.g. not reporting OSD creation complete until OSD is actually up in OSDMap.
New in Nautilus
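The "clean completions" idea for the rook module, where OSD creation is not reported complete until the OSD is up in the OSDMap, might look roughly like this. It is a minimal sketch: `OsdCreationCompletion` and the `get_up_osds` callable are hypothetical stand-ins for however the module reads the OSD map, and the sketch assumes the expected OSD ids are known in advance, which a real implementation may have to discover.

```python
from typing import Callable, Set


class OsdCreationCompletion:
    """Reports done only once every expected OSD id is up in the OSD map.

    `get_up_osds` is a hypothetical callable, not a ceph-mgr API.
    """

    def __init__(self, expected_ids: Set[int],
                 get_up_osds: Callable[[], Set[int]]):
        self.expected_ids = set(expected_ids)
        self._get_up_osds = get_up_osds

    @property
    def progress(self) -> float:
        """Fraction of expected OSDs that are currently up (0.0 to 1.0)."""
        if not self.expected_ids:
            return 1.0
        up = self.expected_ids & self._get_up_osds()
        return len(up) / len(self.expected_ids)

    @property
    def is_complete(self) -> bool:
        """True only when all expected OSDs appear in the up set."""
        return self.expected_ids <= self._get_up_osds()
```

The design choice here is that the completion polls cluster state rather than trusting that the Kubernetes object was accepted, since an accepted request does not guarantee a running OSD.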
24
Orchestration vs Ceph Management
● External orchestrators are handling physical deployment of services, but most logical management is still direct to Ceph
● We must continue to make managing Ceph easier, and where possible, remove the need for intervention.
● Ceph-mgr modules fill this role
● Orchestrator should orchestrate, Ceph modules should manage
25
Orchestrator Simplification
Orchestrators mix physically deploying Ceph services with logical configuration:
● Rook creates volumes as CephFS filesystems, but this means creating underlying pools. How does it know how to configure them?
● Same for anything deploying RGW
● Rook also exposes some health/monitoring of the Ceph cluster, but is this in terms a non-Ceph-expert can understand?
Ceph management modules
27
Placement group merging
● Historically, pg_num could be increased but not decreased
● Sometimes problematic, such as when physically shrinking a cluster, or if bad pg_nums were chosen.
● Bigger problem: prevented automatic pg_num selection, because mistakes could not be reversed.
● Implementation is not simple, and doing it still has an IO cost, but the option will be there → now we can autoselect pg_num!
Targeted for Nautilus
28
poolsets module
● Pick pg_num so the user doesn’t have to!
● Hard (impossible?) to do perfectly, but...
● Pretty easy to do useful common cases:
  – Select initial pg_nums according to expected space use
  – Increase pg_nums if actual space use has gone ~2x over ideal PG capacity
  – Decrease pg_num for underused pools if another pool needs to increase its own
● Not an optimizer! But does the job as well as most human beings are doing it today.
Targeted for Nautilus
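The "useful common cases" above can be illustrated with a small heuristic. This is a sketch of the general idea only, not the poolsets module's algorithm: the target of 100 PGs per OSD, the replica count of 3, the rounding to a power of two, and all function names are assumptions chosen for the example. Only the "~2x over ideal PG capacity" growth trigger is taken from the slide.

```python
def nearest_power_of_two(n: float) -> int:
    """Round to the nearest power of two (ties round up), minimum 1."""
    if n <= 1:
        return 1
    lower = 1 << (int(n).bit_length() - 1)
    upper = lower * 2
    return lower if (n - lower) < (upper - n) else upper


def initial_pg_num(expected_bytes: int, cluster_bytes: int, osd_count: int,
                   target_pgs_per_osd: int = 100, replicas: int = 3) -> int:
    """Give a pool a share of the PG budget proportional to its expected use.

    target_pgs_per_osd and replicas are assumed defaults, not values
    taken from the slides.
    """
    pg_budget = osd_count * target_pgs_per_osd / replicas
    share = expected_bytes / cluster_bytes
    return nearest_power_of_two(pg_budget * share)


def should_grow(actual_bytes: int, pg_num: int, ideal_bytes_per_pg: int,
                factor: float = 2.0) -> bool:
    """True when usage exceeds `factor` times the pool's ideal PG capacity."""
    return actual_bytes > factor * pg_num * ideal_bytes_per_pg
```

For example, a pool expected to hold 10% of a 30-OSD cluster gets a raw figure of 100 PGs under these assumptions, which rounds to 128.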
29
poolsets module
Prompting users for expected capacity makes sense for data pools, but not for metadata pools:
● Combine data and metadata pool creation into one command
● Wrap pools into new “poolset” structure describing policy
● Auto-construct poolsets for existing deployments, but don’t auto-adjust unless explicitly enabled
ceph poolset create cephfs my_filesystem 100GB
New in Nautilus
30
progress module
● Health reporting was improved in Luminous, but in many cases it is still too low level.
● Placement groups:
  – Hard to distinguish between real problems and normal rebalancing
  – Once we start auto-picking pg_num, users won’t know what a PG is until they see them in the health status
● Introduce `progress` module to synthesize high level view from PG state: “56% recovered from failure of OSD 123”
● Also enable other modules to describe their long running operations via this module (creating an OSD, etc.)
New in Nautilus
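A high-level message like the one quoted on this slide can be derived from PG state by remembering which PGs an event affected and checking how many of them are still degraded. The sketch below shows that idea only; the function names and the use of plain sets of PG ids are assumptions for illustration and do not reflect the progress module's internals.

```python
def recovery_progress(initially_affected: set, still_degraded: set) -> float:
    """Fraction of the PGs affected by an event that have since recovered."""
    if not initially_affected:
        return 1.0
    # PGs degraded for unrelated reasons are ignored by the set difference.
    recovered = initially_affected - still_degraded
    return len(recovered) / len(initially_affected)


def describe(osd_id: int, fraction: float) -> str:
    """Render the fraction as a user-facing status line."""
    return f"{round(fraction * 100)}% recovered from failure of OSD {osd_id}"
```

With 100 affected PGs of which 44 are still degraded, `describe(123, recovery_progress(...))` produces the slide's example string.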
31
volumes module
● Currently, ceph_volume_client.py (used by Manila, etc.) creates “volumes” within CephFS “filesystems”, which require RADOS “pools” and provisioning of MDS daemons.
● Simplify this:
  – Two concepts: Volumes (aka filesystems), and Subvolumes
  – Automatically provision MDS daemons on demand using Rook
  – Automatically create pools on demand using `poolsets` module
  – Expose functionality as commands (consumable via librados) instead of library
  – Run background tasks from ceph-mgr (e.g. subvolume purge)
Targeted for Nautilus
32
From zero to working subvolume
Before
ceph osd pool create metadata 128
ceph osd pool create data 2048
ceph fs new myfs metadata data
# Create an MDS somehow…
# Call into ceph_volume_client.py
# volume_client.create_volume(…
After
ceph fs volume create myvol
ceph fs subvolume create myvol subv
Done!
33
Wrap up
● All these improvements reduce cognitive load on the ordinary user.
  – Do not need to know what an MDS is: ask Rook for a filesystem, and get one.
  – Do not need to know what a placement group is
  – Do not need to know magic commands: look at the dashboard
● Actions that no longer require human thought can now be tied into automated workflows: fulfill the promise of software defined storage.
● A smart container orchestrator is an essential part of this vision: on-demand Ceph requires on-demand service orchestration.
34
Resources
● Rook
  – https://rook.io
  – https://github.com/rook/rook
● Ceph Mgr
  – http://docs.ceph.com/docs/master/mgr/plugins/
Contributions welcome!
Q&A
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Ad

Enabling ceph-mgr to control Ceph services via Kubernetes

  • 1. Enabling ceph-mgr to control Ceph services via Kubernetes 28 August 2018 Travis Nielsen Rook tnielsen@redhat.com John Spray Ceph Mgr jspray@redhat.com
  • 2. 2 Ceph operations today ● RPM packages (all daemons on server same version) ● Physical services configured by external orchestrator: – Ansible, salt, etc ● Logical entities configured via Ceph itself (pools, filesystems, auth): – CLI, mgr module interface, restful module – Separate workflow from the physical deployment ● Plus some external monitoring to make sure your services stay up
  • 3. 3 Pain points ● All those elements combine to create a high surface area between users and the software. ● Lots of human decision making, opportunities for mistakes ● In practice, deployments often kept relatively static after initial decision making is done. Can new container environments enable something better?
  • 4. 4 The solution: container orchestration ● Kubernetes implements the basic operations that we need for the management of cluster services – Deploy builds (in container format) – Detect devices, start container in specific location (OSD) – Schedule/place groups of services (MDS, RGW) ● If we were writing a Ceph management server/agent today, it would look much like Kubernetes: so let’s just use Kubernetes ● Kubernetes gives us the primitives ● We still need the business logic and UI
  • 5. 5 Why Kubernetes? ● Widely adopted (Red Hat OpenShift, Google Compute Engine, Amazon EKS, etc.) ● CLI/REST driven (extensible API) ● Lightweight design
  • 7. 7 Rook ● Simplified, container-native way of consuming Ceph ● Built for Kubernetes, extending the Kubernetes API ● CNCF sandbox project (proposed for incubation) http://rook.io/ http://github.com/rook/rook
  • 8. 8 Rook components ● Docker Image: Ceph and Rook binaries in one artifact – In Rook 0.9, these will be decoupled ● The Agent handles mounting volumes – Hide complexity of client version, kernel version variations ● The Operator watches objects in K8s, manipulates Ceph in response – Create a “Filesystem” object, Rook operator does corresponding “ceph fs new”
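The operator behaviour described on slide 8 (watch objects, act on Ceph in response) can be sketched as a reconcile function that compares desired objects with what already exists and emits the missing commands. This is a minimal illustration, not Rook's code (Rook itself is written in Go): `FilesystemSpec`, its fields, the pool naming scheme and `reconcile` are all hypothetical, and the command forms are borrowed from the "Before" column of slide 32.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilesystemSpec:
    """Hypothetical stand-in for a Rook "Filesystem" object."""
    name: str
    metadata_pg_num: int = 128
    data_pg_num: int = 2048


def reconcile(desired, existing_fs_names):
    """Return the Ceph commands needed to make the cluster match `desired`.

    Filesystems that already exist are skipped, so calling this
    repeatedly with the same input is safe.
    """
    commands = []
    for spec in desired:
        if spec.name in existing_fs_names:
            continue  # already created, nothing to do
        meta_pool = f"{spec.name}-metadata"  # assumed naming scheme
        data_pool = f"{spec.name}-data"      # assumed naming scheme
        commands.append(
            ["ceph", "osd", "pool", "create", meta_pool, str(spec.metadata_pg_num)]
        )
        commands.append(
            ["ceph", "osd", "pool", "create", data_pool, str(spec.data_pg_num)]
        )
        commands.append(["ceph", "fs", "new", spec.name, meta_pool, data_pool])
    return commands
```

A real operator would run this logic every time a watched object changes; returning commands instead of executing them keeps the sketch testable without a cluster.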
  • 9. 9 Rook example
      $ kubectl create -f operator.yaml
      $ kubectl create -f cluster.yaml
      $ kubectl -n rook-ceph get pod
      NAME                                  READY  STATUS
      rook-ceph-mgr-a-9c44495df-jpfvb       1/1    Running
      rook-ceph-mon0-zz8l2                  1/1    Running
      rook-ceph-mon1-rltcp                  1/1    Running
      rook-ceph-mon2-lxl9x                  1/1    Running
      rook-ceph-osd-id-0-76f5696669-d9gwj   1/1    Running
      rook-ceph-osd-id-1-5d8477d8f4-kq7n5   1/1    Running
      rook-ceph-osd-prepare-minikube-tj69w  0/1    Completed
      apiVersion: ceph.rook.io/v1beta1
      kind: Cluster
      metadata:
        name: rook-ceph
        namespace: rook-ceph
      spec:
        mon:
          count: 3
        network:
          hostNetwork: false
        storage:
          useAllNodes: true
          useAllDevices: true
          config:
            storeType: bluestore
  • 10. 10 Rook user interface ● Rook objects are created via the extensible Kubernetes API service (Custom Resource Definitions) ● kubectl + yaml files – This style is consistent with Kubernetes ecosystem ● Point and Click is desirable for many users (& vendors) – Deleting a pool should require a confirmation button!
  • 12. 12 “Just give me the storage” ● Rook’s simplified model is suitable for people who do not want to pay any attention to how Ceph is configured: they just want to see a volume attached to their container. ● People buying hardware (or paying for cloud) often care a lot about how the storage cluster is configured. ● Lifecycle: over time users care more and more about optimizing resource usage.
  • 13. 13 What is ceph-mgr? ● Component of RADOS: a sibling of the mon and OSD daemons. C++ code using same auth/networking stack. ● Mandatory component: includes key functionality ● Host to python modules that do monitoring/management ● Relatively simple in itself: the fun parts are the python modules.
  • 14. 14 Dashboard module ● Mimic (13.2.x) release includes an extended management web UI based on OpenAttic ● Would like Kubernetes integration, so that we can create containers from the dashboard too: – The “Create Filesystem” button starts MDS cluster – A “Create OSD” button that starts OSDs → Call out to Rook from ceph-mgr (and to other orchestrators too)
  • 15. 15 Three ways to consume containerized Ceph [diagram; labels: Rook user (kubectl, yaml files); Ceph-mgr dashboard (point+click); Ceph CLI (Rook toolbox: all Ceph command line tools); components shown: Rook operator, Rook agent, K8s, Ceph]
  • 16. Demo Rook + Mimic Dashboard
  • 17. 17 Rook automation vs Ceph-Mgr Both Rook and ceph-mgr are managing the state of the cluster ● ceph-mgr creates pools and Filesystem object ● Rook creates the MDS container – Pools and the file system are skipped by Rook ● Rook settings can change modes as needed: – Full management: Pure Rook – Partial management: Shared mgmt with the dashboard
  • 18. 18 Why not build Rook-like functionality into mgr? 1. Upgrades! – An external component needs to orchestrate the Ceph upgrade, while other Ceph services may be offline (aka “who manages the manager?”) 2. Commonality between simplified pure-Rook systems and fully-featured containerized Ceph clusters. 3. Rook’s client mounting/volume management
  • 19. 19 What Kubernetes doesn’t do for us ● Install itself ● Configure the underlying network ● Bootstrap Rook → External setup tools will continue to have a role in the non-Ceph-specific tasks
  • 21. 21 Orchestrator modules ● Would like to drive tasks like creating OSDs from the dashboard ● Ceph users use various different orchestrators: – Ansible (ceph-ansible) – SaltStack (DeepSea) – Kubernetes (Rook). ● Abstraction layer: the orchestrator interface
  • 22. 22 Orchestrator module Interface Subclass of MgrModule: specialized ceph-mgr modules that implement a set of service orchestration primitives: ● Get device inventory ● Create OSDs ● Start/stop/scale stateless services (MDS, RGW, etc)
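The three primitives on slide 22 can be pictured as an abstract base class with one backend per orchestrator. The sketch below is hypothetical: the class and method names are chosen for illustration and are not the actual ceph-mgr orchestrator API, and `InMemoryOrchestrator` is a toy backend that only exists so the interface can be exercised without a cluster.

```python
from abc import ABC, abstractmethod


class Orchestrator(ABC):
    """Hypothetical interface; method names are illustrative only."""

    @abstractmethod
    def get_inventory(self):
        """Return a mapping of host name -> list of free device names."""

    @abstractmethod
    def create_osds(self, host, devices):
        """Request OSDs on the given devices of one host."""

    @abstractmethod
    def scale_service(self, service_type, name, count):
        """Set the daemon count for a stateless service (e.g. "mds")."""


class InMemoryOrchestrator(Orchestrator):
    """Toy backend that records requests instead of deploying anything."""

    def __init__(self, inventory):
        self._inventory = {h: list(d) for h, d in inventory.items()}
        self.osds = []       # (host, device) pairs that were requested
        self.services = {}   # (service_type, name) -> daemon count

    def get_inventory(self):
        # Return a copy so callers cannot mutate internal state.
        return {h: list(d) for h, d in self._inventory.items()}

    def create_osds(self, host, devices):
        available = self._inventory.get(host, [])
        # Validate everything first so a bad request changes nothing.
        for dev in devices:
            if dev not in available:
                raise ValueError(f"{dev} is not an available device on {host}")
        for dev in devices:
            available.remove(dev)
            self.osds.append((host, dev))

    def scale_service(self, service_type, name, count):
        if count < 0:
            raise ValueError("count must be non-negative")
        self.services[(service_type, name)] = count
```

With this shape, a dashboard button only needs to call `create_osds` or `scale_service`; whether Ansible, SaltStack or Rook carries out the request is hidden behind the subclass.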
  • 23. 23 rook module ● Implement Orchestrator interface using Kubernetes API client, mapping operations to Rook’s structures: – Device inventory → read ConfigMaps populated by Rook – OSD creation → add entries to cluster→nodes→devices – MDS creation → create FilesystemSpec entities – RGW creation → create ObjectStoreSpec entities ● Some extra code to implement clean completions/progress events, e.g. not reporting OSD creation complete until OSD is actually up in OSDMap. New in Nautilus
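The "clean completions" point on slide 23 means that a request counts as done only when Ceph itself reports the result, not when Kubernetes has accepted the change. A minimal sketch of that idea, assuming the caller can supply the set of OSD ids currently marked up in the OSDMap; `OsdCreateCompletion` and its methods are hypothetical names, not the rook module's real classes.

```python
class OsdCreateCompletion:
    """Hypothetical completion object for one OSD-creation request."""

    def __init__(self, expected_osd_ids):
        self.expected = set(expected_osd_ids)

    def is_complete(self, osdmap_up_ids):
        """True only once every expected OSD is reported up."""
        return self.expected.issubset(osdmap_up_ids)

    def progress(self, osdmap_up_ids):
        """Fraction of expected OSDs that are up, between 0.0 and 1.0."""
        if not self.expected:
            return 1.0
        up = self.expected & set(osdmap_up_ids)
        return len(up) / len(self.expected)
```

A caller would poll `is_complete` whenever a new OSDMap arrives and could feed `progress` into a progress event.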
  • 24. 24 Orchestration vs Ceph Management ● External orchestrators are handling physical deployment of services, but most logical management is still direct to Ceph ● We must continue to make managing Ceph easier, and where possible, remove need for intervention. ● Ceph-mgr modules fill this role ● Orchestrator should orchestrate, Ceph modules should manage
  • 25. 25 Orchestrator Simplification Orchestrators mix physically deploying Ceph services with logical configuration: ● Rook creates volumes as CephFS filesystems, but this means creating underlying pools. How does it know how to configure them? ● Same for anything deploying RGW ● Rook also exposes some health/monitoring of the Ceph cluster, but is this in terms a non-Ceph-expert can understand?
  • 27. 27 Placement group merging ● Historically, pg_num could be increased but not decreased ● Sometimes problematic, such as when physically shrinking a cluster, or if bad pg_nums were chosen. ● Bigger problem: prevented automatic pg_num selection, because mistakes could not be reversed. ● Implementation is not simple, and doing it still has an IO cost, but the option will be there → now we can autoselect pg_num! Targeted for Nautilus
  • 28. 28 poolsets module ● Pick pg_num so the user doesn’t have to! ● Hard (impossible?) to do perfectly, but... ● Pretty easy to do useful common cases: – Select initial pg_nums according to expected space use – Increase pg_nums if actual space use has gone ~2x over ideal PG capacity – Decrease pg_num for underused pools if another pool needs to increase theirs ● Not an optimizer! But does the job as well as most human beings are doing it today. Targeted for Nautilus
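The first two common cases on slide 28 can be written down as a small heuristic. This is only a sketch of the idea, not the poolsets module's algorithm: `target_bytes_per_pg`, `min_pg_num`, the 2x `factor` default and the rounding up to a power of two are assumptions made for illustration.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (and at least 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def initial_pg_num(expected_bytes, target_bytes_per_pg, min_pg_num=8):
    """Pick a starting pg_num from the expected space use of a pool."""
    wanted = -(-expected_bytes // target_bytes_per_pg)  # ceiling division
    return max(min_pg_num, next_power_of_two(wanted))


def should_increase(actual_bytes, pg_num, target_bytes_per_pg, factor=2.0):
    """True when actual use exceeds `factor` times the ideal capacity
    of the current number of PGs."""
    return actual_bytes > factor * pg_num * target_bytes_per_pg
```

For example, with an assumed target of 1 GiB per PG, a pool expected to hold 100 GiB starts at 128 PGs, and an increase is suggested only once it holds more than 256 GiB.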
  • 29. 29 poolsets module Prompting users for expected capacity makes sense for data pools, but not for metadata pools: ● Combine data and metadata pool creation into one command ● Wrap pools into new “poolset” structure describing policy ● Auto-construct poolsets for existing deployments, but don’t auto-adjust unless explicitly enabled ● Example: `ceph poolset create cephfs my_filesystem 100GB` New in Nautilus
  • 30. 30 progress module ● Health reporting was improved in Luminous, but in many cases it is still too low level. ● Placement groups: – Hard to distinguish between real problems and normal rebalancing – Once we start auto-picking pg_num, users won’t know what a PG is until they see them in the health status ● Introduce `progress` module to synthesize high level view from PG state: “56% recovered from failure of OSD 123” ● Also enable other modules to describe their long running operations via this module (creating an OSD, etc) New in Nautilus
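One way to picture how a message like the one on slide 30 could be derived from PG state: remember which PGs were affected when the OSD failed, then report the share of them that are `active+clean` again. This is a simplified sketch with hypothetical function names; the real module has to handle overlapping failures, as the editor's note for this slide points out.

```python
def recovery_progress(initial_degraded_pgs, pg_states):
    """Fraction of the PGs affected by a failure that are active+clean again.

    initial_degraded_pgs: PG ids that were affected when the OSD failed.
    pg_states: current mapping of PG id -> state string.
    """
    affected = set(initial_degraded_pgs)
    if not affected:
        return 1.0
    recovered = sum(
        1 for pg in affected if pg_states.get(pg) == "active+clean"
    )
    return recovered / len(affected)


def progress_message(osd_id, fraction):
    """Render the high-level message in the style quoted on the slide."""
    return f"{round(fraction * 100)}% recovered from failure of OSD {osd_id}"
```

Tracking only the initially affected PGs keeps normal rebalancing elsewhere in the cluster from distorting the percentage.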
  • 31. 31 volumes module ● Currently, ceph_volume_client.py (Used by Manila, etc) creates “volumes” within CephFS “filesystems”, which require RADOS “pools” and provisioning of MDS daemons. ● Simplify this: – Two concepts: Volumes (aka filesystems), and Subvolumes – Automatically provision MDS daemons on demand using Rook – Automatically create pools on demand using `poolsets` module – Expose functionality as commands (consumable via librados) instead of library – Run background tasks from ceph-mgr (e.g. subvolume purge) Targeted for Nautilus
  • 32. 32 From zero to working subvolume
      Before:
        ceph osd pool create metadata 128
        ceph osd pool create data 2048
        ceph fs new myfs metadata data
        # Create an MDS somehow…
        # Call into ceph_volume_client.py
        # volume_client.create_volume(…
      After:
        ceph fs volume create myvol
        ceph fs subvolume create myvol subv
      Done!
  • 33. 33 Wrap up ● All these improvements reduce cognitive load on ordinary user. – Do not need to know what an MDS is: ask Rook for a filesystem, and get one. – Do not need to know what a placement group is – Do not need to know magic commands: look at the dashboard ● Actions that no longer require human thought can now be tied into automated workflows: fulfill the promise of software defined storage. ● A smart container orchestrator is an essential part of this vision: on-demand Ceph requires on-demand service orchestration.
  • 34. 34 Resources ● Rook – https://rook.io – https://github.com/rook/rook ● Ceph Mgr – http://docs.ceph.com/docs/master/mgr/plugins/ Contributions welcome!
  • 35. Q&A

Editor's Notes

  • #31: FAQ: this seems like it would struggle with corner cases like a PG failing on one OSD, then the new OSD failing too, do you end up with multiple progress bars or what? A: The code handles the simple cases well, and when things get complicated we just do the simplest thing we can. The fallback general case is to look at the overall recovery progress of the cluster, if it doesn’t break down neatly into individual progress events.
  • #32: FAQ: why was ceph_volume_client.py implemented externally to begin with? A: ceph_volume_client.py predates ceph-mgr, we’ve been wanting to integrate it for a while. FAQ: why rename the entities? A: Needed to distinguish between “lightweight volume” (subvolume) and “heavyweight volume” (volume). Term “filesystem” was prone to confusion, because at the point you mount a subvolume, that is a filesystem from the POV of the client node. FAQ: A separate MDS for each volume, that seems so resource intensive! A: That’s the point of subvolumes: you can choose when to deploy a full blown filesystem, and when to just carve out a logical partition in an existing one. By the way, MDS daemons are increasingly smart about how they manage memory, enabling running more daemons with less memory each, if your workload doesn’t require a single high-memory MDS with a big cache.
  • #33: Key idea: that this isn’t just a convenience to hide a few commands, we’re protecting the user from making decisions like PG counts (delegate decision to poolsets), and where to run an MDS daemon (delegate scheduling to k8s) Key idea: by implementing this functionality inside Ceph, we enable seamless integration with dashboard, etc. Nothing extra to install, nothing extra to configure.