Storage - Kubernetes
Storage
Ways to provide both long-term and temporary storage to Pods in your
cluster.
1: Volumes
2: Persistent Volumes
3: Projected Volumes
4: Ephemeral Volumes
5: Storage Classes
6: Dynamic Volume Provisioning
7: Volume Snapshots
8: Volume Snapshot Classes
9: CSI Volume Cloning
10: Storage Capacity
11: Node-specific Volume Limits
12: Volume Health Monitoring
13: Windows Storage
1 - Volumes
On-disk files in a container are ephemeral, which presents some problems for non-trivial
applications when running in containers. One problem occurs when a container crashes or is
stopped. Container state is not saved so all of the files that were created or modified during
the lifetime of the container are lost. During a crash, kubelet restarts the container with a
clean state. Another problem occurs when multiple containers are running in a Pod and need
to share files. It can be challenging to set up and access a shared filesystem across all of the
containers. The Kubernetes volume abstraction solves both of these problems. Familiarity
with Pods is suggested.
Background
Kubernetes supports many types of volumes. A Pod can use any number of volume types
simultaneously. Ephemeral volume types have the same lifetime as a pod, but persistent volumes
exist beyond the lifetime of a pod. When a pod ceases to exist, Kubernetes destroys
ephemeral volumes; however, Kubernetes does not destroy persistent volumes. For any kind
of volume in a given pod, data is preserved across container restarts.
At its core, a volume is a directory, possibly with some data in it, which is accessible to the
containers in a pod. How that directory comes to be, the medium that backs it, and the
contents of it are determined by the particular volume type used.
To use a volume, specify the volumes to provide for the Pod in .spec.volumes and declare
where to mount those volumes into containers in .spec.containers[*].volumeMounts . A
process in a container sees a filesystem view composed from the initial contents of the
container image, plus volumes (if defined) mounted inside the container. The process sees a
root filesystem that initially matches the contents of the container image. Any writes within
that filesystem hierarchy, if allowed, affect what that process views when it performs a
subsequent filesystem access. Volumes mount at the specified paths within the image. For
each container defined within a Pod, you must independently specify where to mount each
volume that the container uses.
Volumes cannot mount within other volumes (but see Using subPath for a related
mechanism). Also, a volume cannot contain a hard link to anything in a different volume.
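As a minimal sketch of the mechanics described above, the following Pod declares one volume in .spec.volumes and mounts it independently in two containers at different paths. The Pod, container, and volume names are hypothetical, and the shell commands are only placeholders to make the sharing visible.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo   # hypothetical name
spec:
  containers:
  - name: writer
    image: busybox:1.28
    # Appends a timestamp to a file on the shared volume every 5 seconds.
    command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
    volumeMounts:
    - name: shared-data
      mountPath: /out
  - name: reader
    image: busybox:1.28
    # Prints the most recent line written by the other container.
    command: ["sh", "-c", "while true; do tail -n 1 /in/log.txt; sleep 5; done"]
    volumeMounts:
    - name: shared-data      # same volume, mounted at a different path
      mountPath: /in
      readOnly: true
  volumes:
  - name: shared-data
    emptyDir: {}
```

Each container lists its own volumeMounts entry; declaring the volume once under .spec.volumes does not mount it anywhere by itself.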
https://kubernetes.io/docs/concepts/storage/_print/ 1/76
6/6/23, 3:54 PM Storage | Kubernetes
Types of volumes
Kubernetes supports several types of volumes.
awsElasticBlockStore (deprecated)
FEATURE STATE: Kubernetes v1.17 [deprecated]
An awsElasticBlockStore volume mounts an Amazon Web Services (AWS) EBS volume into
your pod. Unlike emptyDir , which is erased when a pod is removed, the contents of an EBS
volume are persisted and the volume is unmounted. This means that an EBS volume can be
pre-populated with data, and that data can be shared between pods.
Note: You must create an EBS volume by using aws ec2 create-volume or the AWS API
before you can use it.
The following restrictions apply when using an awsElasticBlockStore volume:
The nodes on which pods are running must be AWS EC2 instances.
Those instances need to be in the same region and availability zone as the EBS volume.
EBS only supports a single EC2 instance mounting a volume.
Make sure the zone matches the zone you brought up your cluster in. Check that the size and
EBS volume type are suitable for your use.
apiVersion: v1
kind: Pod
metadata:
  name: test-ebs
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-ebs
      name: test-volume
  volumes:
  - name: test-volume
    # This AWS EBS volume must already exist.
    awsElasticBlockStore:
      volumeID: "<volume id>"
      fsType: ext4
If the EBS volume is partitioned, you can supply the optional field partition: "<partition
number>" to specify which partition to mount on.
The CSIMigration feature for awsElasticBlockStore , when enabled, redirects all plugin
operations from the existing in-tree plugin to the ebs.csi.aws.com Container Storage
Interface (CSI) driver. In order to use this feature, the AWS EBS CSI driver must be installed on
the cluster.
To disable the awsElasticBlockStore storage plugin from being loaded by the controller
manager and the kubelet, set the InTreePluginAWSUnregister flag to true .
azureDisk (deprecated)
FEATURE STATE: Kubernetes v1.19 [deprecated]
The azureDisk volume type mounts a Microsoft Azure Data Disk into a pod.
The CSIMigration feature for azureDisk , when enabled, redirects all plugin operations from
the existing in-tree plugin to the disk.csi.azure.com Container Storage Interface (CSI)
Driver. In order to use this feature, the Azure Disk CSI Driver must be installed on the cluster.
To disable the azureDisk storage plugin from being loaded by the controller manager and
the kubelet, set the InTreePluginAzureDiskUnregister flag to true .
azureFile (deprecated)
FEATURE STATE: Kubernetes v1.21 [deprecated]
The azureFile volume type mounts a Microsoft Azure File volume (SMB 2.1 and 3.0) into a
pod.
The CSIMigration feature for azureFile , when enabled, redirects all plugin operations from
the existing in-tree plugin to the file.csi.azure.com Container Storage Interface (CSI)
Driver. In order to use this feature, the Azure File CSI Driver must be installed on the cluster
and the CSIMigrationAzureFile feature gate must be enabled.
The Azure File CSI driver does not support using the same volume with different fsgroups. If
CSIMigrationAzureFile is enabled, using the same volume with different fsgroups won't be
supported at all.
To disable the azureFile storage plugin from being loaded by the controller manager and
the kubelet, set the InTreePluginAzureFileUnregister flag to true .
cephfs
A cephfs volume allows an existing CephFS volume to be mounted into your Pod. Unlike
emptyDir , which is erased when a pod is removed, the contents of a cephfs volume are
preserved and the volume is merely unmounted. This means that a cephfs volume can be
pre-populated with data, and that data can be shared between pods. The cephfs volume can
be mounted by multiple writers simultaneously.
Note: You must have your own Ceph server running with the share exported before you
can use it.
cinder (deprecated)
FEATURE STATE: Kubernetes v1.18 [deprecated]
The cinder volume type is used to mount the OpenStack Cinder volume into your pod.
apiVersion: v1
kind: Pod
metadata:
  name: test-cinder
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-cinder-container
    volumeMounts:
    - mountPath: /test-cinder
      name: test-volume
  volumes:
  - name: test-volume
    # This OpenStack volume must already exist.
    cinder:
      volumeID: "<volume id>"
      fsType: ext4
The CSIMigration feature for Cinder is enabled by default since Kubernetes 1.21. It redirects
all plugin operations from the existing in-tree plugin to the cinder.csi.openstack.org
Container Storage Interface (CSI) Driver. OpenStack Cinder CSI Driver must be installed on the
cluster.
To disable the in-tree Cinder plugin from being loaded by the controller manager and the
kubelet, you can enable the InTreePluginOpenStackUnregister feature gate.
configMap
A ConfigMap provides a way to inject configuration data into pods. The data stored in a
ConfigMap can be referenced in a volume of type configMap and then consumed by
containerized applications running in a pod.
When referencing a ConfigMap, you provide the name of the ConfigMap in the volume. You
can customize the path to use for a specific entry in the ConfigMap. The following
configuration shows how to mount the log-config ConfigMap onto a Pod called configmap-pod :
apiVersion: v1
kind: Pod
metadata:
  name: configmap-pod
spec:
  containers:
  - name: test
    image: busybox:1.28
    volumeMounts:
    - name: config-vol
      mountPath: /etc/config
  volumes:
  - name: config-vol
    configMap:
      name: log-config
      items:
      - key: log_level
        path: log_level
The log-config ConfigMap is mounted as a volume, and all contents stored in its log_level
entry are mounted into the Pod at path /etc/config/log_level . Note that this path is
derived from the volume's mountPath and the path keyed with log_level .
Note:
You must create a ConfigMap before you can use it.
Text data is exposed as files using the UTF-8 character encoding. For other character
encodings, use binaryData .
downwardAPI
A downwardAPI volume makes downward API data available to applications. Within the
volume, you can find the exposed data as read-only files in plain text format.
Note: A container using the downward API as a subPath volume mount does not receive
updates when field values change.
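The following is a minimal sketch of a downwardAPI volume that exposes the Pod's labels as a read-only file. The Pod name, label, and mount path are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downwardapi-demo   # hypothetical name
  labels:
    zone: us-east-coast    # hypothetical label
spec:
  containers:
  - name: client
    image: busybox:1.28
    # Prints the labels file that the downward API volume provides.
    command: ["sh", "-c", "while true; do cat /etc/podinfo/labels; sleep 5; done"]
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      # Each item maps one Pod field to a file under the mount path.
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
```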
emptyDir
An emptyDir volume is first created when a Pod is assigned to a node, and exists as long as
that Pod is running on that node. As the name says, the emptyDir volume is initially empty.
All containers in the Pod can read and write the same files in the emptyDir volume, though
that volume can be mounted at the same or different paths in each container. When a Pod is
removed from a node for any reason, the data in the emptyDir is deleted permanently.
Note: A container crashing does not remove a Pod from a node. The data in an emptyDir
volume is safe across container crashes.
A size limit can be specified for the default medium, which limits the capacity of the emptyDir
volume. The storage is allocated from node ephemeral storage. If that is filled up from
another source (for example, log files or image overlays), the emptyDir may run out of
capacity before this limit.
Note: If the SizeMemoryBackedVolumes feature gate is enabled, you can specify a size for
memory backed volumes. If no size is specified, memory backed volumes are sized to
50% of the memory on a Linux host.
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      sizeLimit: 500Mi
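For the memory backed case mentioned in the note above, a sketch might look like the following. It assumes the SizeMemoryBackedVolumes feature gate is enabled so that sizeLimit is applied to the memory backed volume; the Pod name and the 128Mi value are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-memory-cache   # hypothetical name
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir:
      # Back the volume with memory instead of node disk.
      medium: Memory
      # Assumption: honored for memory backed volumes only when the
      # SizeMemoryBackedVolumes feature gate is enabled.
      sizeLimit: 128Mi
```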
fc (fibre channel)
An fc volume type allows an existing fibre channel block storage volume to mount in a Pod.
You can specify single or multiple target world wide names (WWNs) using the parameter
targetWWNs in your Volume configuration. If multiple WWNs are specified, targetWWNs
expect that those WWNs are from multi-path connections.
Note: You must configure FC SAN Zoning to allocate and mask those LUNs (volumes) to
the target WWNs beforehand so that Kubernetes hosts can access them.
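A minimal sketch of an fc volume with two target WWNs from a multi-path connection is shown below. The WWN values, the LUN number, and the Pod name are placeholders, not values from a real SAN; substitute the ones zoned for your hosts.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-fc   # hypothetical name
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /mnt/fc
      name: fc-vol
  volumes:
  - name: fc-vol
    fc:
      # Placeholder WWNs; both paths must lead to the same LUN.
      targetWWNs:
      - "500a0982991b8dc5"
      - "500a0982891b8dc5"
      lun: 2
      fsType: ext4
      readOnly: true
```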
gcePersistentDisk (deprecated)
FEATURE STATE: Kubernetes v1.17 [deprecated]
A gcePersistentDisk volume mounts a Google Compute Engine (GCE) persistent disk (PD)
into your Pod. Unlike emptyDir , which is erased when a pod is removed, the contents of a PD
are preserved and the volume is merely unmounted. This means that a PD can be pre-
populated with data, and that data can be shared between pods.
Note: You must create a PD using gcloud or the GCE API or UI before you can use it.
One feature of GCE persistent disk is concurrent read-only access to a persistent disk. A
gcePersistentDisk volume permits multiple consumers to simultaneously mount a
persistent disk as read-only. This means that you can pre-populate a PD with your dataset and
then serve it in parallel from as many Pods as you need. Unfortunately, PDs can only be
mounted by a single consumer in read-write mode. Simultaneous writers are not allowed.
Using a GCE persistent disk with a Pod controlled by a ReplicaSet will fail unless the PD is
read-only or the replica count is 0 or 1.
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    # This GCE PD must already exist.
    gcePersistentDisk:
      pdName: my-data-disk
      fsType: ext4
The following PersistentVolume refers to a disk by the same pdName and uses nodeAffinity to
restrict it to nodes in two zones:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
spec:
  capacity:
    storage: 400Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    pdName: my-data-disk
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        # failure-domain.beta.kubernetes.io/zone should be used prior to 1.21
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-central1-a
          - us-central1-b
The CSIMigration feature for GCE PD, when enabled, redirects all plugin operations from the
existing in-tree plugin to the pd.csi.storage.gke.io Container Storage Interface (CSI) Driver.
In order to use this feature, the GCE PD CSI Driver must be installed on the cluster.
To disable the gcePersistentDisk storage plugin from being loaded by the controller
manager and the kubelet, set the InTreePluginGCEUnregister flag to true .
gitRepo (deprecated)
Warning: The gitRepo volume type is deprecated. To provision a container with a git
repo, mount an EmptyDir into an InitContainer that clones the repo using git, then mount
the EmptyDir into the Pod's container.
A gitRepo volume is an example of a volume plugin. This plugin mounts an empty directory
and clones a git repository into this directory for your Pod to use.
apiVersion: v1
kind: Pod
metadata:
  name: server
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - mountPath: /mypath
      name: git-volume
  volumes:
  - name: git-volume
    gitRepo:
      repository: "git@somewhere:me/my-git-repository.git"
      revision: "22f1d8406d464b0c0874075539c1f2e96c253775"
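The replacement suggested in the warning above (an init container that clones into an emptyDir) could be sketched as follows. The alpine/git image and the repository URL are assumptions for illustration; any image that ships a git client should work, and private repositories would additionally need credentials that this sketch does not cover.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: server-with-clone   # hypothetical name
spec:
  initContainers:
  - name: clone-repo
    # Assumption: an image that provides the git binary.
    image: alpine/git
    # Placeholder URL; the clone lands in the shared emptyDir.
    command: ["git", "clone", "--depth=1", "https://example.com/me/my-git-repository.git", "/repo"]
    volumeMounts:
    - mountPath: /repo
      name: git-volume
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    # The main container sees the files cloned by the init container.
    - mountPath: /mypath
      name: git-volume
  volumes:
  - name: git-volume
    emptyDir: {}
```

Because init containers run to completion before the main containers start, the repository contents are in place by the time nginx starts.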
glusterfs (removed)
Kubernetes 1.27 does not include a glusterfs volume type.
The GlusterFS in-tree storage driver was deprecated in the Kubernetes v1.25 release and then
removed entirely in the v1.26 release.
hostPath
Warning:
HostPath volumes present many security risks, and it is a best practice to avoid the use of
HostPaths when possible. When a HostPath volume must be used, it should be scoped to
only the required file or directory, and mounted as ReadOnly.
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
This is not something that most Pods will need, but it offers a powerful escape hatch for some
applications.
In addition to the required path property, you can optionally specify a type for a hostPath
volume.
Value               Behavior
DirectoryOrCreate   If nothing exists at the given path, an empty directory will be created
                    there as needed with permission set to 0755, having the same group and
                    ownership with Kubelet.
FileOrCreate        If nothing exists at the given path, an empty file will be created there as
                    needed with permission set to 0644, having the same group and
                    ownership with Kubelet.
Watch out when using this type of volume, because:
HostPaths can expose privileged system credentials (such as for the Kubelet) or
privileged APIs (such as container runtime socket), which can be used for container
escape or to attack other parts of the cluster.
Pods with identical configuration (such as created from a PodTemplate) may behave
differently on different nodes due to different files on the nodes.
The files or directories created on the underlying hosts are only writable by root. You
either need to run your process as root in a privileged Container or modify the file
permissions on the host to be able to write to a hostPath volume.
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory
Caution: The FileOrCreate mode does not create the parent directory of the file. If the
parent directory of the mounted file does not exist, the pod fails to start. To ensure that
this mode works, you can try to mount directories and files separately, as shown in the
FileOrCreate configuration.
apiVersion: v1
kind: Pod
metadata:
  name: test-webserver
spec:
  containers:
  - name: test-webserver
    image: registry.k8s.io/test-webserver:latest
    volumeMounts:
    - mountPath: /var/local/aaa
      name: mydir
    - mountPath: /var/local/aaa/1.txt
      name: myfile
  volumes:
  - name: mydir
    hostPath:
      # Ensure the file directory is created.
      path: /var/local/aaa
      type: DirectoryOrCreate
  - name: myfile
    hostPath:
      path: /var/local/aaa/1.txt
      type: FileOrCreate
iscsi
An iscsi volume allows an existing iSCSI (SCSI over IP) volume to be mounted into your Pod.
Unlike emptyDir , which is erased when a Pod is removed, the contents of an iscsi volume
are preserved and the volume is merely unmounted. This means that an iscsi volume can be
pre-populated with data, and that data can be shared between pods.
Note: You must have your own iSCSI server running with the volume created before you
can use it.
local
A local volume represents a mounted local storage device such as a disk, partition or
directory.
Compared to hostPath volumes, local volumes are used in a durable and portable manner
without manually scheduling pods to nodes. The system is aware of the volume's node
constraints by looking at the node affinity on the PersistentVolume.
However, local volumes are subject to the availability of the underlying node and are not
suitable for all applications. If a node becomes unhealthy, then the local volume becomes
inaccessible by the pod. The pod using this volume is unable to run. Applications using local
volumes must be able to tolerate this reduced availability, as well as potential data loss,
depending on the durability characteristics of the underlying disk.
The following example shows a PersistentVolume using a local volume and nodeAffinity :
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example-node
You must set a PersistentVolume nodeAffinity when using local volumes. The Kubernetes
scheduler uses the PersistentVolume nodeAffinity to schedule these Pods to the correct
node.
PersistentVolume volumeMode can be set to "Block" (instead of the default value "Filesystem")
to expose the local volume as a raw block device.
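As a hedged sketch of the raw block variant, the manifests below define a local PersistentVolume with volumeMode set to Block, a claim that requests it, and a Pod that consumes it through volumeDevices rather than volumeMounts. The device path /dev/sdb, the object names, and the in-container device path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-block-pv       # hypothetical name
spec:
  capacity:
    storage: 100Gi
  volumeMode: Block            # expose the device without a filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /dev/sdb             # hypothetical block device on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example-node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-block-claim    # hypothetical name
spec:
  storageClassName: local-storage
  volumeMode: Block            # must match the PersistentVolume
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: block-consumer         # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.28
    command: ["sh", "-c", "sleep 3600"]
    volumeDevices:             # raw block volumes use volumeDevices
    - name: data
      devicePath: /dev/xvda    # hypothetical path inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-block-claim
```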
An external static provisioner can be run separately for improved management of the local
volume lifecycle. Note that this provisioner does not support dynamic provisioning yet. For an
example on how to run an external local provisioner, see the local volume provisioner user
guide.
Note: The local PersistentVolume requires manual cleanup and deletion by the user if the
external static provisioner is not used to manage the volume lifecycle.
nfs
An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod.
Unlike emptyDir , which is erased when a Pod is removed, the contents of an nfs volume are
preserved and the volume is merely unmounted. This means that an NFS volume can be pre-
populated with data, and that data can be shared between pods. NFS can be mounted by
multiple writers simultaneously.
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /my-nfs-data
      name: test-volume
  volumes:
  - name: test-volume
    nfs:
      server: my-nfs-server.example.com
      path: /my-nfs-volume
      readOnly: true
Note:
You must have your own NFS server running with the share exported before you can use
it.
Also note that you can't specify NFS mount options in a Pod spec. You can either set
mount options server-side or use /etc/nfsmount.conf. You can also mount NFS volumes
via PersistentVolumes which do allow you to set mount options.
See the NFS example for an example of mounting NFS volumes with PersistentVolumes.
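To illustrate the PersistentVolume route for mount options, here is a minimal sketch that reuses the server and path from the Pod example above. The PersistentVolume name, the capacity, and the specific mount options are assumptions; choose options that your NFS server and clients actually support.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv                 # hypothetical name
spec:
  capacity:
    storage: 10Gi              # hypothetical size
  accessModes:
  - ReadWriteMany
  # Mount options are set on the PersistentVolume, not in the Pod spec.
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    server: my-nfs-server.example.com
    path: /my-nfs-volume
```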
persistentVolumeClaim
A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod.
PersistentVolumeClaims are a way for users to "claim" durable storage (such as a GCE
PersistentDisk or an iSCSI volume) without knowing the details of the particular cloud
environment.
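A minimal sketch of this pattern is a claim plus a Pod that mounts it. The names and the requested size are hypothetical, and the claim assumes that the cluster has a default StorageClass or a matching PersistentVolume available.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi             # hypothetical size
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer           # hypothetical name
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /data
      name: app-storage
  volumes:
  - name: app-storage
    # The Pod only names the claim; it does not need to know which
    # storage backend satisfies it.
    persistentVolumeClaim:
      claimName: app-data
```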
portworxVolume (deprecated)
FEATURE STATE: Kubernetes v1.25 [deprecated]
apiVersion: v1
kind: Pod
metadata:
  name: test-portworx-volume-pod
spec:
  containers:
  - image: registry.k8s.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /mnt
      name: pxvol
  volumes:
  - name: pxvol
    # This Portworx volume must already exist.
    portworxVolume:
      volumeID: "pxvol"
      fsType: "<fs-type>"
Note: Make sure you have an existing PortworxVolume with name pxvol before using it
in the Pod.
The CSIMigration feature for Portworx was added in Kubernetes 1.23 as an alpha feature,
disabled by default. It has been beta since v1.25 but is still turned off by default. It
redirects all plugin operations from the existing in-tree plugin to the
pxd.portworx.com Container Storage Interface (CSI) Driver. The Portworx CSI Driver must be
installed on the cluster. To enable the feature, set CSIMigrationPortworx=true in
kube-controller-manager and kubelet.
projected
A projected volume maps several existing volume sources into the same directory. For more
details, see projected volumes.
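As a minimal sketch, the Pod below projects a Secret, a ConfigMap, and a downward API field into one directory. The Secret name, the ConfigMap name, and the Pod name are hypothetical, and both objects are assumed to exist in the Pod's namespace already.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-demo         # hypothetical name
spec:
  containers:
  - name: test
    image: busybox:1.28
    command: ["sh", "-c", "ls -R /projected; sleep 3600"]
    volumeMounts:
    - name: all-in-one
      mountPath: /projected
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      # Each source contributes files to the same mount directory.
      sources:
      - secret:
          name: mysecret       # hypothetical Secret
      - configMap:
          name: myconfigmap    # hypothetical ConfigMap
      - downwardAPI:
          items:
          - path: "labels"
            fieldRef:
              fieldPath: metadata.labels
```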
rbd
An rbd volume allows a Rados Block Device (RBD) volume to mount into your Pod. Unlike
emptyDir , which is erased when a pod is removed, the contents of an rbd volume are
preserved and the volume is unmounted. This means that an RBD volume can be
pre-populated with data, and that data can be shared between pods.
Note: You must have a Ceph installation running before you can use RBD.
The CSIMigration feature for RBD, when enabled, redirects all plugin operations from the
existing in-tree plugin to the rbd.csi.ceph.com CSI driver. In order to use this feature, the
Ceph CSI driver must be installed on the cluster and the CSIMigrationRBD feature gate must
be enabled. (Note that the csiMigrationRBD flag was removed and replaced with
CSIMigrationRBD in release v1.24.)
Note:
As a Kubernetes cluster operator that administers storage, here are the prerequisites that
you must complete before you attempt migration to the RBD CSI driver:
You must install the Ceph CSI driver ( rbd.csi.ceph.com ), v3.5.0 or above, into your
Kubernetes cluster.
The CSI driver requires the clusterID field for its operations, whereas the in-tree
StorageClass requires the monitors field. A Kubernetes storage admin therefore has to
create a clusterID based on the monitors hash (for example:
echo -n '<monitors_string>' | md5sum ) in the CSI config map and keep the
monitors under this clusterID configuration.
Also, if the value of adminId in the in-tree StorageClass is different from admin , the
adminSecretName mentioned in the in-tree StorageClass has to be patched with the
base64 value of the adminId parameter value; otherwise this step can be skipped.
secret
A secret volume is used to pass sensitive information, such as passwords, to Pods. You can
store secrets in the Kubernetes API and mount them as files for use by pods without coupling
to Kubernetes directly. secret volumes are backed by tmpfs (a RAM-backed filesystem) so
they are never written to non-volatile storage.
Note:
You must create a Secret in the Kubernetes API before you can use it.
A container using a Secret as a subPath volume mount will not receive Secret
updates.
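A minimal sketch of mounting a Secret as files follows. The Secret name mysecret and the mount path are hypothetical, and the Secret is assumed to exist before the Pod is created.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.28
    command: ["sh", "-c", "ls /etc/secret-volume; sleep 3600"]
    volumeMounts:
    - name: secret-volume
      # Each key in the Secret appears as a file under this path.
      mountPath: /etc/secret-volume
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: mysecret     # hypothetical Secret
```

Mounting the whole volume, as here, rather than a subPath keeps the files eligible for updates when the Secret changes.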
vsphereVolume (deprecated)
A vsphereVolume is used to mount a vSphere VMDK volume into your Pod. The contents of a
volume are preserved when it is unmounted. It supports both VMFS and VSAN datastores.
In Kubernetes 1.27, all operations for the in-tree vsphereVolume type are redirected to the
csi.vsphere.vmware.com CSI driver.
The vSphere CSI driver must be installed on the cluster. You can find additional advice on how to
migrate in-tree vsphereVolume in VMware's documentation page Migrating In-Tree vSphere
Volumes to vSphere Container Storage Plug-in. If the vSphere CSI Driver is not installed, volume
operations cannot be performed on the PV created with the in-tree vsphereVolume type.
You must run vSphere 7.0u2 or later in order to migrate to the vSphere CSI driver.
If you are running a version of Kubernetes other than v1.27, consult the documentation for
that version of Kubernetes.
Note:
The following StorageClass parameters from the built-in vsphereVolume plugin are not
supported by the vSphere CSI driver:
diskformat
hostfailurestotolerate
forceprovisioning
cachereservation
diskstripes
objectspacereservation
iopslimit
Existing volumes created using these parameters will be migrated to the vSphere CSI
driver, but new volumes created by the vSphere CSI driver will not honor these
parameters.
To prevent the vsphereVolume plugin from being loaded by the controller manager and the
kubelet, set the InTreePluginvSphereUnregister feature flag to true . You must
install a csi.vsphere.vmware.com CSI driver on all worker nodes.
Using subPath
Sometimes, it is useful to share one volume for multiple uses in a single pod. The
volumeMounts.subPath property specifies a sub-path inside the referenced volume instead of
its root.
The following example shows how to configure a Pod with a LAMP stack (Linux Apache MySQL
PHP) using a single, shared volume. This sample subPath configuration is not recommended
for production use.
The PHP application's code and assets map to the volume's html folder and the MySQL
database is stored in the volume's mysql folder. For example:
apiVersion: v1
kind: Pod
metadata:
  name: my-lamp-site
spec:
  containers:
  - name: mysql
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "rootpasswd"
    volumeMounts:
    - mountPath: /var/lib/mysql
      name: site-data
      subPath: mysql
  - name: php
    image: php:7.0-apache
    volumeMounts:
    - mountPath: /var/www/html
      name: site-data
      subPath: html
  volumes:
  - name: site-data
    persistentVolumeClaim:
      claimName: my-lamp-site-data
Use the subPathExpr field to construct subPath directory names from downward API
environment variables. The subPath and subPathExpr properties are mutually exclusive.
In this example, a Pod uses subPathExpr to create a directory pod1 within the hostPath
volume /var/log/pods . The hostPath volume takes the Pod name from the downwardAPI .
The host directory /var/log/pods/pod1 is mounted at /logs in the container.
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: container1
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    image: busybox:1.28
    command: [ "sh", "-c", "while [ true ]; do echo 'Hello'; sleep 10; done | tee
    volumeMounts:
    - name: workdir1
      mountPath: /logs
      # The variable expansion uses round brackets (not curly brackets).
      subPathExpr: $(POD_NAME)
  restartPolicy: Never
  volumes:
  - name: workdir1
    hostPath:
      path: /var/log/pods
Resources
The storage medium (such as disk or SSD) of an emptyDir volume is determined by the
medium of the filesystem holding the kubelet root dir (typically /var/lib/kubelet ). There is
no limit on how much space an emptyDir or hostPath volume can consume, and no
isolation between containers or between pods.
To learn about requesting space using a resource specification, see how to manage resources.
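As a sketch of what such a resource specification can look like, the Pod below requests and limits local ephemeral storage for its container and also caps the emptyDir volume. The Pod name and all sizes are hypothetical; how strictly the limits are enforced depends on how the node's ephemeral storage is configured.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-storage-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/test-webserver
    resources:
      requests:
        # Used by the scheduler when placing the Pod on a node.
        ephemeral-storage: "1Gi"
      limits:
        # Exceeding the limit can lead to the Pod being evicted.
        ephemeral-storage: "2Gi"
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 500Mi
```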
Previously, all volume plugins were "in-tree". The "in-tree" plugins were built, linked, compiled,
and shipped with the core Kubernetes binaries. This meant that adding a new storage system
to Kubernetes (a volume plugin) required checking code into the core Kubernetes code
repository.
Both CSI and FlexVolume allow volume plugins to be developed independent of the
Kubernetes code base, and deployed (installed) on Kubernetes clusters as extensions.
For storage vendors looking to create an out-of-tree volume plugin, please refer to the volume
plugin FAQ.
csi
Container Storage Interface (CSI) defines a standard interface for container orchestration
systems (like Kubernetes) to expose arbitrary storage systems to their container workloads.
Note: Support for CSI spec versions 0.2 and 0.3 is deprecated in Kubernetes v1.13 and
will be removed in a future release.
Note: CSI drivers may not be compatible across all Kubernetes releases. Please check the
specific CSI driver's documentation for supported deployment steps for each Kubernetes
release and a compatibility matrix.
Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the
csi volume type to attach or mount the volumes exposed by the CSI driver.
The following fields are available to storage administrators to configure a CSI persistent
volume:
driver: A string value that specifies the name of the volume driver to use. This value
must correspond to the value returned in the GetPluginInfoResponse by the CSI driver
as defined in the CSI spec. It is used by Kubernetes to identify which CSI driver to call out
to, and by CSI driver components to identify which PV objects belong to the CSI driver.
volumeHandle : A string value that uniquely identifies the volume. This value must
correspond to the value returned in the volume.id field of the CreateVolumeResponse
by the CSI driver as defined in the CSI spec. The value is passed as volume_id on all calls
to the CSI volume driver when referencing the volume.
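The two fields described above could appear in a PersistentVolume as in the following sketch. The driver name csi.example.com and the volume handle are hypothetical; in practice they must match what the installed CSI driver reports and returns.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-csi-pv           # hypothetical name
spec:
  capacity:
    storage: 10Gi                # hypothetical size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    # Must equal the name the driver returns in GetPluginInfoResponse.
    driver: csi.example.com      # hypothetical driver name
    # Must equal the volume ID the driver returned when the volume
    # was created; passed to the driver as volume_id.
    volumeHandle: vol-0123456789 # hypothetical handle
    fsType: ext4
```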
Vendors with external CSI drivers can implement raw block volume support in Kubernetes
workloads.
You can set up your PersistentVolume/PersistentVolumeClaim with raw block volume support
as usual, without any CSI specific changes.
You can directly configure CSI volumes within the Pod specification. Volumes specified in this
way are ephemeral and do not persist across pod restarts. See Ephemeral Volumes for more
information.
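A minimal sketch of a CSI volume configured directly in the Pod specification is shown below. The driver name and the volumeAttributes key are hypothetical, and the sketch assumes the driver is installed and supports ephemeral inline volumes, which not every CSI driver does.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: csi-inline-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.28
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: inline-vol
      mountPath: /data
  volumes:
  - name: inline-vol
    csi:
      # Assumption: this driver supports ephemeral inline volumes.
      driver: inline.csi.example.com   # hypothetical driver name
      volumeAttributes:
        foo: bar                       # driver-specific, hypothetical
```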
For more information on how to develop a CSI driver, refer to the kubernetes-csi
documentation.
CSI node plugins need to perform various privileged operations like scanning of disk devices
and mounting of file systems. These operations differ for each host operating system. For
Linux worker nodes, containerized CSI node plugins are typically deployed as privileged
containers. For Windows worker nodes, privileged operations for containerized CSI node
plugins are supported using csi-proxy, a community-managed, stand-alone binary that needs to
be pre-installed on each Windows node.
For more details, refer to the deployment guide of the CSI plugin you wish to deploy.
The operations and features that are supported include: provisioning/delete, attach/detach,
mount/unmount and resizing of volumes.
In-tree plugins that support CSIMigration and have a corresponding CSI driver implemented
are listed in Types of Volumes.
awsElasticBlockStore
azureDisk
azureFile
gcePersistentDisk
vsphereVolume
flexVolume (deprecated)
FEATURE STATE: Kubernetes v1.23 [deprecated]
FlexVolume is an out-of-tree plugin interface that uses an exec-based model to interface with
storage drivers. The FlexVolume driver binaries must be installed in a pre-defined volume
plugin path on each node and in some cases the control plane nodes as well.
Pods interact with FlexVolume drivers through the flexVolume in-tree volume plugin. For
more details, see the FlexVolume README document.
The following FlexVolume plugins, deployed as PowerShell scripts on the host, support
Windows nodes:
SMB
iSCSI
Note:
FlexVolume is deprecated. Using an out-of-tree CSI driver is the recommended way to
integrate external storage with Kubernetes.
Maintainers of FlexVolume drivers should implement a CSI driver and help migrate
users of FlexVolume drivers to CSI. Users of FlexVolume should move their workloads to
the equivalent CSI driver.
Mount propagation
Mount propagation allows for sharing volumes mounted by a container to other containers in
the same pod, or even to other pods on the same node.
None - This volume mount will not receive any subsequent mounts that are mounted to
this volume or any of its subdirectories by the host. In similar fashion, no mounts
created by the container will be visible on the host. This is the default mode.
However, the CRI runtime may choose rslave mount propagation (i.e.,
HostToContainer ) instead, when rprivate propagation is not applicable. cri-dockerd
(Docker) is known to choose rslave mount propagation when the mount source
contains the Docker daemon's root directory ( /var/lib/docker ).
HostToContainer - This volume mount will receive all subsequent mounts that are
mounted to this volume or any of its subdirectories.
In other words, if the host mounts anything inside the volume mount, the container will
see it mounted there.
Similarly, if any Pod with Bidirectional mount propagation to the same volume
mounts anything there, the container with HostToContainer mount propagation will see
it.
Bidirectional - This volume mount behaves the same as the HostToContainer mount. In
addition, all volume mounts created by the container will be propagated back to the
host and to all containers of all pods that use the same volume.
A typical use case for this mode is a Pod with a FlexVolume or CSI driver or a Pod that
needs to mount something on the host using a hostPath volume.
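The modes above are selected per volume mount. The following is an illustrative sketch only: the Pod name, image, and host path are hypothetical, and it assumes the mode is set through the mountPropagation field of a container's volumeMounts entry. Bidirectional propagation is only permitted for privileged containers.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mount-propagation-example   # hypothetical name
spec:
  containers:
    - name: mounter
      image: busybox:1.28
      command: ["sleep", "1000000"]
      securityContext:
        privileged: true            # required for Bidirectional propagation
      volumeMounts:
        - name: host-mnt
          mountPath: /mnt
          mountPropagation: Bidirectional   # or None / HostToContainer
  volumes:
    - name: host-mnt
      hostPath:
        path: /mnt                  # hypothetical host directory
```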
Configuration
Before mount propagation can work properly on some deployments (CoreOS, RedHat/Centos,
Ubuntu) mount share must be configured correctly in Docker as shown below.
MountFlags=shared
What's next
Follow an example of deploying WordPress and MySQL with Persistent Volumes.
2 - Persistent Volumes
This document describes persistent volumes in Kubernetes. Familiarity with volumes is
suggested.
Introduction
Managing storage is a distinct problem from managing compute instances. The
PersistentVolume subsystem provides an API for users and administrators that abstracts
details of how storage is provided from how it is consumed. To do this, we introduce two new
API resources: PersistentVolume and PersistentVolumeClaim.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an
administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster
just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle
independent of any individual Pod that uses the PV. This API object captures the details of the
implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
Provisioning
There are two ways PVs may be provisioned: statically or dynamically.
Static
A cluster administrator creates a number of PVs. They carry the details of the real storage,
which is available for use by cluster users. They exist in the Kubernetes API and are available
for consumption.
Dynamic
When none of the static PVs the administrator created match a user's PersistentVolumeClaim,
the cluster may try to dynamically provision a volume specially for the PVC. This provisioning
is based on StorageClasses: the PVC must request a storage class and the administrator must
have created and configured that class for dynamic provisioning to occur. Claims that request
the class "" effectively disable dynamic provisioning for themselves.
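For illustration, a claim that could trigger dynamic provisioning might look like the sketch below. The claim name and the class name fast are hypothetical; the example assumes an administrator has already created a StorageClass with that name.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim               # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast            # hypothetical class; "" would disable dynamic provisioning
  resources:
    requests:
      storage: 20Gi
```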
To enable dynamic storage provisioning based on storage class, the cluster administrator
needs to enable the DefaultStorageClass admission controller on the API server. This can be
done, for example, by ensuring that DefaultStorageClass is among the comma-delimited,
ordered list of values for the --enable-admission-plugins flag of the API server component.
For more information on API server command-line flags, check kube-apiserver
documentation.
Binding
A user creates, or in the case of dynamic provisioning, has already created, a
PersistentVolumeClaim with a specific amount of storage requested and with certain access
modes. A control loop in the control plane watches for new PVCs, finds a matching PV (if possible),
and binds them together. If a PV was dynamically provisioned for a new PVC, the loop will
always bind that PV to the PVC. Otherwise, the user will always get at least what they asked
for, but the volume may be in excess of what was requested. Once bound,
PersistentVolumeClaim binds are exclusive, regardless of how they were bound. A PVC to PV
binding is a one-to-one mapping, using a ClaimRef which is a bi-directional binding between
the PersistentVolume and the PersistentVolumeClaim.
Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be
bound as matching volumes become available. For example, a cluster provisioned with many
50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is
added to the cluster.
Using
Pods use claims as volumes. The cluster inspects the claim to find the bound volume and
mounts that volume for a Pod. For volumes that support multiple access modes, the user
specifies which mode is desired when using their claim as a volume in a Pod.
Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long
as they need it. Users schedule Pods and access their claimed PVs by including a
persistentVolumeClaim section in a Pod's volumes block. See Claims As Volumes for more
details on this.
Note: A PVC is in active use by a Pod when a Pod object exists that is using the PVC.
If a user deletes a PVC in active use by a Pod, the PVC is not removed immediately. PVC
removal is postponed until the PVC is no longer actively used by any Pods. Also, if an admin
deletes a PV that is bound to a PVC, the PV is not removed immediately. PV removal is
postponed until the PV is no longer bound to a PVC.
You can see that a PVC is protected when the PVC's status is Terminating and the
Finalizers list includes kubernetes.io/pvc-protection.
You can see that a PV is protected when the PV's status is Terminating and the Finalizers
list includes kubernetes.io/pv-protection.
Reclaiming
When a user is done with their volume, they can delete the PVC objects from the API, which
allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster
what to do with the volume after it has been released of its claim. Currently, volumes can
either be Retained, Recycled, or Deleted.
Retain
The Retain reclaim policy allows for manual reclamation of the resource. When the
PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is
considered "released". But it is not yet available for another claim because the previous
claimant's data remains on the volume. An administrator can manually reclaim the volume
with the following steps.
If you want to reuse the same storage asset, create a new PersistentVolume with the same
storage asset definition.
Delete
For volume plugins that support the Delete reclaim policy, deletion removes both the
PersistentVolume object from Kubernetes, as well as the associated storage asset in the
external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes
that were dynamically provisioned inherit the reclaim policy of their StorageClass, which
defaults to Delete . The administrator should configure the StorageClass according to users'
expectations; otherwise, the PV must be edited or patched after it is created. See Change the
Reclaim Policy of a PersistentVolume.
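As a sketch of configuring the StorageClass up front, the manifest below sets the reclaim policy that dynamically provisioned volumes inherit. The class name and provisioner are hypothetical placeholders.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retain-on-release                         # hypothetical name
provisioner: vendor-name.example/magicstorage     # hypothetical provisioner
reclaimPolicy: Retain                             # PVs created from this class keep their data when released
```

With this in place, PVs provisioned through the class are retained instead of deleted, so they do not need to be patched after creation.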
Recycle
Warning: The Recycle reclaim policy is deprecated. Instead, the recommended approach
is to use dynamic provisioning.
If supported by the underlying volume plugin, the Recycle reclaim policy performs a basic
scrub ( rm -rf /thevolume/* ) on the volume and makes it available again for a new claim.
However, an administrator can configure a custom recycler Pod template using the
Kubernetes controller manager command line arguments as described in the reference. The
custom recycler Pod template must contain a volumes specification, as shown in the example
below:
apiVersion: v1
kind: Pod
metadata:
  name: pv-recycler
  namespace: default
spec:
  restartPolicy: Never
  volumes:
  - name: vol
    hostPath:
      path: /any/path/it/will/be/replaced
  containers:
  - name: pv-recycler
    image: "registry.k8s.io/busybox"
    command: ["/bin/sh", "-c", "test -e /scrub && rm -rf /scrub/..?* /scrub/.[!.]
    volumeMounts:
    - name: vol
      mountPath: /scrub
However, the particular path specified in the custom recycler Pod template in the volumes
part is replaced with the particular path of the volume that is being recycled.
Name:            pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
Finalizers:      [kubernetes.io/pv-protection external-provisioner.volume.kuberne
StorageClass:    fast
Status:          Bound
Claim:           demo-app/nginx-logs
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        200Mi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            csi.vsphere.vmware.com
    FSType:            ext4
    VolumeHandle:      44830fa8-79b4-406b-8b58-621ba25353fd
    ReadOnly:          false
    VolumeAttributes:  storage.kubernetes.io/csiProvisionerIdentity=164844235
                       type=vSphere CNS Block Volume
Events:          <none>
When the CSIMigration{provider} feature flag is enabled for a specific in-tree volume
plugin, the kubernetes.io/pv-controller finalizer is replaced by the
external-provisioner.volume.kubernetes.io/finalizer finalizer.
Reserving a PersistentVolume
The control plane can bind PersistentVolumeClaims to matching PersistentVolumes in the
cluster. However, if you want a PVC to bind to a specific PV, you need to pre-bind them.
The binding happens regardless of some volume matching criteria, including node affinity.
The control plane still checks that storage class, access modes, and requested storage size are
valid.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: foo
spec:
  storageClassName: "" # Empty string must be explicitly set otherwise default St
  volumeName: foo-pv
  ...
This method does not guarantee any binding privileges to the PersistentVolume. If other
PersistentVolumeClaims could use the PV that you specify, you first need to reserve that
storage volume. Specify the relevant PersistentVolumeClaim in the claimRef field of the PV
so that other PVCs cannot bind to it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: foo-pv
spec:
  storageClassName: ""
  claimRef:
    name: foo-pvc
    namespace: foo
  ...
This is useful if you want to consume PersistentVolumes that have their
persistentVolumeReclaimPolicy set to Retain , including cases where you are reusing an
existing PV.
Support for expanding PersistentVolumeClaims (PVCs) is enabled by default. You can expand
the following types of volumes:
azureDisk
azureFile
awsElasticBlockStore
cinder (deprecated)
csi
flexVolume (deprecated)
gcePersistentDisk
rbd
portworxVolume
You can only expand a PVC if its storage class's allowVolumeExpansion field is set to true.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-vol-default
provisioner: vendor-name.example/magicstorage
parameters:
  resturl: "http://192.168.10.100:8080"
  restuser: ""
  secretNamespace: ""
  secretName: ""
allowVolumeExpansion: true
To request a larger volume for a PVC, edit the PVC object and specify a larger size. This
triggers expansion of the volume that backs the underlying PersistentVolume. A new
PersistentVolume is never created to satisfy the claim. Instead, an existing volume is resized.
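A minimal sketch of such an edit is shown below. The claim name, class name, and sizes are hypothetical; the example assumes the claim was originally created with an 8Gi request and that its storage class sets allowVolumeExpansion to true.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: expandable-claim            # hypothetical existing claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: example-vol-default
  resources:
    requests:
      storage: 16Gi                 # raised from a hypothetical original request of 8Gi
```

Only the requested size changes; the claim stays bound to the same PersistentVolume.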
Warning: Directly editing the size of a PersistentVolume can prevent an automatic resize
of that volume. If you edit the capacity of a PersistentVolume, and then edit the .spec of a
matching PersistentVolumeClaim to make the size of the PersistentVolumeClaim match
the PersistentVolume, then no storage resize happens. The Kubernetes control plane will
see that the desired state of both resources matches, conclude that the backing volume
size has been manually increased and that no resize is necessary.
Support for expanding CSI volumes is enabled by default but it also requires a specific CSI
driver to support volume expansion. Refer to documentation of the specific CSI driver for
more information.
When a volume contains a file system, the file system is only resized when a new Pod is using
the PersistentVolumeClaim in ReadWrite mode. File system expansion is either done when a
Pod is starting up or when a Pod is running and the underlying file system supports online
expansion.
FlexVolumes (deprecated since Kubernetes v1.23) allow resize if the driver's
RequiresFSResize capability is set to true . The FlexVolume can be resized on Pod restart.
In this case, you don't need to delete and recreate a Pod or deployment that is using an
existing PVC. Any in-use PVC automatically becomes available to its Pod as soon as its file
system has been expanded. This feature has no effect on PVCs that are not in use by a Pod or
deployment. You must create a Pod that uses the PVC before the expansion can complete.
Similar to other volume types, FlexVolume volumes can also be expanded when in use by a
Pod.
Note: FlexVolume resize is possible only when the underlying driver supports resize.
If expanding underlying storage fails, the cluster administrator can manually recover the
Persistent Volume Claim (PVC) state and cancel the resize requests. Otherwise, the resize
requests are continuously retried by the controller without administrator intervention.
hostPath - HostPath volume (for single node testing only; WILL NOT WORK in a multi-
node cluster; consider using local volume instead)
iscsi - iSCSI (SCSI over IP) storage
The following types of PersistentVolume are deprecated. This means that support is still
available but will be removed in a future Kubernetes release.
Older versions of Kubernetes also supported the following in-tree PersistentVolume types:
Persistent Volumes
Each PV contains a spec and status, which are the specification and status of the volume. The
name of a PersistentVolume object must be a valid DNS subdomain name.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0003
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2
Note: Helper programs relating to the volume type may be required for consumption of a
PersistentVolume within a cluster. In this example, the PersistentVolume is of type NFS
and the helper program /sbin/mount.nfs is required to support the mounting of NFS
filesystems.
Capacity
Generally, a PV will have a specific storage capacity. This is set using the PV's capacity
attribute. Read the glossary term Quantity to understand the units expected by capacity .
Currently, storage size is the only resource that can be set or requested. Future attributes
may include IOPS, throughput, etc.
Volume Mode
FEATURE STATE: Kubernetes v1.18 [stable]
volumeMode is an optional API parameter. Filesystem is the default mode used when
volumeMode parameter is omitted.
A volume with volumeMode: Filesystem is mounted into Pods into a directory. If the volume is
backed by a block device and the device is empty, Kubernetes creates a filesystem on the
device before mounting it for the first time.
You can set the value of volumeMode to Block to use a volume as a raw block device. Such
a volume is presented into a Pod as a block device, without any filesystem on it. This mode is
useful to provide a Pod the fastest possible way to access a volume, without any filesystem
layer between the Pod and the volume. On the other hand, the application running in the Pod
must know how to handle a raw block device. See Raw Block Volume Support for an example
on how to use a volume with volumeMode: Block in a Pod.
Access Modes
A PersistentVolume can be mounted on a host in any way supported by the resource
provider. As shown in the table below, providers will have different capabilities and each PV's
access modes are set to the specific modes supported by that particular volume. For example,
NFS can support multiple read/write clients, but a specific NFS PV might be exported on the
server as read-only. Each PV gets its own set of access modes describing that specific PV's
capabilities.
ReadWriteOnce
the volume can be mounted as read-write by a single node. ReadWriteOnce access mode
still can allow multiple pods to access the volume when the pods are running on the same
node.
ReadOnlyMany
the volume can be mounted as read-only by many nodes.
ReadWriteMany
the volume can be mounted as read-write by many nodes.
ReadWriteOncePod
the volume can be mounted as read-write by a single Pod. Use ReadWriteOncePod access
mode if you want to ensure that only one pod across the whole cluster can read that PVC or
write to it. This is only supported for CSI volumes and Kubernetes version 1.22+.
The blog article Introducing Single Pod Access Mode for PersistentVolumes covers this in
more detail.
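As an illustration of requesting the single-Pod access mode, a claim might look like the sketch below. The claim name and class name are hypothetical, and the example assumes the class is backed by a CSI driver that supports this mode.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim         # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod              # only one Pod in the whole cluster may use this claim
  storageClassName: example-csi-class   # hypothetical CSI-backed class
  resources:
    requests:
      storage: 1Gi
```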
RWO - ReadWriteOnce
ROX - ReadOnlyMany
RWX - ReadWriteMany
RWOP - ReadWriteOncePod
Important! A volume can only be mounted using one access mode at a time, even if it
supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a
single node or ReadOnlyMany by many nodes, but not at the same time.
Volume Plugin         ReadWriteOnce  ReadOnlyMany  ReadWriteMany  ReadWriteOncePod
AWSElasticBlockStore  ✓              -             -              -
AzureFile             ✓              ✓             ✓              -
AzureDisk             ✓              -             -              -
CephFS                ✓              ✓             ✓              -
FC                    ✓              ✓             -              -
GCEPersistentDisk     ✓              ✓             -              -
Glusterfs             ✓              ✓             ✓              -
HostPath              ✓              -             -              -
iSCSI                 ✓              ✓             -              -
NFS                   ✓              ✓             ✓              -
RBD                   ✓              ✓             -              -
PortworxVolume        ✓              -             ✓              -
Class
A PV can have a class, which is specified by setting the storageClassName attribute to the
name of a StorageClass. A PV of a particular class can only be bound to PVCs requesting that
class. A PV with no storageClassName has no class and can only be bound to PVCs that
request no particular class.
Reclaim Policy
Current reclaim policies are:
Currently, only NFS and HostPath support recycling. AWS EBS, GCE PD, Azure Disk, and Cinder
volumes support deletion.
Mount Options
A Kubernetes administrator can specify additional mount options for when a Persistent
Volume is mounted on a node.
awsElasticBlockStore
azureDisk
azureFile
cephfs
iscsi
nfs
rbd
vsphereVolume
Mount options are not validated. If a mount option is invalid, the mount fails.
Node Affinity
Note: For most volume types, you do not need to set this field. It is automatically
populated for AWS EBS, GCE PD and Azure Disk volume block types. You need to explicitly
set this for local volumes.
A PV can specify node affinity to define constraints that limit what nodes this volume can be
accessed from. Pods that use a PV will only be scheduled to nodes that are selected by the
node affinity. To specify node affinity, set nodeAffinity in the .spec of a PV. The
PersistentVolume API reference has more details on this field.
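The following sketch shows one way a local volume could declare node affinity. The PV name, class name, disk path, and node name are hypothetical; the example assumes the node carries the standard kubernetes.io/hostname label.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv            # hypothetical name
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # hypothetical class name
  local:
    path: /mnt/disks/ssd1           # hypothetical path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node      # hypothetical node name
```

Pods that use a claim bound to this PV would only be scheduled onto example-node.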
Phase
A volume will be in one of the following phases:
The CLI will show the name of the PVC bound to the PV.
PersistentVolumeClaims
Each PVC contains a spec and status, which are the specification and status of the claim. The
name of a PersistentVolumeClaim object must be a valid DNS subdomain name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"
    matchExpressions:
      - {key: environment, operator: In, values: [dev]}
Access Modes
Claims use the same conventions as volumes when requesting storage with specific access
modes.
Volume Modes
Claims use the same convention as volumes to indicate the consumption of the volume as
either a filesystem or block device.
Resources
Claims, like Pods, can request specific quantities of a resource. In this case, the request is for
storage. The same resource model applies to both volumes and claims.
Selector
Claims can specify a label selector to further filter the set of volumes. Only the volumes whose
labels match the selector can be bound to the claim. The selector can consist of two fields:
All of the requirements, from both matchLabels and matchExpressions , are ANDed together
– they must all be satisfied in order to match.
Class
A claim can request a particular class by specifying the name of a StorageClass using the
attribute storageClassName . Only PVs of the requested class, ones with the same
storageClassName as the PVC, can be bound to the PVC.
PVCs don't necessarily have to request a class. A PVC with its storageClassName set equal to
"" is always interpreted to be requesting a PV with no class, so it can only be bound to PVs
with no class (no annotation or one set equal to "" ). A PVC with no storageClassName is not
quite the same and is treated differently by the cluster, depending on whether the
DefaultStorageClass admission plugin is turned on.
If the admission plugin is turned on, the administrator may specify a default
StorageClass. All PVCs that have no storageClassName can be bound only to PVs of that
default. Specifying a default StorageClass is done by setting the annotation
storageclass.kubernetes.io/is-default-class equal to true in a StorageClass
object. If the administrator does not specify a default, the cluster responds to PVC
creation as if the admission plugin were turned off. If more than one default is specified,
the admission plugin forbids the creation of all PVCs.
If the admission plugin is turned off, there is no notion of a default StorageClass. All
PVCs that have storageClassName set to "" can be bound only to PVs that have
storageClassName also set to "" . However, PVCs with missing storageClassName can
be updated later once a default StorageClass becomes available. If the PVC gets updated, it
will no longer bind to PVs that have storageClassName also set to "" .
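As a sketch of marking a class as the default, the manifest below applies the annotation described above. The class name and provisioner are hypothetical; note that annotation values are strings, so true is quoted.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard                                  # hypothetical name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: vendor-name.example/magicstorage     # hypothetical provisioner
```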
You can create a PersistentVolumeClaim without specifying a storageClassName for the new
PVC, and you can do so even when no default StorageClass exists in your cluster. In this case,
the new PVC is created as you defined it, and the storageClassName of that PVC remains unset
until a default becomes available.
When a default StorageClass becomes available, the control plane identifies any existing PVCs
without storageClassName . For the PVCs that either have an empty value for
storageClassName or do not have this key, the control plane then updates those PVCs to set
storageClassName to match the new default StorageClass. If you have an existing PVC where
the storageClassName is "" , and you configure a default StorageClass, then this PVC will not
get updated.
In order to keep binding to PVs with storageClassName set to "" (while a default
StorageClass is present), you need to set the storageClassName of the associated PVC to "" .
This behavior helps administrators change the default StorageClass by removing the old one
first and then creating or setting another one. During the brief window in which there is no
default, PVCs created without storageClassName are not assigned a class, but because the
default StorageClass is assigned retroactively, this way of changing defaults is safe.
Claims As Volumes
Pods access storage by using the claim as a volume. Claims must exist in the same
namespace as the Pod using the claim. The cluster finds the claim in the Pod's namespace
and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the
host and into the Pod.
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
A Note on Namespaces
PersistentVolume binds are exclusive, and since PersistentVolumeClaims are namespaced
objects, mounting claims with "Many" modes ( ROX , RWX ) is only possible within one
namespace.
The following volume plugins support raw block volumes, including dynamic provisioning
where applicable:
AWSElasticBlockStore
AzureDisk
CSI
FC (Fibre Channel)
GCEPersistentDisk
iSCSI
Local volume
OpenStack Cinder
RBD (Ceph Block Device)
VsphereVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: block-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  persistentVolumeReclaimPolicy: Retain
  fc:
    targetWWNs: ["50060e801049cfd1"]
    lun: 0
    readOnly: false
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: fc-container
      image: fedora:26
      command: ["/bin/sh", "-c"]
      args: [ "tail -f /dev/null" ]
      volumeDevices:
        - name: data
          devicePath: /dev/xvda
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
Note: When adding a raw block device for a Pod, you specify the device path in the
container instead of a mount path.
Note: Only statically provisioned volumes are supported for alpha release.
Administrators should take care to consider these values when working with raw block
devices.
Volume snapshots only support the out-of-tree CSI volume plugins. For details, see Volume
Snapshots. In-tree volume plugins are deprecated. You can read about the deprecated
volume plugins in the Volume Plugin FAQ.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  storageClassName: csi-hostpath-sc
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Volume Cloning
Volume Cloning is only available for CSI volume plugins.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Kubernetes supports custom volume populators. To use custom volume populators, you
must enable the AnyVolumeDataSource feature gate for the kube-apiserver and
kube-controller-manager.
Volume populators take advantage of a PVC spec field called dataSourceRef . Unlike the
dataSource field, which can only contain either a reference to another
PersistentVolumeClaim or to a VolumeSnapshot, the dataSourceRef field can contain a
reference to any object in the same namespace, except for core objects other than PVCs. For
clusters that have the feature gate enabled, use of the dataSourceRef is preferred over
dataSource .
Kubernetes supports cross namespace volume data sources. To use cross namespace volume
data sources, you must enable the AnyVolumeDataSource and
CrossNamespaceVolumeDataSource feature gates for the kube-apiserver and
kube-controller-manager. Also, you must enable the CrossNamespaceVolumeDataSource
feature gate for the csi-provisioner.
Note: When you specify a namespace for a volume data source, Kubernetes checks for a
ReferenceGrant in the other namespace before accepting the reference. ReferenceGrant
is part of the gateway.networking.k8s.io extension APIs. See ReferenceGrant in the
Gateway API documentation for details. This means that you must extend your
Kubernetes cluster with at least ReferenceGrant from the Gateway API before you can
use this mechanism.
There are two differences between the dataSourceRef field and the dataSource field that
users should be aware of:
The dataSource field ignores invalid values (as if the field was blank) while the
dataSourceRef field never ignores values and will cause an error if an invalid value is
used. Invalid values are any core object (objects with no apiGroup) except for PVCs.
The dataSourceRef field may contain different types of objects, while the dataSource
field only allows PVCs and VolumeSnapshots.
The dataSource field only allows local objects, while the dataSourceRef field allows
objects in any namespaces.
When namespace is specified, dataSource and dataSourceRef are not synced.
Users should always use dataSourceRef on clusters that have the feature gate enabled, and
fall back to dataSource on clusters that do not. It is not necessary to look at both fields under
any circumstance. The duplicated values with slightly different semantics exist only for
backwards compatibility. In particular, a mixture of older and newer controllers are able to
interoperate because the fields are the same.
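The different handling of invalid values can be sketched in a few lines of Python. This is an illustrative model only, not the API server's validation code; the Ref type and both function names are hypothetical and exist only for this example.

```python
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    """Hypothetical stand-in for a PVC data source reference."""
    kind: str
    name: str
    api_group: str = ""  # empty string models a core object (no apiGroup)


# The only (apiGroup, kind) pairs that the dataSource field accepts.
DATA_SOURCE_KINDS = {
    ("", "PersistentVolumeClaim"),
    ("snapshot.storage.k8s.io", "VolumeSnapshot"),
}


def effective_data_source(ref: Ref) -> Optional[Ref]:
    """Model of dataSource: an invalid value is ignored, as if the field was blank."""
    return ref if (ref.api_group, ref.kind) in DATA_SOURCE_KINDS else None


def effective_data_source_ref(ref: Ref) -> Ref:
    """Model of dataSourceRef: core objects other than PVCs cause an error."""
    if ref.api_group == "" and ref.kind != "PersistentVolumeClaim":
        raise ValueError(f"invalid dataSourceRef kind: {ref.kind}")
    return ref
```

In this model a ConfigMap reference silently becomes None under dataSource but raises under dataSourceRef, while a custom resource such as ExampleDataSource is only accepted by dataSourceRef.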
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: populated-pvc
spec:
  dataSourceRef:
    name: example-name
    kind: ExampleDataSource
    apiGroup: example.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Because volume populators are external components, attempts to create a PVC that uses one
can fail if not all the correct components are installed. External controllers should generate
events on the PVC to provide feedback on the status of the creation, including warnings if the
PVC cannot be created due to some missing component.
You can install the alpha volume data source validator controller into your cluster. That
controller generates warning Events on a PVC in the case that no populator is registered to
handle that kind of data source. When a suitable populator is installed for a PVC, it's the
responsibility of that populator controller to report Events that relate to volume creation and
issues during the process.
Create a ReferenceGrant to allow the namespace owner to accept the reference. You define a
populated volume by specifying a cross namespace volume data source using the
dataSourceRef field. You must already have a valid ReferenceGrant in the source namespace:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-ns1-pvc
  namespace: default
spec:
  from:
  - group: ""
    kind: PersistentVolumeClaim
    namespace: ns1
  to:
  - group: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: new-snapshot-demo
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: ns1
spec:
  storageClassName: example
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: new-snapshot-demo
    namespace: default
  volumeMode: Filesystem
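To make the relationship between the two manifests concrete, the following Python sketch models the check that a ReferenceGrant permits a cross namespace reference. It is a simplified illustration rather than the logic Kubernetes actually runs; the function name grant_allows and the dict layout of the two references are assumptions made for this example.

```python
def grant_allows(grant: dict, from_ref: dict, to_ref: dict) -> bool:
    """Return True if `grant` permits `from_ref` to reference `to_ref`.

    `grant` mirrors a ReferenceGrant manifest as a plain dict. `from_ref`
    carries group/kind/namespace of the referring object; `to_ref` carries
    group/kind/name/namespace of the referenced object.
    """
    # A grant only covers targets that live in the grant's own namespace.
    if grant["metadata"]["namespace"] != to_ref["namespace"]:
        return False
    from_ok = any(
        f["group"] == from_ref["group"]
        and f["kind"] == from_ref["kind"]
        and f["namespace"] == from_ref["namespace"]
        for f in grant["spec"]["from"]
    )
    to_ok = any(
        t["group"] == to_ref["group"]
        and t["kind"] == to_ref["kind"]
        # In the Gateway API, omitting `name` allows every object of that kind.
        and t.get("name") in (None, to_ref["name"])
        for t in grant["spec"]["to"]
    )
    return from_ok and to_ok
```

With the manifests above, the PVC foo-pvc in ns1 may reference the VolumeSnapshot new-snapshot-demo in default, because the grant lives in default and lists PersistentVolumeClaim objects from ns1 under from .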
What's next
Learn more about Creating a PersistentVolume.
Learn more about Creating a PersistentVolumeClaim.
Read the Persistent Storage design document.
API references
Read about the APIs described in this page:
PersistentVolume
PersistentVolumeClaim
3 - Projected Volumes
This document describes projected volumes in Kubernetes. Familiarity with volumes is
suggested.
Introduction
A projected volume maps several existing volume sources into the same directory.
Currently, the following types of volume sources can be projected:
secret
downwardAPI
configMap
serviceAccountToken
All sources are required to be in the same namespace as the Pod. For more details, see the
all-in-one volume design document.
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  containers:
  - name: container-test
    image: busybox:1.28
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret
          items:
            - key: username
              path: my-group/my-username
      - downwardAPI:
          items:
            - path: "labels"
              fieldRef:
                fieldPath: metadata.labels
            - path: "cpu_limit"
              resourceFieldRef:
                containerName: container-test
                resource: limits.cpu
      - configMap:
          name: myconfigmap
          items:
            - key: config
              path: my-group/my-config
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  containers:
  - name: container-test
    image: busybox:1.28
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret
          items:
            - key: username
              path: my-group/my-username
      - secret:
          name: mysecret2
          items:
            - key: password
              path: my-group/my-password
              mode: 511
Each projected volume source is listed in the spec under sources . The parameters are nearly
the same with two exceptions:
For secrets, the secretName field has been changed to name to be consistent with
ConfigMap naming.
The defaultMode can only be specified at the projected level and not for each volume
source. However, as illustrated above, you can explicitly set the mode for each individual
projection.
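The value mode: 511 in the example is written in decimal and corresponds to the octal permission bits 0777. The short Python sketch below shows the conversion in both directions; the helper name octal_to_decimal is hypothetical and only serves this illustration.

```python
import stat


def octal_to_decimal(octal_mode: str) -> int:
    """Convert an octal permission string such as "0400" to its decimal value."""
    return int(octal_mode, 8)


mode = 511  # decimal value used for my-password in the manifest above

# Decimal 511 is octal 777, i.e. read, write and execute for everyone.
print(oct(mode))
print(stat.filemode(stat.S_IFREG | mode))

# A read-only-for-owner file (octal 0400) would be written as decimal 256.
print(octal_to_decimal("0400"))
```

Converting explicitly avoids surprises when a manifest format only accepts decimal numbers for mode bits.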
pods/storage/projected-service-account-token.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sa-token-test
spec:
  containers:
  - name: container-test
    image: busybox:1.28
    volumeMounts:
    - name: token-vol
      mountPath: "/service-account"
      readOnly: true
  serviceAccountName: default
  volumes:
  - name: token-vol
    projected:
      sources:
      - serviceAccountToken:
          audience: api
          expirationSeconds: 3600
          path: token
The example Pod has a projected volume containing the injected service account token.
Containers in this Pod can use that token to access the Kubernetes API server, authenticating
with the identity of the pod's ServiceAccount. The audience field contains the intended
audience of the token. A recipient of the token must identify itself with an identifier specified
in the audience of the token, and otherwise should reject the token. This field is optional and
it defaults to the identifier of the API server.
The expirationSeconds is the expected duration of validity of the service account token. It
defaults to 1 hour and must be at least 10 minutes (600 seconds). An administrator can also
limit its maximum value by specifying the --service-account-max-token-expiration option
for the API server. The path field specifies a relative path to the mount point of the projected
volume.
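Service account tokens are JSON Web Tokens, so the audience and expiry described above appear as the aud and exp claims in the token payload. The Python sketch below decodes those claims for inspection. It is a debugging aid under stated assumptions, not a validation routine: it does not verify the signature, and the function names are hypothetical. A real recipient should validate tokens properly, for example through the TokenReview API.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url encoded with the padding stripped.
    padding = "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def audience_matches(claims: dict, expected: str) -> bool:
    """Check whether `expected` is listed in the token's `aud` claim."""
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    return expected in aud
```

Inside the example Pod, the token could be read from /service-account/token (the mountPath joined with the path field) and passed to decode_jwt_payload to inspect its claims.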
Note: A container using a projected volume source as a subPath volume mount will not
receive updates for those volume sources.
SecurityContext interactions
The proposal for file permission handling in the projected service account volume enhancement
introduced setting the correct owner permissions on projected files.
Linux
In Linux pods that have a projected volume and RunAsUser set in the Pod SecurityContext ,
the projected files have the correct ownership set including container user ownership.
When all containers in a pod have the same runAsUser set in their PodSecurityContext or
container SecurityContext , then the kubelet ensures that the contents of the
serviceAccountToken volume are owned by that user, and the token file has its permission
mode set to 0600 .
Note:
Ephemeral containers added to a Pod after it is created do not change volume
permissions that were set when the pod was created.
If a Pod's serviceAccountToken volume permissions were set to 0600 because all other
containers in the Pod have the same runAsUser , ephemeral containers must use the
same runAsUser to be able to read the token.
Windows
In Windows pods that have a projected volume and RunAsUsername set in the Pod
SecurityContext , the ownership is not enforced due to the way user accounts are managed
in Windows. Windows stores and manages local user and group accounts in a database file
called Security Account Manager (SAM). Each container maintains its own instance of the SAM
database, into which the host has no visibility while the container is running. Windows
containers are designed to run the user mode portion of the OS in isolation from the host,
hence the maintenance of a virtual SAM database. As a result, the kubelet running on the host
does not have the ability to dynamically configure host file ownership for virtualized container
accounts. It is recommended that if files on the host machine are to be shared with the
container then they should be placed into their own volume mount outside of C:\ .
By default, the projected files will have the following ownership as shown for an example
projected volume file:
Path : Microsoft.PowerShell.Core\FileSystem::C:\var\run\secrets\kubernetes.io\s
Owner : BUILTIN\Administrators
Group : NT AUTHORITY\SYSTEM
Access : NT AUTHORITY\SYSTEM Allow FullControl
BUILTIN\Administrators Allow FullControl
BUILTIN\Users Allow ReadAndExecute, Synchronize
Audit :
Sddl : O:BAG:SYD:AI(A;ID;FA;;;SY)(A;ID;FA;;;BA)(A;ID;0x1200a9;;;BU)
This implies that all administrator users, such as ContainerAdministrator , have read, write,
and execute access, while non-administrator users have read and execute access.
Note:
In general, granting the container access to the host is discouraged as it can open the
door for potential security exploits.
Creating a Windows Pod with RunAsUser in its SecurityContext will result in the Pod
being stuck at ContainerCreating forever. It is therefore advised not to use the Linux-only
RunAsUser option with Windows Pods.
4 - Ephemeral Volumes
This document describes ephemeral volumes in Kubernetes. Familiarity with volumes is
suggested, in particular PersistentVolumeClaim and PersistentVolume.
Some applications need additional storage but don't care whether that data is stored
persistently across restarts. For example, caching services are often limited by memory size
and can move infrequently used data into storage that is slower than memory with little
impact on overall performance.
Other applications expect some read-only input data to be present in files, like configuration
data or secret keys.
Ephemeral volumes are designed for these use cases. Because volumes follow the Pod's
lifetime and get created and deleted along with the Pod, Pods can be stopped and restarted
without being limited to where some persistent volume is available.
Ephemeral volumes are specified inline in the Pod spec, which simplifies application
deployment and management.
emptyDir: empty at Pod startup, with storage coming locally from the kubelet base
directory (usually the root disk) or RAM
configMap, downwardAPI, secret: inject different kinds of Kubernetes data into a Pod
CSI ephemeral volumes: similar to the previous volume kinds, but provided by special
CSI drivers which specifically support this feature
generic ephemeral volumes, which can be provided by all storage drivers that also
support persistent volumes
Generic ephemeral volumes can be provided by third-party CSI storage drivers, but also by
any other storage driver that supports dynamic provisioning. Some CSI drivers are written
specifically for CSI ephemeral volumes and do not support dynamic provisioning: those then
cannot be used for generic ephemeral volumes.
The advantage of using third-party drivers is that they can offer functionality that Kubernetes
itself does not support, for example storage with different performance characteristics than
the disk that is managed by kubelet, or injecting different data.
Note: CSI ephemeral volumes are only supported by a subset of CSI drivers. The
Kubernetes CSI Drivers list shows which drivers support ephemeral volumes.
Conceptually, CSI ephemeral volumes are similar to configMap , downwardAPI and secret
volume types: the storage is managed locally on each node and is created together with other
local resources after a Pod has been scheduled onto a node. Kubernetes has no concept of
rescheduling Pods anymore at this stage. Volume creation has to be unlikely to fail, otherwise
Pod startup gets stuck. In particular, storage capacity aware Pod scheduling is not supported
for these volumes. They are currently also not covered by the storage resource usage limits of
a Pod, because that is something that kubelet can only enforce for storage that it manages
itself.
Here's an example manifest for a Pod that uses CSI ephemeral storage:
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox:1.28
      volumeMounts:
      - mountPath: "/data"
        name: my-csi-inline-vol
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-csi-inline-vol
      csi:
        driver: inline.storage.kubernetes.io
        volumeAttributes:
          foo: bar
The volumeAttributes determine what volume is prepared by the driver. These attributes
are specific to each driver and not standardized. See the documentation of each CSI driver for
further instructions.
Cluster administrators who need to restrict the CSI drivers that are allowed to be used as
inline volumes within a Pod spec may do so by:
Generic ephemeral volumes are similar to emptyDir volumes in the sense that they provide a
per-pod directory for scratch data that is usually empty after provisioning. But they may also
have additional features:
Example:
kind: Pod
apiVersion: v1
metadata:
  name: my-app
spec:
  containers:
    - name: my-frontend
      image: busybox:1.28
      volumeMounts:
      - mountPath: "/scratch"
        name: scratch-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: scratch-volume
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: my-frontend-volume
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "scratch-storage-class"
            resources:
              requests:
                storage: 1Gi
That triggers volume binding and/or provisioning, either immediately if the StorageClass uses
immediate volume binding or when the Pod is tentatively scheduled onto a node
( WaitForFirstConsumer volume binding mode). The latter is recommended for generic
ephemeral volumes because then the scheduler is free to choose a suitable node for the Pod.
With immediate binding, the scheduler is forced to select a node that has access to the
volume once it is available.
In terms of resource ownership, a Pod that has generic ephemeral storage is the owner of the
PersistentVolumeClaim(s) that provide that ephemeral storage. When the Pod is deleted, the
Kubernetes garbage collector deletes the PVC, which then usually triggers deletion of the
volume because the default reclaim policy of storage classes is to delete volumes. You can
create quasi-ephemeral local storage using a StorageClass with a reclaim policy of retain :
the storage outlives the Pod, and in this case you need to ensure that volume clean up
happens separately.
While these PVCs exist, they can be used like any other PVC. In particular, they can be
referenced as data source in volume cloning or snapshotting. The PVC object also holds the
current status of the volume.
PersistentVolumeClaim naming
Naming of the automatically created PVCs is deterministic: the name is a combination of Pod
name and volume name, with a hyphen ( - ) in the middle. In the example above, the PVC
name will be my-app-scratch-volume . This deterministic naming makes it easier to interact
with the PVC because one does not have to search for it once the Pod name and volume
name are known.
The deterministic naming also introduces a potential conflict between different Pods (a Pod
"pod-a" with volume "scratch" and another Pod with name "pod" and volume "a-scratch" both
end up with the same PVC name "pod-a-scratch") and between Pods and manually created
PVCs.
Such conflicts are detected: a PVC is only used for an ephemeral volume if it was created for
the Pod. This check is based on the ownership relationship. An existing PVC is not overwritten
or modified. But this does not resolve the conflict because without the right PVC, the Pod
cannot start.
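The naming rule and the ownership-based check can be sketched as follows. This is a simplified Python model, not the code Kubernetes runs; the function names are hypothetical, and the Pod and PVC are represented as plain dicts shaped like their manifests.

```python
def ephemeral_pvc_name(pod_name: str, volume_name: str) -> str:
    """PVC name for a generic ephemeral volume: Pod name, hyphen, volume name."""
    return f"{pod_name}-{volume_name}"


def pvc_belongs_to_pod(pvc: dict, pod: dict) -> bool:
    """A PVC is only used for the ephemeral volume if the Pod is its owner."""
    owners = pvc.get("metadata", {}).get("ownerReferences", [])
    return any(owner.get("uid") == pod["metadata"]["uid"] for owner in owners)


# The example Pod above yields the PVC name my-app-scratch-volume.
print(ephemeral_pvc_name("my-app", "scratch-volume"))

# Two different Pod/volume pairs that collide on the same PVC name.
print(ephemeral_pvc_name("pod-a", "scratch") == ephemeral_pvc_name("pod", "a-scratch"))
```

Because the name alone is ambiguous, the ownership check is what prevents a Pod from silently picking up a PVC that was created for another Pod or by hand.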
Caution: Take care when naming Pods and volumes inside the same namespace, so that
these conflicts can't occur.
Security
Enabling the GenericEphemeralVolume feature allows users to create PVCs indirectly if they
can create Pods, even if they do not have permission to create PVCs directly. Cluster
administrators must be aware of this. If this does not fit their security model, they should use
an admission webhook that rejects objects like Pods that have a generic ephemeral volume.
The normal namespace quota for PVCs still applies, so even if users are allowed to use this
new mechanism, they cannot use it to circumvent other policies.
What's next
Ephemeral volumes managed by kubelet
See local ephemeral storage.
5 - Storage Classes
This document describes the concept of a StorageClass in Kubernetes. Familiarity with
volumes and persistent volumes is suggested.
Introduction
A StorageClass provides a way for administrators to describe the "classes" of storage they
offer. Different classes might map to quality-of-service levels, or to backup policies, or to
arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated
about what classes represent. This concept is sometimes called "profiles" in other storage
systems.
The name of a StorageClass object is significant, and is how users can request a particular
class. Administrators set the name and other parameters of a class when first creating
StorageClass objects, and the objects cannot be updated once they are created.
Administrators can specify a default StorageClass only for PVCs that don't request any
particular class to bind to: see the PersistentVolumeClaim section for details.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - debug
volumeBindingMode: Immediate
Provisioner
Each StorageClass has a provisioner that determines what volume plugin is used for
provisioning PVs. This field must be specified.
Volume Plugin        Internal Provisioner    Config Example
CephFS               -                       -
FC                   -                       -
FlexVolume           -                       -
GCEPersistentDisk    ✓                       GCE PD
iSCSI                -                       -
NFS                  -                       NFS
VsphereVolume        ✓                       vSphere
Local                -                       Local
You are not restricted to specifying the "internal" provisioners listed here (whose names are
prefixed with "kubernetes.io" and shipped alongside Kubernetes). You can also run and
specify external provisioners, which are independent programs that follow a specification
defined by Kubernetes. Authors of external provisioners have full discretion over where their
code lives, how the provisioner is shipped, how it needs to be run, what volume plugin it uses
(including Flex), etc. The repository kubernetes-sigs/sig-storage-lib-external-provisioner
houses a library for writing external provisioners that implements the bulk of the
specification. Some external provisioners are listed under the repository kubernetes-sigs/sig-
storage-lib-external-provisioner.
For example, NFS doesn't provide an internal provisioner, but an external provisioner can be
used. There are also cases when 3rd party storage vendors provide their own external
provisioner.
Reclaim Policy
PersistentVolumes that are dynamically created by a StorageClass will have the reclaim policy
specified in the reclaimPolicy field of the class, which can be either Delete or Retain . If
no reclaimPolicy is specified when a StorageClass object is created, it will default to
Delete .
PersistentVolumes that are created manually and managed via a StorageClass will have
whatever reclaim policy they were assigned at creation.
The following types of volumes support volume expansion, when the underlying StorageClass
has the field allowVolumeExpansion set to true.
Volume type             Required Kubernetes version
gcePersistentDisk       1.11
awsElasticBlockStore    1.11
Cinder                  1.11
rbd                     1.11
Portworx                1.11
FlexVolume              1.13
Note: You can only use the volume expansion feature to grow a Volume, not to shrink it.
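The two conditions above, that the class must allow expansion and that a volume can only grow, can be modelled in a few lines of Python. This is a hypothetical sketch for illustration; the function name and the use of whole GiB integers are assumptions, and real requests use Kubernetes quantity strings such as 10Gi.

```python
def validate_resize(current_gib: int, requested_gib: int,
                    allow_volume_expansion: bool) -> int:
    """Return the new size in GiB, or raise if the resize request is not allowed."""
    if requested_gib < current_gib:
        raise ValueError("volume expansion can only grow a volume, not shrink it")
    if requested_gib > current_gib and not allow_volume_expansion:
        raise ValueError("StorageClass does not set allowVolumeExpansion to true")
    return requested_gib
```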
Mount Options
PersistentVolumes that are dynamically created by a StorageClass will have the mount
options specified in the mountOptions field of the class.
If the volume plugin does not support mount options but mount options are specified,
provisioning will fail. Mount options are not validated on either the class or PV. If a mount
option is invalid, the PV mount fails.
The Immediate mode indicates that volume binding and dynamic provisioning occur once
the PersistentVolumeClaim is created. For storage backends that are topology-constrained
and not globally accessible from all Nodes in the cluster, PersistentVolumes will be bound or
provisioned without knowledge of the Pod's scheduling requirements. This may result in
unschedulable Pods.
A cluster administrator can address this issue by specifying the WaitForFirstConsumer mode
which will delay the binding and provisioning of a PersistentVolume until a Pod using the
PersistentVolumeClaim is created. PersistentVolumes will be selected or provisioned
conforming to the topology that is specified by the Pod's scheduling constraints. These
include, but are not limited to, resource requirements, node selectors, pod affinity and anti-
affinity, and taints and tolerations.
AWSElasticBlockStore
GCEPersistentDisk
AzureDisk
CSI volumes are also supported with dynamic provisioning and pre-created PVs, but you'll
need to look at the documentation for a specific CSI driver to see its supported topology keys
and examples.
Note:
If you choose to use WaitForFirstConsumer , do not use nodeName in the Pod spec to
specify node affinity. If nodeName is used in this case, the scheduler will be bypassed and
PVC will remain in pending state.
Instead, you can use node selector for hostname in this case as shown below.
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: kube-01
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
Allowed Topologies
When a cluster operator specifies the WaitForFirstConsumer volume binding mode, it is no
longer necessary to restrict provisioning to specific topologies in most situations. However, if
still required, allowedTopologies can be specified.
This example demonstrates how to restrict the topology of provisioned volumes to specific
zones and should be used as a replacement for the zone and zones parameters for the
supported plugins.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central-1a
    - us-central-1b
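How such a restriction selects nodes can be sketched in Python. The model below assumes that the expressions inside one term must all match, that multiple terms are alternatives, and that an empty allowedTopologies list means no restriction; these semantics and the function names are assumptions of this sketch, so check the API reference for the authoritative rules.

```python
from typing import Dict, List


def term_matches(term: dict, node_labels: Dict[str, str]) -> bool:
    """All matchLabelExpressions of one term must match the node's labels."""
    expressions = term.get("matchLabelExpressions", [])
    if not expressions:
        return False  # assumption: an empty term matches no node
    return all(node_labels.get(e["key"]) in e["values"] for e in expressions)


def node_allowed(allowed_topologies: List[dict], node_labels: Dict[str, str]) -> bool:
    """A node qualifies if there is no restriction or if any term matches."""
    if not allowed_topologies:
        return True
    return any(term_matches(term, node_labels) for term in allowed_topologies)
```

With the StorageClass above, a node labelled with zone us-central-1a qualifies, while a node in any other zone, or without the zone label, does not.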
Parameters
Storage Classes have parameters that describe volumes belonging to the storage class.
Different parameters may be accepted depending on the provisioner . For example, the
value io1 , for the parameter type , and the parameter iopsPerGB are specific to EBS. When
a parameter is omitted, some default is used.
There can be at most 512 parameters defined for a StorageClass. The total length of the
parameters object including its keys and values cannot exceed 256 KiB.
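A pre-flight check for these two limits might look like the Python sketch below. It is illustrative only: the function name is hypothetical, and measuring the total length as the UTF-8 byte count of all keys and values is an assumption about how the limit is computed.

```python
from typing import Dict

MAX_PARAMETERS = 512
MAX_TOTAL_BYTES = 256 * 1024  # 256 KiB


def check_parameters(parameters: Dict[str, str]) -> None:
    """Raise ValueError if the StorageClass parameters exceed either limit."""
    if len(parameters) > MAX_PARAMETERS:
        raise ValueError(f"too many parameters: {len(parameters)} > {MAX_PARAMETERS}")
    total = sum(len(k.encode()) + len(v.encode()) for k, v in parameters.items())
    if total > MAX_TOTAL_BYTES:
        raise ValueError(f"parameters too large: {total} bytes > {MAX_TOTAL_BYTES}")
```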
AWS EBS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4
type: io1 , gp2 , sc1 , st1 . See AWS docs for details. Default: gp2 .
zone (Deprecated): AWS zone. If neither zone nor zones is specified, volumes are
generally round-robin-ed across all active zones where the Kubernetes cluster has a node.
zone and zones parameters must not be used at the same time.
zones (Deprecated): A comma separated list of AWS zone(s). If neither zone nor zones
is specified, volumes are generally round-robin-ed across all active zones where the
Kubernetes cluster has a node. zone and zones parameters must not be used at the
same time.
iopsPerGB : only for io1 volumes. I/O operations per second per GiB. AWS volume
plugin multiplies this with size of requested volume to compute IOPS of the volume and
caps it at 20 000 IOPS (maximum supported by AWS, see AWS docs). A string is expected
here, i.e. "10" , not 10 .
fsType : fsType that is supported by Kubernetes. Default: "ext4" .
encrypted : denotes whether the EBS volume should be encrypted or not. Valid values
are "true" or "false" . A string is expected here, i.e. "true" , not true .
kmsKeyId : optional. The full Amazon Resource Name of the key to use when encrypting
the volume. If none is supplied but encrypted is true, a key is generated by AWS. See
AWS docs for valid ARN value.
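The iopsPerGB arithmetic described in the list above can be worked through in Python. This is a sketch of the stated rule (multiply by the requested size, cap at 20 000), not the volume plugin's source; the function and constant names are hypothetical.

```python
AWS_IO1_MAX_IOPS = 20_000  # cap stated above


def io1_volume_iops(iops_per_gb: str, size_gib: int) -> int:
    """Compute the IOPS of an io1 volume from the iopsPerGB parameter.

    The parameter is passed as a string, matching the StorageClass manifest,
    where "10" is expected rather than 10.
    """
    return min(int(iops_per_gb) * size_gib, AWS_IO1_MAX_IOPS)


# A 100 GiB volume with iopsPerGB "10" gets 1000 IOPS.
print(io1_volume_iops("10", 100))
# A 1000 GiB volume with iopsPerGB "50" would compute 50000 and is capped.
print(io1_volume_iops("50", 1000))
```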
Note: zone and zones parameters are deprecated and replaced with allowedTopologies
GCE PD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  fstype: ext4
  replication-type: none
zone (Deprecated): GCE zone. If neither zone nor zones is specified, volumes are
generally round-robin-ed across all active zones where the Kubernetes cluster has a node.
zone and zones parameters must not be used at the same time.
zones (Deprecated): A comma separated list of GCE zone(s). If neither zone nor zones
is specified, volumes are generally round-robin-ed across all active zones where the
Kubernetes cluster has a node. zone and zones parameters must not be used at the
same time.
fstype : ext4 or xfs . Default: ext4 . The defined filesystem type must be supported
by the host operating system.
Note: zone and zones parameters are deprecated and replaced with allowedTopologies
NFS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-nfs
provisioner: example.com/external-nfs
parameters:
  server: nfs-server.example.com
  path: /share
  readOnly: "false"
readOnly : A flag indicating whether the storage will be mounted as read only (default
false).
Kubernetes doesn't include an internal NFS provisioner. You need to use an external
provisioner to create a StorageClass for NFS. Here are some examples:
OpenStack Cinder
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold
provisioner: kubernetes.io/cinder
parameters:
  availability: nova
Note:
FEATURE STATE: Kubernetes v1.11 [deprecated]
This internal provisioner of OpenStack is deprecated. Please use the external cloud
provider for OpenStack.
vSphere
There are two types of provisioners for vSphere storage classes:
In-tree provisioners are deprecated. For more information on the CSI provisioner, see
Kubernetes vSphere CSI Driver and vSphereVolume CSI migration.
CSI Provisioner
The vSphere CSI StorageClass provisioner works with Tanzu Kubernetes clusters. For an
example, refer to the vSphere CSI repository.
vCP Provisioner
The following examples use the VMware Cloud Provider (vCP) StorageClass provisioner.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
  datastore: VSANDatastore
datastore : The user can also specify the datastore in the StorageClass. The volume will
be created on the datastore specified in the StorageClass, which in this case is
VSANDatastore . This field is optional. If the datastore is not specified, then the volume
will be created on the datastore specified in the vSphere config file used to initialize the
vSphere Cloud Provider.
One of the most important features of vSphere for Storage Management is policy
based Management. Storage Policy Based Management (SPBM) is a storage policy
framework that provides a single unified control plane across a broad range of
vSphere Infrastructure (VI) Admins will have the ability to specify custom Virtual
SAN Storage Capabilities during dynamic volume provisioning. You can now define
storage requirements, such as performance and availability, in the form of storage
capabilities during dynamic volume provisioning. The storage capability
requirements are converted into a Virtual SAN policy, which is then pushed down
to the Virtual SAN layer when a persistent volume (virtual disk) is being created.
The virtual disk is distributed across the Virtual SAN datastore to meet the
requirements.
You can see Storage Policy Based Management for dynamic provisioning of
volumes for more details on how to use storage policies for persistent volumes
management.
There are a few vSphere examples that you can try out for persistent volume management
inside Kubernetes for vSphere.
Ceph RBD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.16.153.105:6789
  adminId: kube
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-user
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
adminId : Ceph client ID that is capable of creating images in the pool. Default is
"admin".
adminSecretName: Secret Name for adminId . This parameter is required. The provided
secret must have type "kubernetes.io/rbd".
userId : Ceph client ID that is used to map the RBD image. Default is the same as
adminId .
userSecretName: The name of Ceph Secret for userId to map RBD image. It must exist
in the same namespace as PVCs. This parameter is required. The provided secret must
have type "kubernetes.io/rbd", for example created in this way:
imageFeatures : This parameter is optional and should only be used if you set
imageFormat to "2". Currently supported features are layering only. Default is "", and
no features are turned on.
Azure Disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/azure-disk
parameters:
  skuName: Standard_LRS
  location: eastus
  storageAccount: azure_storage_account_name
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: managed
Premium VM can attach both Standard_LRS and Premium_LRS disks, while Standard VM
can only attach Standard_LRS disks.
Managed VM can only attach managed disks and unmanaged VM can only attach
unmanaged disks.
Azure File
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS
  location: eastus
  storageAccount: azure_storage_account_name
During storage provisioning, a secret named by secretName is created for the mounting
credentials. If the cluster has enabled both RBAC and Controller Roles, add the create
permission of resource secret for clusterrole system:controller:persistent-volume-
binder .
Portworx Volume
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-io-priority-high
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "1"
  snap_interval: "70"
  priority_io: "high"
priority_io : determines whether the volume is created from higher-performance or
lower-priority storage. Allowed values are high , medium , and low (default: low ).
Local
FEATURE STATE: Kubernetes v1.14 [stable]
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
Local volumes do not currently support dynamic provisioning. However, a StorageClass should
still be created to delay volume binding until Pod scheduling. This is specified by the
WaitForFirstConsumer volume binding mode.
Delaying volume binding allows the scheduler to consider all of a Pod's scheduling constraints
when choosing an appropriate PersistentVolume for a PersistentVolumeClaim.
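Because local volumes are not dynamically provisioned, an administrator creates the PersistentVolume by hand. The following is a minimal sketch of such a volume that uses the local-storage class above; the object name, the path /mnt/disks/ssd1, and the node name example-node are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv          # placeholder name
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage # matches the StorageClass above
  local:
    path: /mnt/disks/ssd1         # placeholder path on the node
  nodeAffinity:                   # ties the volume to the node that holds the disk
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node    # placeholder node name
```

With WaitForFirstConsumer, a claim that requests this class stays unbound until a Pod using it is scheduled, so the node affinity above can be taken into account.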
Background
The implementation of dynamic volume provisioning is based on the API object
StorageClass from the API group storage.k8s.io . A cluster administrator can define as
many StorageClass objects as needed, each specifying a volume plugin (aka provisioner) that
provisions a volume and the set of parameters to pass to that provisioner when provisioning.
A cluster administrator can define and expose multiple flavors of storage (from the same or
different storage systems) within a cluster, each with a custom set of parameters. This design
also ensures that end users don't have to worry about the complexity and nuances of how
storage is provisioned, but still have the ability to select from multiple storage options.
The following manifest creates a storage class "slow" which provisions standard disk-like
persistent disks.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
The following manifest creates a storage class "fast" which provisions SSD-like persistent
disks.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
deprecated since v1.9. Users now can and should instead use the storageClassName field of
the PersistentVolumeClaim object. The value of this field must match the name of a
StorageClass configured by the administrator (see below).
To select the "fast" storage class, for example, a user would create the following
PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 30Gi
This claim results in an SSD-like Persistent Disk being automatically provisioned. When the
claim is deleted, the volume is destroyed.
Defaulting Behavior
Dynamic provisioning can be enabled on a cluster such that all claims are dynamically
provisioned if no storage class is specified. A cluster administrator can enable this behavior
by:
Note that there can be at most one default storage class on a cluster; otherwise, a
PersistentVolumeClaim without storageClassName explicitly specified cannot be created.
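As an illustration of a default class, the sketch below marks the "slow" class from the Background section as default by using the storageclass.kubernetes.io/is-default-class annotation. Reusing the "slow" class here is only an assumption made for the example.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
  annotations:
    # Marks this class as the cluster default for claims that name no class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
```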
Topology Awareness
In Multi-Zone clusters, Pods can be spread across Zones in a Region. Single-Zone storage
backends should be provisioned in the Zones where Pods are scheduled. This can be
accomplished by setting the Volume Binding Mode.
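A minimal sketch of a class that sets the volume binding mode this way is shown below. The class name fast-zonal is hypothetical; the provisioner and parameters are taken from the "fast" example above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-zonal                 # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
# Delay binding and provisioning until a Pod using the claim is scheduled,
# so the volume can be created in the zone of the selected node.
volumeBindingMode: WaitForFirstConsumer
```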
7 - Volume Snapshots
In Kubernetes, a VolumeSnapshot represents a snapshot of a volume on a storage system. This
document assumes that you are already familiar with Kubernetes persistent volumes.
Introduction
Similar to how API resources PersistentVolume and PersistentVolumeClaim are used to
provision volumes for users and administrators, VolumeSnapshotContent and
VolumeSnapshot API resources are provided to create volume snapshots for users and
administrators.
A VolumeSnapshotContent is a snapshot taken from a volume in the cluster that has been
provisioned by an administrator. It is a resource in the cluster just like a PersistentVolume is a
cluster resource.
Volume snapshots provide Kubernetes users with a standardized way to copy a volume's
contents at a particular point in time without creating an entirely new volume. This
functionality enables, for example, database administrators to back up databases before
performing edit or delete modifications.
Pre-provisioned
A cluster administrator creates a number of VolumeSnapshotContents . They carry the details
of the real volume snapshot on the storage system which is available for use by cluster users.
They exist in the Kubernetes API and are available for consumption.
Dynamic
Instead of using a pre-existing snapshot, you can request that a snapshot be dynamically
taken from a PersistentVolumeClaim. The VolumeSnapshotClass specifies storage
provider-specific parameters to use when taking a snapshot.
Binding
The snapshot controller handles the binding of a VolumeSnapshot object with an appropriate
VolumeSnapshotContent object, in both pre-provisioned and dynamically provisioned
scenarios. The binding is a one-to-one mapping.
In the case of pre-provisioned binding, the VolumeSnapshot will remain unbound until the
requested VolumeSnapshotContent object is created.
Delete
Deletion is triggered by deleting the VolumeSnapshot object, and the DeletionPolicy will be
followed. If the DeletionPolicy is Delete , then the underlying storage snapshot will be
deleted along with the VolumeSnapshotContent object. If the DeletionPolicy is Retain ,
then both the underlying snapshot and VolumeSnapshotContent remain.
VolumeSnapshots
Each VolumeSnapshot contains a spec and a status.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass
  source:
    persistentVolumeClaimName: pvc-test
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
spec:
  source:
    volumeSnapshotContentName: test-content

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: snapcontent-72d9a349-aacd-42d2-a240-d775650d2455
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    volumeHandle: ee0cfb94-f8d4-11e9-b2d8-0242ac110002
  sourceVolumeMode: Filesystem
  volumeSnapshotClassName: csi-hostpath-snapclass
  volumeSnapshotRef:
    name: new-snapshot-test
    namespace: default
    uid: 72d9a349-aacd-42d2-a240-d775650d2455
volumeHandle is the unique identifier of the volume created on the storage backend and
returned by the CSI driver during the volume creation. This field is required for dynamically
provisioning a snapshot. It specifies the volume source of the snapshot.
For pre-provisioned snapshots, you (as cluster administrator) are responsible for creating the
VolumeSnapshotContent object as follows.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: new-snapshot-content-test
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
  sourceVolumeMode: Filesystem
  volumeSnapshotRef:
    name: new-snapshot-test
    namespace: default
snapshotHandle is the unique identifier of the volume snapshot created on the storage
backend. This field is required for the pre-provisioned snapshots. It specifies the CSI snapshot
id on the storage system that this VolumeSnapshotContent represents.
sourceVolumeMode is the mode of the volume whose snapshot is taken. The value of the
sourceVolumeMode field can be either Filesystem or Block . If the source volume mode is
not specified, Kubernetes treats the snapshot as if the source volume's mode is unknown.
To check if your cluster has capability for this feature, run the following command:
An example VolumeSnapshotContent resource with this feature enabled would look like:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: new-snapshot-content-test
  annotations:
    snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
  sourceVolumeMode: Filesystem
  volumeSnapshotRef:
    name: new-snapshot-test
    namespace: default
For more details, see Volume Snapshot and Restore Volume from Snapshot.
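As a rough sketch of the restore direction, a new PersistentVolumeClaim can name a VolumeSnapshot as its dataSource. The claim name restore-pvc, the storage class csi-hostpath-sc, and the requested size are hypothetical values; the snapshot name reuses new-snapshot-test from the examples above.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc                    # hypothetical name
spec:
  storageClassName: csi-hostpath-sc    # hypothetical class backed by the same CSI driver
  dataSource:
    name: new-snapshot-test            # the VolumeSnapshot to restore from
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                    # hypothetical size
```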
Introduction
Just like StorageClass provides a way for administrators to describe the "classes" of storage
they offer when provisioning a volume, VolumeSnapshotClass provides a way to describe the
"classes" of storage when provisioning a volume snapshot.
The name of a VolumeSnapshotClass object is significant, and is how users can request a
particular class. Administrators set the name and other parameters of a class when first
creating VolumeSnapshotClass objects, and the objects cannot be updated once they are
created.
Note: Installation of the CRDs is the responsibility of the Kubernetes distribution. Without
the required CRDs present, the creation of a VolumeSnapshotClass fails.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass
driver: hostpath.csi.k8s.io
deletionPolicy: Delete
parameters:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: hostpath.csi.k8s.io
deletionPolicy: Delete
parameters:
Driver
Volume snapshot classes have a driver that determines what CSI volume plugin is used for
provisioning VolumeSnapshots. This field must be specified.
DeletionPolicy
Volume snapshot classes have a deletionPolicy. It enables you to configure what happens to a
VolumeSnapshotContent when the VolumeSnapshot object it is bound to is to be deleted. The
deletionPolicy of a volume snapshot class can either be Retain or Delete . This field must be
specified.
If the deletionPolicy is Delete , then the underlying storage snapshot will be deleted along
with the VolumeSnapshotContent object. If the deletionPolicy is Retain , then both the
underlying snapshot and VolumeSnapshotContent remain.
Parameters
Volume snapshot classes have parameters that describe volume snapshots belonging to the
volume snapshot class. Different parameters may be accepted depending on the driver .
Introduction
The CSI Volume Cloning feature adds support for specifying existing PVCs in the dataSource
field to indicate a user would like to clone a Volume.
The implementation of cloning, from the perspective of the Kubernetes API, adds the ability to
specify an existing PVC as a dataSource during new PVC creation. The source PVC must be
bound and available (not in use).
Provisioning
Clones are provisioned like any other PVC with the exception of adding a dataSource that
references an existing PVC in the same namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-of-pvc-1
  namespace: myns
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cloning
  resources:
    requests:
      storage: 5Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: pvc-1
Note: You must specify a capacity value for spec.resources.requests.storage, and the
value you specify must be the same or larger than the capacity of the source volume.
The result is a new PVC with the name clone-of-pvc-1 that has the exact same content as
the specified source pvc-1 .
Usage
Once the new PVC is available, the cloned PVC is consumed the same way as any other PVC. At
this point the newly created PVC is expected to be an independent object: it can be
consumed, cloned, snapshotted, or deleted independently, without consideration for its
original dataSource PVC. This also implies that the source is not linked in any way to the newly
created clone; the source may be modified or deleted without affecting the clone.
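For illustration, a Pod could mount the clone exactly as it would mount any other claim. The Pod name, container image, command, and mount path below are assumptions made for this sketch; only the claim name and namespace come from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: clone-consumer            # hypothetical name
  namespace: myns                 # same namespace as the cloned PVC
spec:
  containers:
    - name: app
      image: busybox:1.36         # any image works; chosen for illustration
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data        # hypothetical mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: clone-of-pvc-1 # the clone, used like any other PVC
```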
10 - Storage Capacity
Storage capacity is limited and may vary depending on the node on which a pod runs:
network-attached storage might not be accessible by all nodes, or storage is local to a node to
begin with.
This page describes how Kubernetes keeps track of storage capacity and how the scheduler
uses that information to schedule Pods onto nodes that have access to enough storage
capacity for the remaining missing volumes. Without storage capacity tracking, the scheduler
may choose a node that doesn't have enough capacity to provision a volume and multiple
scheduling retries will be needed.
API
There are two API extensions for this feature:
CSIStorageCapacity objects: these get produced by a CSI driver in the namespace where
the driver is installed. Each object contains capacity information for one storage class
and defines which nodes have access to that storage.
The CSIDriverSpec.StorageCapacity field: when set to true , the Kubernetes scheduler
will consider storage capacity for volumes that use the CSI driver.
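The two pieces might look roughly like the sketch below. CSIStorageCapacity objects are normally produced by the CSI driver deployment rather than written by hand, so the second object is shown only to illustrate its shape. The driver name is borrowed from the snapshot examples on this page; the object name, namespace, capacity, and zone label value are assumptions.

```yaml
# CSIDriver that opts in to capacity-aware scheduling.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: hostpath.csi.k8s.io
spec:
  storageCapacity: true
---
# Example shape of a capacity object for one storage class and one topology segment.
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: example-capacity          # hypothetical name
  namespace: csi-driver-ns        # hypothetical namespace where the driver is installed
storageClassName: fast
capacity: 100Gi                   # hypothetical available capacity
nodeTopology:                     # nodes that have access to this storage
  matchLabels:
    topology.kubernetes.io/zone: zone-a   # hypothetical zone
```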
Scheduling
Storage capacity information is used by the Kubernetes scheduler if:
the CSIDriver object for the driver has StorageCapacity set to true.
In that case, the scheduler only considers nodes for the Pod which have enough storage
available to them. This check is very simplistic and only compares the size of the volume
against the capacity listed in CSIStorageCapacity objects with a topology that includes the
node.
For volumes with Immediate volume binding mode, the storage driver decides where to
create the volume, independently of Pods that will use the volume. The scheduler then
schedules Pods onto nodes where the volume is available after the volume has been created.
For CSI ephemeral volumes, scheduling always happens without considering storage capacity.
This is based on the assumption that this volume type is only used by special CSI drivers
which are local to a node and do not need significant resources there.
Rescheduling
When a node has been selected for a Pod with WaitForFirstConsumer volumes, that decision
is still tentative. The next step is that the CSI storage driver gets asked to create the volume
with a hint that the volume is supposed to be available on the selected node.
Because Kubernetes might have chosen a node based on outdated capacity information, it is
possible that the volume cannot actually be created. The node selection is then reset and the
Kubernetes scheduler tries again to find a node for the Pod.
Limitations
Storage capacity tracking increases the chance that scheduling works on the first try, but
cannot guarantee this because the scheduler has to decide based on potentially outdated
information. Usually, the same retry mechanism as for scheduling without any storage
capacity information handles scheduling failures.
One situation where scheduling can fail permanently is when a Pod uses multiple volumes:
one volume might have been created already in a topology segment which then does not
have enough capacity left for another volume. Manual intervention is necessary to recover
from this, for example by increasing capacity or deleting the volume that was already created.
What's next
For more information on the design, see the Storage Capacity Constraints for Pod
Scheduling KEP.
Cloud providers like Google, Amazon, and Microsoft typically have a limit on how many
volumes can be attached to a Node. It is important for Kubernetes to respect those limits.
Otherwise, Pods scheduled on a Node could get stuck waiting for volumes to attach.
Custom limits
You can change these limits by setting the value of the KUBE_MAX_PD_VOLS environment
variable, and then starting the scheduler. CSI drivers might have a different procedure; see
their documentation on how to customize their limits.
Use caution if you set a limit that is higher than the default limit. Consult the cloud provider's
documentation to make sure that Nodes can actually support the limit you set.
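How the variable reaches the scheduler depends on how the scheduler is deployed. As one hedged illustration, if the scheduler runs as a Pod, the variable could be set in the container's env section. The fragment below is not a complete manifest, and the value 20 is an arbitrary example.

```yaml
# Fragment of a kube-scheduler Pod manifest (assumed deployment style).
spec:
  containers:
    - name: kube-scheduler
      env:
        - name: KUBE_MAX_PD_VOLS
          value: "20"   # arbitrary example limit; check your provider's maximum
```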
Amazon EBS
Google Persistent Disk
Azure Disk
CSI
For volumes managed by in-tree volume plugins, Kubernetes automatically determines the
Node type and enforces the appropriate maximum number of volumes for the node. For
example:
For Amazon EBS disks on M5, C5, R5, T3, and Z1D instance types, Kubernetes allows only
25 volumes to be attached to a Node. For other instance types on Amazon Elastic
Compute Cloud (EC2), Kubernetes allows 39 volumes to be attached to a Node.
On Azure, up to 64 disks can be attached to a node, depending on the node type. For
more details, refer to Sizes for virtual machines in Azure.
If a CSI storage driver advertises a maximum number of volumes for a Node (using
NodeGetInfo ), the kube-scheduler honors that limit. Refer to the CSI specifications for
details.
For volumes managed by in-tree plugins that have been migrated to a CSI driver, the
maximum number of volumes will be the one reported by the CSI driver.
CSI volume health monitoring allows CSI Drivers to detect abnormal volume conditions from
the underlying storage systems and report them as events on PVCs or Pods.
If a CSI Driver supports the Volume Health Monitoring feature from the controller side, an
Event is reported on the related PersistentVolumeClaim (PVC) when an abnormal volume
condition is detected on a CSI volume.
The External Health Monitor controller also watches for node failure events. You can enable
node failure monitoring by setting the enable-node-watcher flag to true. When the external
health monitor detects a node failure event, the controller reports an Event on the PVC to
indicate that Pods using this PVC are on a failed node.
If a CSI Driver supports the Volume Health Monitoring feature from the node side, an Event is
reported on every Pod using the PVC when an abnormal volume condition is detected on a
CSI volume. In addition, Volume Health information is exposed as Kubelet VolumeStats
metrics through the metric kubelet_volume_stats_health_status_abnormal . This metric
has two labels: namespace and persistentvolumeclaim . Its value is either 1 or 0: 1
indicates that the volume is unhealthy, and 0 indicates that the volume is healthy. For more
information, see the KEP.
Note: You need to enable the CSIVolumeHealth feature gate to use this feature from the
node side.
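If the kubelet metrics are scraped by a monitoring system, the metric can drive an alert. The sketch below assumes Prometheus is in use and scrapes the kubelet; the rule group name, alert name, duration, and severity label are hypothetical, and only the metric name and its two labels come from the text above.

```yaml
# Prometheus alerting rule file (assumes Prometheus scrapes kubelet metrics).
groups:
  - name: volume-health                 # hypothetical group name
    rules:
      - alert: PersistentVolumeAbnormal # hypothetical alert name
        # The metric is 1 when the volume is reported as unhealthy.
        expr: kubelet_volume_stats_health_status_abnormal == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} reports an abnormal volume condition"
```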
What's next
See the CSI driver documentation to find out which CSI drivers have implemented this feature.
13 - Windows Storage
This page provides a storage overview specific to the Windows operating system.
Persistent storage
Windows has a layered filesystem driver to mount container layers and create a copy
filesystem based on NTFS. All file paths in the container are resolved only within the context
of that container.
With Docker, volume mounts can only target a directory in the container, and not an
individual file. This limitation does not apply to containerd.
Volume mounts cannot project files or directories back to the host filesystem.
Read-only filesystems are not supported because write access is always required for the
Windows registry and SAM database. However, read-only volumes are supported.
Volume user-masks and permissions are not available. Because the SAM is not shared
between the host & container, there's no mapping between them. All permissions are
resolved within the context of the container.
Volume subpath mounts: only the entire volume can be mounted in a Windows
container
Subpath volume mounting for Secrets
Host mount projection
Read-only root filesystem (mapped volumes still support readOnly )
Block device mapping
Memory as the storage medium (for example, emptyDir.medium set to Memory )
File system features like uid/gid; per-user Linux filesystem permissions
Setting secret permissions with DefaultMode (due to UID/GID dependency)
NFS based storage/volume support
Expanding the mounted volume (resizefs)
Kubernetes volumes enable complex applications, with data persistence and Pod volume
sharing requirements, to be deployed on Kubernetes. Management of persistent volumes
associated with a specific storage back-end or protocol includes actions such as
provisioning/de-provisioning/resizing of volumes, attaching/detaching a volume to/from a
Kubernetes node and mounting/dismounting a volume to/from individual containers in a pod
that needs to persist data.
Volume management components are shipped as Kubernetes volume plugins. The following
broad classes of Kubernetes volume plugins are supported on Windows:
FlexVolume plugins
Please note that FlexVolumes have been deprecated as of 1.23
CSI Plugins
awsElasticBlockStore
azureDisk
azureFile
gcePersistentDisk
vsphereVolume