4. Using ROBIN CNS in Kubernetes¶
The Container Storage Interface (CSI) is a standard for exposing storage to workloads on Kubernetes. To enable automatic creation and deletion of volumes for CSI storage, a Kubernetes resource called a StorageClass must be created and registered with the Kubernetes cluster. Associated with the StorageClass is a CSI provisioner plugin that does the heavy lifting at the disk and storage management layers to provision volumes based on the attributes defined in the StorageClass. Kubernetes CSI was introduced in the Kubernetes v1.9 release, promoted to beta in the Kubernetes v1.10 release as CSI v0.3, and reached GA in Kubernetes v1.13 as CSI v1.0.
Kubernetes CSI broke compatibility between CSI v1.0 and CSI v0.3, so a separate StorageClass is required for each version of the spec. To facilitate this, ROBIN ships with two StorageClasses:
robin-0-3
- The StorageClass that is compatible with Kubernetes versions lower than v1.13
robin
- The StorageClass that is compatible with Kubernetes versions v1.13 and above
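If you are unsure which of the two applies to your cluster, you can check the Kubernetes server version first. This is an illustrative check, assuming kubectl is already configured against the target cluster:

$ kubectl version --short

Use robin if the reported Server Version is v1.13 or newer, and robin-0-3 otherwise.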
Both storage classes accept the same parameters, as described below:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: robin
provisioner: robin
reclaimPolicy: Delete
parameters:
  # media: <media>
  # blocksize: <blocksize>
  # fstype: <fstype>
  # replication: <replication>
  # faultdomain: <faultdomain>
  # compression: <compression>
  # encryption: <encryption>
  # snapshot_space_limit: <snapshot_space_limit>
  # rpool: <rpool_name>
Each parameter is described below:

media
    The media type ROBIN should use to allocate PersistentVolumes. Two values are supported: HDD and SSD.
blocksize
    The block size of the logical block device created by ROBIN. By default ROBIN uses a block size of 4096.
fstype
    The filesystem the logical block device created by ROBIN is formatted with. By default ext4 is used.
replication
    By default ROBIN does not enable replication for the logical block device. It can be set to "2" or "3" to set up 2-way or 3-way replication. Robin implements a strictly consistent data replication guarantee, which means that a write IO is NOT acknowledged back to the client until it is made durable on all replicas.
faultdomain
    The fault domain to be used when "replication" is turned on. Setting the right fault domain maximizes data safety. Supported values are disk (the default), host, and rack.
compression
    By default inline data compression is disabled. It can be enabled by setting it to LZ4.
encryption
    By default data-at-rest encryption is not enabled. To enable it, set it to CHACHA20, AES256, or AES128.
snapshot_space_limit
    The amount of space set aside for snapshots of this volume, expressed as a percentage of the volume size. For example, if the volume size is 100GB, a value of "30" reserves 30GB of space for snapshots. New snapshot creation will fail once this limit is reached. The default is 40.
rpool
    Resource pools are a construct in Robin which allow you to group nodes in the cluster together for allocation purposes. Pools provide resource isolation. The default resource pool is "default".
Note
Make sure that for blocksize and replication the values are passed as quoted strings to adhere to the CSI spec. That is, blocksize should be passed as "4096" (quoted) and NOT as 4096 (unquoted).
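For example, a custom StorageClass that provisions replicated, compressed volumes on SSD media could look like the following. This is a minimal sketch built from the parameters above; the class name robin-ssd-replicated and the chosen values are illustrative, not defaults shipped by ROBIN:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: robin-ssd-replicated   # illustrative name, choose your own
provisioner: robin
reclaimPolicy: Delete
parameters:
  media: SSD
  blocksize: "4096"            # quoted, per the Note above
  fstype: ext4
  replication: "3"             # quoted, per the Note above
  faultdomain: host
  compression: LZ4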
4.1. Using Robin CNS Storage Class to Provision Storage¶
4.1.1. Basic Use Case¶
Creating a PVC with Robin CNS StorageClass:
First configure a YAML similar to the one shown below for a PersistentVolumeClaim (PVC) that uses the Robin CNS StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  annotations:
    volume.beta.kubernetes.io/storage-class: robin
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Run the following command to actually create the PVC:
$ kubectl create -f mypvc.yaml
persistentvolumeclaim/mypvc created
Note
Notice that under metadata/annotations we have specified the storage class as volume.beta.kubernetes.io/storage-class: robin. This results in the ROBIN CNS StorageClass being picked up. For Kubernetes versions lower than v1.13 one should instead use volume.beta.kubernetes.io/storage-class: robin-0-3.
Verify the desired PVC exists and was created successfully by running the following command:
$ kubectl get pvc
NAME    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc   Pending                                      robin          7s
Attach the PersistentVolumeClaim to a simple Pod:
Configure a Pod YAML, similar to the one showcased below, which references the volume we created previously.
kind: Pod
apiVersion: v1
metadata:
  name: myweb
spec:
  volumes:
    - name: htdocs
      persistentVolumeClaim:
        claimName: mypvc
  containers:
    - name: myweb0
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: htdocs
Run the following command to actually create the Pod:
$ kubectl create -f mypod.yaml
We can confirm that the PersistentVolumeClaim is bound to the Pod and a PersistentVolume has been created by issuing the following commands:
$ kubectl get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc   Bound    pvc-7a18d80c-6c26-4585-a949-24d9005e3d7f   10Gi       RWO            robin          6m1s

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM           STORAGECLASS   REASON   AGE
pvc-7a18d80c-6c26-4585-a949-24d9005e3d7f   10Gi       RWO            Delete           Bound    default/mypvc   robin                   5m32s
4.1.2. Customizing Volume Provisioning¶
Let’s say that we’d like to create a PVC which meets the following requirements:
Data is replicated 3-ways
The Pod should continue to have access to data even if 2 of the 3 disks or the nodes on which these disks are hosted go down
The data must be compressed
The data should only reside on SSD media
This is accomplished by specifying these requirements under the metadata/annotations section of the PVC spec, as shown in the YAML below. Notice that each annotation is prefixed with robin.io/. The annotations take exactly the same parameters as the ROBIN CNS StorageClass YAML detailed above and override the corresponding parameters specified in the StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: protected-compressed-pvc
  annotations:
    volume.beta.kubernetes.io/storage-class: robin
    robin.io/replication: "3"
    robin.io/faultdomain: host
    robin.io/compression: LZ4
    robin.io/media: SSD
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Run the following command to actually create the PVC:
$ kubectl create -f newpvc.yaml
persistentvolumeclaim/protected-compressed-pvc created
$ kubectl get pvc
NAME                       STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc                      Bound     pvc-7a18d80c-6c26-4585-a949-24d9005e3d7f   10Gi       RWO            robin          62m
protected-compressed-pvc   Pending                                                                        robin          47s
Note
Note that the number 3 is quoted as "3" when specifying the robin.io/replication: annotation. This is per the Kubernetes spec; not doing so results in an error being thrown by Kubernetes.
4.1.3. Using ROBIN CNS in a StatefulSet¶
In a StatefulSet, a PVC is not referenced directly as in the examples above; instead, a volumeClaimTemplate describes the type of PVC that needs to be created as part of the StatefulSet resource. This is accomplished via the following YAML:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
      annotations:
        volume.beta.kubernetes.io/storage-class: robin
        robin.io/replication: "2"
        robin.io/media: SSD
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
The following commands can be used to create the StatefulSet and ensure the correct PVCs are used:
$ kubectl create -f myweb.yaml
service/nginx created
statefulset.apps/web created
$ kubectl get statefulset
NAME READY AGE
web 2/2 12s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
www-web-0 Bound pvc-2b97d8fc-479d-11e9-bac1-00155d61160d 1Gi RWO robin 8s
www-web-1 Bound pvc-436536e6-479d-11e9-bac1-00155d61160d 1Gi RWO robin 8s
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-2b97d8fc-479d-11e9-bac1-00155d61160d 1Gi RWO Delete Bound default/www-web-0 robin 10s
pvc-436536e6-479d-11e9-bac1-00155d61160d 1Gi RWO Delete Bound default/www-web-1 robin 10s
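Because each replica's PVC is generated from the volumeClaimTemplate, scaling the StatefulSet up provisions an additional ROBIN-backed volume per new replica. As an illustrative check (not part of the original example):

$ kubectl scale statefulset web --replicas=3
$ kubectl get pvc

A new claim named www-web-2 should appear and bind to a freshly provisioned ROBIN volume.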
4.1.4. Provisioning Storage for Helm Charts¶
Helm charts are a popular way to deploy an entire stack of Kubernetes resources in one shot. A helm chart is installed using the helm install command. To use Robin CNS for persistent storage, pass the --set persistence.storageClass=robin command line option as shown below:
$ helm install --name pgsqldb stable/mysql --set persistence.storageClass=robin
This would result in Robin being used as the storage provisioner for PersistentVolumeClaims created by this helm chart.
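Equivalently, the storage class can be supplied through a values file instead of on the command line. This is a sketch that assumes the chart exposes a persistence.storageClass value (as stable/mysql does); the file name values-robin.yaml is arbitrary:

# values-robin.yaml
persistence:
  storageClass: robin

$ helm install --name pgsqldb stable/mysql -f values-robin.yaml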
4.2. Protecting PVCs using ROBIN’s Volume Replication¶
Robin uses storage volume-level replication to ensure that data remains available in the event of node and disk failures. When replication is set to 2, at least 2 copies of the volume are maintained on different disks; if set to 3, at least 3 copies are maintained. This ensures that the volume's data is available in the event of 1 or 2 disk/node failures. Replication is configured by annotating the PVC spec with robin.io/replication: "<count>"
and optionally robin.io/faultdomain: disk|host|rack
as shown in the YAML below:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: replicated-pvc
  annotations:
    volume.beta.kubernetes.io/storage-class: robin
    robin.io/replication: "3"
    robin.io/faultdomain: host
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Setting the correct value for robin.io/faultdomain to disk, host, or rack determines whether this PVC's data remains available only across disk failures or across node (and rack) failures as well.
How are faults handled?
ROBIN uses strict-consistency semantics to guarantee correctness for your mission-critical stateful applications. This means that a "write" IO is not acknowledged back to the application until it has been made durable on all healthy replica disks. One or more replica disks of a volume can go down for a short period of time (a node going through a reboot cycle) or for a longer period (a node has a hardware fault and can't be brought online until the part is replaced). ROBIN handles both cases gracefully. When a replica disk becomes unavailable during IO, ROBIN automatically evicts it from the replication group, and IOs continue to go to the remaining healthy replicas. When the faulted disk becomes available again, ROBIN automatically brings it up to the same state as the other healthy disks before adding it back into the replication group. This is handled automatically and is transparent to the application.
A disk can also suffer a more serious error, for example an IO error returned during a write or read operation. In this case ROBIN marks that disk as faulted and generates an alert for the storage admin to investigate. The storage admin can then determine the nature of the error and mark the disk as healthy, in which case ROBIN adds it back into the replication group and initiates a data resync to bring it up to the same level as the other healthy disks. If the error is serious (e.g., SMART counters report corruption), or if the node has a motherboard or IO card fault that needs to be replaced, the storage admin can permanently decommission that disk or node from the Kubernetes cluster. Doing so automatically evicts the disk from the PVC's replication group. The storage admin can then add a new healthy disk to the replication group so that the PVC is brought back to the same level of availability as before.
There is a practical reason why ROBIN doesn't automatically trigger rebuilds of faulted disks. ROBIN is currently used in mission-critical deployments with multiple petabytes under management by the ROBIN storage stack. We have seen scenarios where an IO controller card failed while it had 12 disks of 10TiB each behind it, that is, 120 TiB of storage capacity under a single IO controller card. Rebuilding 120 TiB of data takes more time than replacing a faulted IO controller card with a healthy one. Also, moving 120 TiB of data over the network from healthy disks on other nodes puts a significant load on the network switches and on the applications running on the nodes from which the data is pulled, resulting in noticeable performance degradation. With our experience managing storage in large-scale deployments and feedback from the admins managing those clusters, we have determined that it is best to inform an admin of a failure and let them decide, based on cost and time, whether they want to replace the faulty hardware or have ROBIN initiate a rebuild.
4.3. Making Robin the default StorageClass¶
To avoid typing the name of the StorageClass each time a new chart is deployed, it is highly recommended to set Robin's StorageClass as the default Kubernetes StorageClass. This can be done as follows:
Check for the current default StorageClass:
Check whether a different StorageClass is already marked as default by running the following command:
$ kubectl get storageclass
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   8d
robin           robin                   Delete          WaitForFirstConsumer   true                   5d5h
Set the non-Robin StorageClass as “non-default”:
In order to mark the current default StorageClass as “non-default” run the following command:
$ kubectl patch storageclass gp2 \
    -p '{"metadata": {"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"false"}}}'
Note
Before patching the storage class, ensure that the annotation specified is correct for your cluster. The example above uses the beta annotation (storageclass.beta.kubernetes.io/is-default-class), which applies to older Kubernetes versions such as v1.12; on newer clusters use storageclass.kubernetes.io/is-default-class instead.
Mark Robin as the new default StorageClass:
To set the Robin CNS native StorageClass as the default for the cluster, run the following command:
$ kubectl patch storageclass robin \
    -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Note
Before patching the ROBIN storage class, ensure that the name specified is correct: for Kubernetes v1.13 and newer it appears as robin, while for older versions it is displayed as robin-0-3.
Verify that Robin is now the default StorageClass:
Issue the following command to confirm that Robin is now the default StorageClass:
$ kubectl get storageclass
NAME              PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2               kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   8d
robin (default)   robin                   Delete          WaitForFirstConsumer   true                   5d5h
To learn more see official documentation on how to Change the default StorageClass.
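Once robin is the default StorageClass, charts and PVCs that do not name a storage class explicitly will be provisioned by ROBIN automatically. For example, the Helm install from the previous section can now be run without the --set flag (illustrative):

$ helm install --name pgsqldb stable/mysql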
4.4. Snapshot Volumes¶
Just like storage management, which is done by an external storage provisioner such as Robin, taking snapshots of a volume is done by a snapshotting provisioner registered with Kubernetes. See the official documentation on Volume Snapshots for more details. Robin supports Kubernetes snapshots for Kubernetes versions v1.13 and beyond.
Register a SnapshotClass with Kubernetes:
First configure an appropriate YAML (an example is given below) representing the SnapshotClass.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: robin-snapshotclass
  labels:
    app.kubernetes.io/instance: robin
    app.kubernetes.io/managed-by: robin.io
    app.kubernetes.io/name: robin
driver: robin
deletionPolicy: Delete
Create the SnapshotClass by running the following command:
$ kubectl create -f csi-robin-snapshotclass.yaml
volumesnapshotclass.snapshot.storage.k8s.io/robin-snapshotclass created
Then confirm that the SnapshotClass has been registered by running the following command:
$ kubectl get volumesnapshotclass
NAME                  DRIVER   DELETIONPOLICY   AGE
robin-snapshotclass   robin    Delete           18s
Take a snapshot of a PersistentVolumeClaim:
In order to take a snapshot of a PVC, first configure a YAML specifying the SnapshotClass and the PVC to be snapshotted, like the one below.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-mypvc
  labels:
    app.kubernetes.io/instance: robin
    app.kubernetes.io/managed-by: robin.io
    app.kubernetes.io/name: robin
spec:
  volumeSnapshotClassName: robin-snapshotclass
  source:
    persistentVolumeClaimName: mypvc
Run the following command to actually create the snapshot:
$ kubectl create -f take-snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/snapshot-mypvc created
Lastly verify that the VolumeSnapshot for the PersistentVolumeClaim is created with the following command:
$ kubectl get volumesnapshot
NAME             READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS         SNAPSHOTCONTENT                                    CREATIONTIME   AGE
snapshot-mypvc   false        mypvc                                             robin-snapshotclass   snapcontent-06c17c2b-e7bb-4dc9-86df-e5fd05821977                  4m28s

$ kubectl get volumesnapshotcontent
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER   VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT   AGE
snapcontent-06c17c2b-e7bb-4dc9-86df-e5fd05821977                              Delete           robin    robin-snapshotclass   snapshot-mypvc   41s
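READYTOUSE is reported as false while the snapshot is still being cut and flips to true once it can be used as a clone source. One way to wait for this, shown here purely as an illustrative convenience, is to watch the resource:

$ kubectl get volumesnapshot snapshot-mypvc -w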
4.5. Clone Volumes¶
Robin can create a clone from a snapshot of a volume. Clones are read-write, so old data can be read from the parent snapshot and new data can be written to the newly provisioned cloned volume. See the official Kubernetes documentation on Volume Snapshot Restores and Clones for more details. ROBIN supports Kubernetes clones for Kubernetes v1.13 and beyond.
Note
The clone functionality is still an Alpha feature in Kubernetes, so it requires the VolumeSnapshotDataSource feature gate to be enabled on the apiserver and controller-manager. More documentation on how to enable the feature can be found here.
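As a rough sketch of what enabling the gate can look like on a kubeadm-style cluster (manifest paths and existing flags vary by environment, so treat this as an assumption-laden example rather than the canonical procedure), add the feature gate to the control-plane component commands:

# /etc/kubernetes/manifests/kube-apiserver.yaml (repeat for kube-controller-manager.yaml)
spec:
  containers:
  - command:
    - kube-apiserver
    # ... existing flags ...
    - --feature-gates=VolumeSnapshotDataSource=true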
Clone a VolumeSnapshot:
Configure a YAML as below in order to clone a VolumeSnapshot.
Note
All robin.io annotations showcased below refer to the same options described in the Robin CNS StorageClass YAML.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc-clone-snap1
  #annotations:
  #  robin.io/media: <SSD, HDD>
  #  robin.io/replication: <"2", "3">
  #  robin.io/faultdomain: <disk, host>                // default disk
  #  robin.io/encryption: <CHACHA20, AES256, AES128>
  #  robin.io/snapshot_space_limit: "50"               // default 40%. Percentage of Vol size.
spec:
  storageClassName: robin
  dataSource:
    name: mypvc-snap1
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Create the clone by running the following command:
$ kubectl create -f take-clone.yaml
persistentvolumeclaim/mypvc-clone-snap1 created
Confirm that the PersistentVolumeClaim for the clone was created:
One can verify that the clone was successfully created by issuing the following command:
$ kubectl get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc               Bound    pvc-83ed719a-5500-11e9-a0b7-00155d320462   1Gi        RWO            robin          49m
mypvc-clone-snap1   Bound    pvc-6dd554d1-5506-11e9-a0b7-00155d320462   1Gi        RWO            robin          7m19s
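The cloned PVC behaves like any other ROBIN-backed claim and can be mounted into a Pod in the usual way. A minimal sketch, with the Pod name and mount path chosen purely for illustration:

kind: Pod
apiVersion: v1
metadata:
  name: myweb-clone                    # illustrative name
spec:
  volumes:
    - name: htdocs
      persistentVolumeClaim:
        claimName: mypvc-clone-snap1   # the clone created above
  containers:
    - name: myweb0
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: htdocs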
4.6. Expand Volumes¶
Robin supports volume expansion. To expand a PVC, do the following.
List the PersistentVolumes:
In order to list all the PVs available on the cluster, run the following command:
$ kubectl get pv -n robinapps
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-651722f9-dad2-4d62-85d9-de556bb8d555   8Gi        RWO            Delete           Bound    robinapps/mysqldb   robin                   14h
Edit the PersistentVolume:
Next we need to edit the desired PersistentVolume. Under the spec section, change the storage attribute under the capacity field to the desired value, as highlighted below:
$ kubectl edit persistentVolume/pvc-651722f9-dad2-4d62-85d9-de556bb8d555 -n robinapps
-------
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: robin
  creationTimestamp: "2020-10-06T04:44:39Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/robin
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          v:"external-attacher/robin": {}
    manager: csi-attacher
    operation: Update
    time: "2020-10-06T04:44:39Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:pv.kubernetes.io/provisioned-by: {}
      f:spec:
        f:accessModes: {}
        f:capacity: {}
        f:claimRef:
          .: {}
          f:apiVersion: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
          f:resourceVersion: {}
          f:uid: {}
        f:csi:
          .: {}
          f:driver: {}
          f:fsType: {}
          f:volumeAttributes:
            .: {}
            f:csi.storage.k8s.io/pv/name: {}
            f:csi.storage.k8s.io/pvc/name: {}
            f:csi.storage.k8s.io/pvc/namespace: {}
            f:storage.kubernetes.io/csiProvisionerIdentity: {}
          f:volumeHandle: {}
        f:persistentVolumeReclaimPolicy: {}
        f:storageClassName: {}
        f:volumeMode: {}
    manager: csi-provisioner
    operation: Update
    time: "2020-10-06T04:44:39Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:phase: {}
    manager: kube-controller-manager
    operation: Update
    time: "2020-10-06T04:44:39Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:capacity:
          f:storage: {}
    manager: kubectl
    operation: Update
    time: "2020-10-06T19:25:31Z"
  name: pvc-651722f9-dad2-4d62-85d9-de556bb8d555
  resourceVersion: "4678372"
  selfLink: /api/v1/persistentvolumes/pvc-651722f9-dad2-4d62-85d9-de556bb8d555
  uid: 0151065d-fd69-4479-85e2-d4c47c414a90
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 16Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: mysqldb
    namespace: robinapps
    resourceVersion: "4415500"
    uid: 651722f9-dad2-4d62-85d9-de556bb8d555
  csi:
    driver: robin
    fsType: ext4
    volumeAttributes:
      csi.storage.k8s.io/pv/name: pvc-651722f9-dad2-4d62-85d9-de556bb8d555
      csi.storage.k8s.io/pvc/name: mysqldb
      csi.storage.k8s.io/pvc/namespace: robinapps
      storage.kubernetes.io/csiProvisionerIdentity: 1601911186270-8081-robin
    volumeHandle: "1601911167:5"
  persistentVolumeReclaimPolicy: Delete
  storageClassName: robin
  volumeMode: Filesystem
status:
  phase: Bound
-------
persistentvolume/pvc-651722f9-dad2-4d62-85d9-de556bb8d555 edited
Verify the change to the PersistentVolume:
Lastly, confirm that the PersistentVolume's capacity has been increased by running the following command:
$ kubectl get pv -n robinapps
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-651722f9-dad2-4d62-85d9-de556bb8d555   16Gi       RWO            Delete           Bound    robinapps/mysqldb   robin                   14h
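Because the robin StorageClass reports ALLOWVOLUMEEXPANSION as true in the earlier kubectl get storageclass output, newer Kubernetes versions may also accept an expansion request made against the PVC itself rather than the PV. Whether that path is supported depends on your Kubernetes and Robin CNS versions, so treat the following as an illustrative alternative to the documented procedure above:

$ kubectl patch pvc mysqldb -n robinapps \
    -p '{"spec": {"resources": {"requests": {"storage": "16Gi"}}}}'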
4.7. Handling Disruptions¶
With Robin, highly available applications can be deployed on Kubernetes because ROBIN handles failures of drives, racks, or hosts automatically. On a bare-metal setup, volumes can be set up with a replication factor of 2 or 3 to ensure that storage remains available even if a drive fails. Users can also choose the fault domain to be 'host' to protect against node reboots or node loss.
However, in a public cloud environment cloud disks can be detached from one cloud node and reattached to another. For example, in AWS an EBS volume can be detached from one EC2 host and reattached to a different EC2 host; likewise in GCP a PD can be moved across GCE nodes. If a cloud node (EC2, GCE, Azure VM) is terminated or rebooted, one would want any cloud drives attached to it (EBS, PD, Block) to be moved to one or more of the remaining healthy nodes automatically. This is not limited to cloud disks: SAN LUNs offered to ROBIN as disks can also be multi-mounted onto multiple nodes or moved from node to node. Users can still choose to replicate volumes on public cloud, since detaching and reattaching drives on cloud platforms takes some time.
Just having the storage available during a disruption does not help if Kubernetes cannot access it from the Pod. For example, a Kubernetes StatefulSet serializes the mounting and unmounting of a volume to protect against possible corruption. ROBIN utilizes smart detection techniques to ensure that even if a volume is mounted on multiple nodes, it can differentiate the IOs issued from the previous stale mount and the new mount. With these consistency guarantees, ROBIN enables the Kubernetes StatefulSet to unmount a volume from a dead node and remount it on a healthy node where the Pod is scheduled to run. ROBIN actively monitors these events to allow fast failover of Pods without user intervention, and consequently enables users to reliably deploy highly available stateful applications on Kubernetes.