20. Release Notes

20.1. Robin Cloud Native Storage v6.1.0

20.1.1. Robin CNS v6.1.0 Release Notes

The Robin CNS v6.1.0-203 Release Notes document provides information about upgrade paths, new features, improvements, fixed issues, and known issues.

Release Date: June 01, 2026

20.1.2. Upgrade Paths

The following are the supported upgrade paths for Robin CNS v6.1.0-203:

  • Robin CNS v5.4.18-257 to Robin CNS v6.1.0-203

  • Robin CNS v6.0.0-226 to Robin CNS v6.1.0-203

  • Robin CNS v6.0.0-211 to Robin CNS v6.1.0-203

Note

  • After upgrading to Robin CNS v6.1.0-203, if you are using the Robin Client outside the robincli pod, you must upgrade to the latest version of the Robin Client.

  • If you have installed Robin CNS with the skip_postgres_operator parameter to use the Zalando PostgreSQL operator, then you must first upgrade the Zalando PostgreSQL operator to v1.11.0 or later before upgrading to Robin CNS v6.1.0-203.

  • When you upgrade from any supported Robin CNS version (v5.4.18 or later), certificates are renewed automatically.

20.1.3. New Features

20.1.3.1. New migration-webhook Deployment for API conversion

Starting with Robin CNS v6.1.0, during the upgrade process, Robin CNS automatically deploys a new migration-webhook. This webhook handles the conversion between the manage.robin.io/v1 and manage.robin.io/v2 versions of RobinCluster CRs, allowing both API versions to coexist after the upgrade without any manual intervention.

During conversion, Robin CNS automatically maps the following v1 spec fields to their custom_config equivalents when the CR is converted to v2:

v1 field                     Mapped to in v2
---------------------------  ----------------------------------------------------
spec.requests / spec.limits  custom_config.robin-iomgr.containers.iomgr.resources
spec.master_tolerations      custom_config.robin-master.tolerations
spec.master_node_selector    custom_config.robin-master.node_selector
spec.patroni_tolerations     custom_config.robin-patroni.tolerations
spec.patroni_node_selector   custom_config.robin-patroni.node_selector

For more information, see Migration Webhook Deployment.
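
The v1-to-v2 field mapping can be pictured with a short Python sketch. It only illustrates the field correspondence and is not the migration-webhook's implementation. The function names map_v1_spec and set_path are hypothetical, and placing requests and limits as separate keys under resources is an assumption based on the usual Kubernetes resources layout.

```python
# (v1 spec field, path under custom_config in v2), taken from the mapping table.
SIMPLE_MAPPINGS = [
    ("master_tolerations", ("robin-master", "tolerations")),
    ("master_node_selector", ("robin-master", "node_selector")),
    ("patroni_tolerations", ("robin-patroni", "tolerations")),
    ("patroni_node_selector", ("robin-patroni", "node_selector")),
]

# Assumed location of the iomgr container resources in v2.
IOMGR_RESOURCES = ("robin-iomgr", "containers", "iomgr", "resources")


def set_path(tree, path, value):
    """Store value in a nested dict, creating intermediate dicts as needed."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def map_v1_spec(spec):
    """Build a custom_config dict from the v1 spec fields that are present."""
    custom_config = {}
    for field, path in SIMPLE_MAPPINGS:
        if field in spec:
            set_path(custom_config, path, spec[field])
    # Assumption: requests and limits keep their own keys under "resources".
    for field in ("requests", "limits"):
        if field in spec:
            set_path(custom_config, IOMGR_RESOURCES + (field,), spec[field])
    return custom_config
```

Fields that are absent from the v1 spec produce no entry, so an empty spec yields an empty custom_config.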

20.1.3.2. Skip Robin StorageClasses and CSI components deployment

Starting with Robin CNS v6.1.0, you can skip the deployment of the following CSI (Container Storage Interface) components and Robin StorageClasses during installation:

CSI components

  • CSI Provisioner

  • CSI Attacher

  • CSI Resizer

  • CSI Snapshotter

  • CSI Nodeplugin

Robin StorageClasses

  • robin

  • robin-repl-3

  • robin-immediate

  • robin-rwx

This option is useful when CSI components and Robin StorageClasses should not be deployed on the cluster, such as when using an alternative storage provisioner or where CSI is not needed.

To skip deployment of CSI components and Robin StorageClasses on a Robin CNS cluster, you must uncomment the following parameter in the Options section of the robin.yaml file:

  • skip_csi_deploy

Note

This option is applicable only for new installations, not for upgrades.

The following are the valid values:

  • 0 - Deploy CSI components and Robin StorageClasses. This is the default value.

  • 1 - Skip deployment of CSI components and Robin StorageClasses.

Example

spec:
  options:
    skip_csi_deploy: "1"

For more information, see the Install options.

20.1.3.3. Metric for volume data locality

Starting with Robin CNS v6.1.0, the following new Prometheus metric is added for volume data locality:

  • robin_vol_mount_data_locality

The metric value is an integer (0–100) representing the percentage of volume data physically stored on the node where the volume is currently mounted. A value of 100 indicates full data locality.

robin_vol_mount_data_locality{name="pvc-xx", volid="101", mount_node="worker-1", mount_node_id="1"} 85
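
As a rough illustration of how the sample above might be consumed, the following Python sketch parses one sample line and flags volumes whose locality falls below a threshold. The helper names and the default threshold of 50 are hypothetical, and the regular expression handles only simple labelled samples like the one shown, not the full Prometheus exposition format.

```python
import re

# Matches one labelled sample such as: metric{label="v", ...} 85
SAMPLE_RE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (metric, labels, value) for a single labelled sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(match["labels"]))
    return match["metric"], labels, float(match["value"])


def low_locality_volumes(lines, threshold=50.0):
    """Return (volume name, mount node, percent) for samples below threshold."""
    flagged = []
    for line in lines:
        metric, labels, value = parse_sample(line)
        if metric == "robin_vol_mount_data_locality" and value < threshold:
            flagged.append((labels.get("name"), labels.get("mount_node"), value))
    return flagged
```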

For more information, see Volume Data Locality.

20.1.3.4. Metrics for Stormgr, RIO, and RDVM process lifecycle

Starting with Robin CNS v6.1.0, new metrics are added to track the process lifecycle of the following Robin storage daemons:

  • Store Manager (Stormgr) on the master node

  • Robin I/O (RIO) on each worker node

  • Robin Distributed Volume Manager (RDVM) on each worker node

The following are the newly added metrics:

Metric                                  Description
--------------------------------------  ----------------------------------------------------------------------------------------
robin_stormgr_start_time_epoch_seconds  Unix timestamp (in seconds) when Stormgr last started.
robin_stormgr_ready_time_epoch_seconds  Unix timestamp (in seconds) when Stormgr reached the ready state after its last start.
robin_rio_start_time_epoch_seconds      Unix timestamp (in seconds) when the RIO process last started.
robin_rio_ready_time_epoch_seconds      Unix timestamp (in seconds) when the RIO completed initialization after its last start.
robin_rdvm_start_time_epoch_seconds     Unix timestamp (in seconds) when the RDVM process last started.
robin_rdvm_ready_time_epoch_seconds     Unix timestamp (in seconds) when the RDVM completed initialization after its last start.

Note

All metrics carry an instance label set to the node’s hostname.

Example Prometheus queries:

  • Detect RIO restarts: changes(robin_rio_start_time_epoch_seconds[5m]) > 0

  • Detect Stormgr restarts: changes(robin_stormgr_start_time_epoch_seconds[5m]) > 0

For more information, see Storage Manager Metrics.
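
Because each daemon exposes both a start and a ready timestamp, the two values can be combined to estimate how long the last startup took. The Python sketch below shows the arithmetic. The function name is hypothetical, and treating a ready timestamp older than the start timestamp as "restarted but not yet ready" is an assumption, not documented behavior.

```python
def startup_seconds(start_epoch, ready_epoch):
    """Seconds between the last start and the matching ready timestamp.

    Returns None when the ready timestamp is older than the start timestamp,
    which this sketch treats as "restarted but not ready yet" (an assumption).
    """
    if ready_epoch < start_epoch:
        return None
    return ready_epoch - start_epoch
```

For example, the values of robin_rio_start_time_epoch_seconds and robin_rio_ready_time_epoch_seconds for the same instance label can be passed as the two arguments.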

20.1.3.5. Support to customize Pod configurations for Robin components

Starting with Robin CNS v6.1.0, RobinCluster CR v2 includes a custom_config field that allows you to customize the pod configuration for the following Robin components independently:

Supported Robin components

  • robin-patroni

  • postgres-operator

  • robin-master

  • robin-worker

  • robin-iomgr

  • robin-tcmuproxy

  • csi-nodeplugin-robin

  • csi-attacher-robin

  • csi-provisioner-robin

  • csi-resizer-robin

  • csi-snapshotter-robin

Configurable settings per component

  • Tolerations — Add pod tolerations to schedule component pods on tainted nodes.

  • Node selector — Constrain component pods to nodes matching specific labels.

  • Pod security context — Set pod-level security settings such as runAsUser, runAsGroup, and fsGroup.

  • Container security context — Set container-level security settings such as allowPrivilegeEscalation.

  • Resource requests and limits — Set CPU and memory requests and limits per container.

  • Pod annotations — Add Pod annotations such as AppArmor profiles (container.apparmor.security.beta.kubernetes.io/...).

Example

spec:
  custom_config:
    robin-patroni:
      annotations:
        container.apparmor.security.beta.kubernetes.io/patroni: runtime/default
    robin-worker:
      tolerations:
        - key: dedicated
          operator: Equal
          value: storage
          effect: NoSchedule
    robin-master:
      containers:
        robinrcm:
          resources:
            requests:
              cpu: "500m"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "8Gi"

For more information, see Custom Pod Configuration.

20.1.3.6. New metrics

Starting with Robin CNS v6.1.0, the following metrics are available for Prometheus scraping:

  • robin_manager_services_robin_dbconnect - The status of the robin-dbconnect service. Value 0 = up and 1 = down.

  • robin_vol_iostall_pending_ios - Reflects the pending IOs on a volume.

For more information, see Storage Metrics.

20.1.3.7. Automatic log collection cleanup with TTL

Starting with Robin CNS v6.1.0, Robin CNS supports automatic cleanup of log collections based on a configurable Time-To-Live (TTL) retention period. This feature helps in managing storage space by automatically removing old log collections after their retention period expires.

Key features

  • Default retention period: By default, log collections are automatically cleaned up after a retention period of 7 days.

  • Configurable retention period: You can update the default retention period of log collections cluster-wide using the log_collection_retention_days config attribute. If you set it to 0, then log collections are retained indefinitely.

  • Scheduled cleanup: An automated cleanup job runs daily at 2:00 AM to remove expired log collections.

  • Expiration tracking: The robin log list command displays the expiration date for each log in the Expires At column.

To update the retention period for log collections cluster-wide, run the following command:

# robin config update log_collection_retention_days 14

Note

The TTL mechanism applies only to log collections of type LOG_COLLECTION. The existing log collections created before the upgrade do not have an expiration date and are retained until you manually delete them.
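
The interaction between the retention period and the daily cleanup job can be sketched in Python as follows. This is an illustration under stated assumptions, not Robin's implementation: it assumes that the expiration time is the creation time plus the retention period, and that the 2:00 AM job runs in the same time zone as the timestamps. Both function names are hypothetical.

```python
from datetime import datetime, time, timedelta


def expires_at(created, retention_days):
    """Expiration time of a log collection, or None if retained indefinitely."""
    if retention_days == 0:
        # A retention period of 0 means the collection never expires.
        return None
    return created + timedelta(days=retention_days)


def first_cleanup_after(expiry):
    """First daily 2:00 AM cleanup run at or after the expiration time."""
    run = datetime.combine(expiry.date(), time(2, 0))
    if run < expiry:
        run += timedelta(days=1)
    return run
```

Under these assumptions, a collection created at 10:00 on June 1 with the default 7-day retention expires at 10:00 on June 8 and is removed by the cleanup run at 2:00 AM on June 9.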

20.1.3.8. TLS encryption for RWX volumes

Starting with Robin CNS v6.1.0, Robin CNS supports in-transit encryption for ReadWriteMany (RWX) volumes using the mutual TLS protocol. RWX volumes use Network File System (NFS) as the underlying transport. When this feature is enabled, a Stunnel sidecar container is deployed in the NFS server pod to encrypt traffic between the NFS server and all mounting clients.

Note

This feature is applicable only for exclusive NFS server Pods, not for shared NFS server Pods.

Prerequisites:

  • An exclusive NFS server Pod.

  • Linux kernel 6.x or later must be running on all nodes.

  • The tls kernel module must be loaded on all nodes.

Configuration methods:

The following are the methods to enable TLS encryption for RWX volumes:

  • Cluster level: To configure at the cluster level, run the robin config update nfs tls_state enable command.

  • StorageClass level: To configure at the StorageClass level, create a StorageClass using tls_state: "enable" and nfs-server-type: "exclusive" parameters and then create RWX volumes using this StorageClass.

  • Volume level: To configure at the volume level for an individual volume, run the robin nfs_export update <volume_name> --tls_state enable command.

Note

StorageClass-level encryption overrides the cluster-level encryption.
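
For the StorageClass-level method, the following Python sketch builds a StorageClass manifest with the two parameters named above and prints it as JSON. The StorageClass name robin-rwx-tls and the provisioner value robin are assumptions for illustration. Check the provisioner used by your existing Robin StorageClasses before applying anything like this.

```python
import json


def tls_rwx_storage_class(name, provisioner="robin"):
    """Return a StorageClass manifest that requests TLS for RWX volumes.

    The provisioner default is an assumption; the two parameters come from
    the StorageClass-level configuration method described above.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": {
            "tls_state": "enable",
            "nfs-server-type": "exclusive",
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(tls_rwx_storage_class("robin-rwx-tls"), indent=2))
```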

For more information, see Enable TLS Encryption for RWX Volumes.

20.1.4. Improvements

20.1.4.1. Kubernetes RBAC hardening

Starting with Robin CNS v6.1.0, Kubernetes privileges are reduced for Robin components.

Robin Operator

The Robin Operator ServiceAccount no longer uses the cluster-admin ClusterRoleBinding. It now uses a specific ClusterRole that grants only the permissions required for installation and runtime operations.

Dedicated ServiceAccounts per component

Each Robin component uses its own dedicated ServiceAccount that grants only the minimum required Kubernetes API permissions:

Component              New Permission Scope
---------------------  ----------------------------------------
robin-operator         Scoped ClusterRole (operator ops only)
robin-worker           Namespace-scoped + PV/PVC read
robin-iomgr            Namespace-scoped (configmaps, pods/exec)
robin-file-server      Namespace-scoped read-only
robin-monitor          No Kubernetes API access
robin-patroni-monitor  No Kubernetes API access
robin-tcmuproxy        No Kubernetes API access
robin-nfs-server       No Kubernetes API access
cAdvisor               No Kubernetes API access

Auto-mounting disabled for ServiceAccount token

The automountServiceAccountToken parameter is set to false for Robin components that do not require Kubernetes API access. This setting prevents the API server credential from mounting to the container filesystem, which limits potential security risks if a container is compromised.

Privilege escalation disabled for CSI sidecar containers

The securityContext for non-privileged CSI sidecar containers (csi-provisioner, csi-resizer, and csi-snapshotter) is set to allowPrivilegeEscalation: false.

Note

The new ServiceAccounts and their RBAC bindings are created automatically by the Robin Operator during upgrade. No manual steps are required.

20.1.4.2. Monitor Thick volume clone hydration status

Starting with Robin CNS v6.1.0, you can monitor the status of a thick volume clone hydration operation using PersistentVolumeClaim (PVC) annotations and the Robin CLI. To track the hydration status of a thick volume clone, you must add the required annotations to the PVC YAML file before you create the clone.

Annotations for monitoring

Use the following annotations to track hydration details:

  • robin.io/hydration-status: Indicates the current state, such as in-progress, completed, or failed.

  • robin.io/hydration-progress: Displays the completion percentage, such as 25%.

  • robin.io/hydration-start-time: The timestamp when the hydration began.

  • robin.io/hydration-end-time: The timestamp when the hydration finished.

  • robin.io/hydration-error: Displays a detailed error message if the operation fails.
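
The annotations listed above can also be read programmatically. The Python sketch below extracts them from a PVC object that has been loaded as a dictionary, for example from the output of kubectl get pvc <pvc-name> -o json. The function name is hypothetical, and the sketch assumes the progress value is a whole number followed by a percent sign, as in the 25% example.

```python
HYDRATION_PREFIX = "robin.io/hydration-"


def hydration_summary(pvc):
    """Collect robin.io/hydration-* annotations from a PVC dictionary.

    Keys in the result drop the common prefix, for example "status".
    """
    annotations = pvc.get("metadata", {}).get("annotations") or {}
    summary = {
        key[len(HYDRATION_PREFIX):]: value
        for key, value in annotations.items()
        if key.startswith(HYDRATION_PREFIX)
    }
    progress = summary.get("progress")
    if progress is not None:
        # Assumption: progress looks like "25%"; convert it to the integer 25.
        summary["progress"] = int(progress.rstrip("%"))
    return summary
```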

Note

You can run the kubectl describe pvc <pvc-name> command to view the hydration status in the PVC annotations. The robin volume info <volume-id> command also displays the hydration completion percentage.

For more information, see Monitor hydration status.

20.1.4.3. Added execution timestamps to Robin CLI output

Starting with Robin CNS v6.1.0, to improve issue triaging and troubleshooting, the Robin CLI now provides an execution timestamp. To view the timestamp, you must pass the --urlinfo option with your command.

The timestamp is generated by the server and uses the YYYY-MM-DD HH:MM:SS,mmm format (for example, 2026-05-10 23:45:06,067).
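
When correlating CLI output with server logs, a timestamp in this format can be parsed with Python's standard library. This is a minimal sketch: the function name is hypothetical, and the format string is derived from the YYYY-MM-DD HH:MM:SS,mmm pattern described above.

```python
from datetime import datetime

# Comma-separated milliseconds are read by %f, which pads on the right,
# so "067" becomes 67000 microseconds.
URLINFO_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_cli_timestamp(text):
    """Parse a Robin CLI execution timestamp into a naive datetime."""
    return datetime.strptime(text, URLINFO_TIMESTAMP_FORMAT)
```

Subtracting two parsed values gives a timedelta, which is convenient for measuring the gap between two commands.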

Robin containers such as robin-master, robin-worker, and iomgr also display the system time and timezone by default before the container name.

Example

[2026-05-27 11:56:13 PDT][robinmaster@master ~]#

20.1.5. Fixed Issues

Reference ID

Description

RSD-11687

When removing an active master node from a Robin CNS cluster after a failover, the robin-master Pod crashed continuously with the following error while the HostRemove job was running: KeyError: 'HostVolumeDrain'. This issue is fixed.

RSD-11556

The issue of the robin-bootstrap failing when the UserAdd job for the admin user is interrupted by the robin-master pod failover or the robin-server restart during initial deployment is fixed.

RSD-11499

The issue of data being written to a raw block device mounted as a ReadOnlyMany (ROX) volume is fixed.

RSD-11022

The issue of the robin-bootstrap process entering an infinite retry loop when the Robin master fails over to a node with a faulted root disk is fixed. Robin CNS now limits the number of bootstrap restart attempts. If the threshold is reached, it will automatically fail over to a healthy master node.

RSD-11726

After an application failover, the Pod got stuck in a restart loop, which triggered continuous volume mount and unmount operations. This caused high memory usage in the csi-nodeplugin process during the NodePublish volume operations. This issue is fixed.

RSD-11755

The robin cert renew command was failing if the robin_image_registry_prefix option was not set in the robin-operator YAML. This issue is fixed. The robin cert renew command uses the image_registry_path option set in the robin-operator YAML.

RSD-10232

During thick clone operations, a Persistent Volume Claim (PVC) might transition to the Bound state while volume hydration is still running. This issue is fixed and the PVC will move to the Bound state only after hydration is completed.

RSD-11263

If a volume deletion failed, for example, because an active snapshot was still mounted, the command incorrectly displayed the message The volume is deleted successfully along with a failure message. This issue is fixed, and the CLI now accurately reflects the result of the deletion operation.

RSD-11857

Brief network disruptions during Robin installation or upgrade would cause the process to stall and take longer to complete. This issue is fixed.

PP-39467

When deploying applications with RWX PVCs, application Pods fail to mount volumes and get stuck in the ContainerCreating state because RPC requests are stuck in I/O operations on the volumes, leading to degraded volumes and faulted storage drives. This issue is fixed.

RSD-11589

In a rare scenario, when a node is unresponsive or in the frozen state, the I/O operations might hang. This issue is fixed. Robin CNS now enforces a default client I/O timeout of 5 seconds to ensure that unhealthy nodes do not cause application inconsistency.

20.1.6. Known Issues

Reference ID

Description

PP-40480

Symptom

In a rare scenario, you might observe that one of the Pods is stuck in the ContainerCreating state, and the kubectl describe pod command shows the following volume mount error:

Failed to mount volume pvc-d16fa6b1-5bcb-4c69-805d-ab4df9018cee: Node <default:vnode-87-237> has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are allowed.

Workaround

Bounce the worker Pod running on the affected node.

PP-39632

Symptom

After upgrading to Robin CNS v6.0.0, the NFS client might hang with the no pending IO message.

To check for the no pending IO condition, look in the /var/log/robin/nodeplugin/robin-csi.log file for the following message:

CsiServer_9 - robin.utils - INFO - Executing command
/usr/bin/nc -z -w 6 2049 with timeout 60 seconds
CsiServer_9 - robin.utils - INFO - Command
/usr/bin/nc -z -w 6 2049 completed with return code 0.
CsiServer_9 - robin.utils - INFO - Standard out:

Also, you can find the following message in the dmesg:

nfs: server 192.02.1.218 not responding, timed out
nfs: server 192.02.1.218 not responding, timed out
nfs: server 192.02.1.218 not responding, timed out

Workaround

  1. Check the node provisioner logs to identify the PVC for which the path check is hung.

  2. For the Deployment or StatefulSet that is using the problematic PVC, scale down the replica count to 0.

  3. Ensure all Pods associated with the application have terminated.

  4. Scale up the replica count back to the original value.

PP-42418

Symptom

If you try to delete a volume snapshot after a failed clone operation, the kubectl delete volumesnapshot command might hang indefinitely.

This issue occurs because the Kubernetes finalizer provisioner.storage.kubernetes.io/volumesnapshot-as-source-protection isn’t removed if the associated Persistent Volume Claim (PVC) is deleted while in an unbound or unprovisioned state.

Make sure that you have already deleted the underlying snapshot in Robin.

When this issue occurs, the following warning appears:

Snapshot is being used to restore a PVC.

Workaround

To delete the snapshot, manually remove the finalizer by following these steps:

  1. Remove the finalizer from the snapshot by running the following command:

    # kubectl patch volumesnapshot <snapshot_name> \
    -p '{"metadata":{"finalizers":[]}}' \
    --type=merge
    

    Replace <snapshot_name> with the name of your snapshot.

  2. Delete the snapshot:

    # kubectl delete volumesnapshot <snapshot_name> -n <namespace>
    

PP-34414

Symptom

In a rare scenario, the IOMGR service might fail to open devices in exclusive mode when it starts because other processes are using these disks. You might observe the following issues:

  • All app Pods restart, and some app Pods get stuck in the ContainerCreating state.

To confirm the above issues, complete the following steps:

  1. Check for the EVENT_DISK_FAULTED event type in the disk events:

    # robin event list --type EVENT_DISK_FAULTED
    
  2. If you see the disk is faulted error, check the IOMGR logs on the node where the disks are present for the dev_open() error and the Failed to exclusively open message.

    # cat iomgr.log.0 | grep <device> | grep "dev_open"
    
  3. If you see the Device or resource busy error in the log file, use the fuser command with the device path to confirm whether the device is in use:

    # fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5
    

Workaround

If the device is not in use, restart the IOMGR service on the respective node:

# supervisorctl restart iomgr-server

PP-42723

Symptom

During the Robin CNS v6.1.0 installation process, the robin-patroni-pre-install-hook Pod might go into the Pending state and the installation gets stuck. If you observe this issue, apply the following workaround.

Workaround

  1. Check the robin-operator logs for the following message:

    Failed to deploy patroni helm chart.

  2. Find the node that has the patroni-pre-install-hook.

    # kubectl get node -l=patroni-pv-node=true
    
  3. Identify which node is hosting the robin-patroni-pre-install-hook Pod that is stuck in the Pending state.

    # kubectl get pods -n robinio | grep pre-install
    
  4. Run the following command to remove the has-run-patroni-pre-install-job label from the node associated with the pending Pod so that the Pod can be rescheduled.

    # kubectl label node <node-name> has-run-patroni-pre-install-job-
    

PP-42666

Symptom

When you try to expand a volume, a discrepancy can occur where the Kubernetes API reports a successful resize, but the actual block device or file system inside the Pod remains at the original size. This issue typically occurs during mounted volume operations under conditions of storage timeouts or network instability.

If you observe this issue, you can apply the following workarounds based on the volume type.

Workaround

RWX volume resize for file system volumes

If a ReadWriteMany (RWX) volume expansion fails, or the file system size inside the Pod is not updated despite a successful Kubernetes resize, follow these steps to manually update the volume:

  1. Identify the PersistentVolumeClaim (PVC) name.

    # kubectl describe pod <application-pod>
    
  2. Find the actual volume name associated with the PVC.

    # kubectl get pvc <pvc-name>
    
  3. Locate the NFS server pod name.

    # robin nfs export-list
    
  4. Cordon the node that is running the NFS server pod.

    # kubectl cordon <node-name>
    
  5. Restart the NFS server Pod by deleting it.

    # kubectl delete pod <nfs-server-pod>
    
  6. Restart the application Pod by deleting it.

    # kubectl delete pod <application-pod>
    
  7. Uncordon the node that you cordoned in step 4.

    # kubectl uncordon <node-name>
    
  8. Edit your PVC YAML file and set the volume size to 1GB more than the size you originally requested. (For example, if you were originally trying to increase the volume size to 5GB, now increase it to 6GB.)

  9. Apply the PVC YAML file.

    # kubectl apply -f <your-pvc-file.yaml>
    
  10. Run lsblk to check the updated size.

RWO volume resize for file system volumes

If a ReadWriteOnce (RWO) file system volume expansion fails, or the file system size inside the Pod is not updated despite a successful Kubernetes resize, follow these steps to manually update the volume:

  1. Cordon the node where the volume is mounted:

    # kubectl cordon <node-name>
    
  2. Bounce the application Pod.

    # kubectl delete pod <app-pod-name>
    
  3. Uncordon the node that you cordoned in step 1.

    # kubectl uncordon <node-name>
    
  4. Identify the PVC/volume name.

    • Find the PVC claim name.

      # kubectl describe pod <pod-name>
      
    • Find the volume name.

      # kubectl get pvc <claim-name>
      
  5. Edit your PVC YAML file and set the volume size to 1GB more than the size you originally requested. (For example, if you were originally trying to increase the volume size to 5GB, now increase it to 6GB.)

  6. Apply the PVC YAML file.

    # kubectl apply -f <your-pvc-file.yaml>
    
  7. Verify the updated size inside the application Pod.

    # kubectl exec -it <app-pod-name> -- df -Th
    

RWO volume resize for block volumes

If an RWO block volume expansion fails, or the block device size inside the Pod is not updated despite a successful Kubernetes resize, follow these steps to manually update the volume:

  1. Cordon the node where the volume is mounted.

    # kubectl cordon <node-name>
    
  2. Bounce the application Pod.

    # kubectl delete pod <app-pod-name>
    
  3. Uncordon the node that you cordoned in step 1.

    # kubectl uncordon <node-name>
    
  4. Run lsblk to check the updated size.

20.1.7. Technical Support

Contact Robin Technical Support for assistance.