20. Release Notes

20.1. Robin Cloud Native Storage v5.4.16-163

The Robin CNS v5.4.16-163 Release Notes document provides information about the upgrade path, a new feature, improvements, fixed issues, and known issues.

Release Date: December 15, 2025

20.1.1. Upgrade Path

The following is the supported upgrade path for Robin CNS v5.4.16-163:

  • Robin CNS v5.4.16-105 to Robin CNS v5.4.16-163

Note

  • After upgrading to Robin CNS v5.4.16-163, if you are using the Robin Client outside the robincli Pod, you must upgrade to the latest version of the Robin Client.

  • If you have installed Robin CNS with the skip_postgres_operator parameter to use the Zalando PostgreSQL operator, then you must first upgrade the Zalando PostgreSQL operator to v1.11.0 or later before upgrading to Robin CNS v5.4.16-163.
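For example, you can verify the currently installed operator version by checking the image tag on the operator's Deployment. The following is a sketch; the Deployment name (postgres-operator) and the namespace placeholder are assumptions that may differ in your cluster:

# kubectl get deployment postgres-operator -n <operator-namespace> -o jsonpath='{.spec.template.spec.containers[0].image}'

The image tag (for example, v1.11.0) indicates the operator version.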

20.1.2. New Feature

20.1.2.1. Patroni with failsafe mode enabled

Starting with Robin CNS v5.4.16, the failsafe mode is enabled by default in Patroni 3.2.2.

The failsafe mode prevents the Patroni leader from demoting itself during temporary network disruptions or when it loses access to the Kubernetes control plane or etcd. As long as the leader maintains uninterrupted communication with the other Patroni replicas, it continues to serve, ensuring continuous database availability.

However, even when the failsafe mode is enabled, the Patroni leader still demotes itself in the following scenarios:

  • A network partition prevents the leader from reaching all replicas

  • The DCS is down and the leader cannot reach all replicas
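In Patroni, the failsafe mode is a dynamic configuration option. The following is a minimal sketch of how the setting appears in the output of the patronictl show-config command; the surrounding values shown here are illustrative Patroni defaults, not Robin CNS specifics:

failsafe_mode: true
loop_wait: 10
retry_timeout: 10
ttl: 30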

20.1.3. Improvements

20.1.3.1. New Volume Metrics

Starting with Robin CNS v5.4.16, the following two new metrics are introduced:

  • robin_vol_psize

  • robin_vol_serving

robin_vol_psize

The robin_vol_psize metric represents the physical (or raw) storage space (in bytes) used by a single replica of the volume. This metric provides further insight into storage consumption.

Example:

[robinmaster@master ~]# curl -k https://localhost:29446/metrics
robin_vol_rawused{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 134217728
robin_vol_size{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 1073741824
robin_vol_psize{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 67108864

In the above output, robin_vol_psize reports that a single replica of the volume uses 67108864 bytes (64 MiB) of physical (or raw) storage.
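To inspect these metrics for a specific volume, you can filter the output of the metrics endpoint shown above; the PVC name here is taken from the preceding example:

[robinmaster@master ~]# curl -sk https://localhost:29446/metrics | grep pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713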

robin_vol_serving

The robin_vol_serving metric reflects the health of a volume's replicas: whether the replicas are in sync and serving read and write operations.

The robin_vol_serving metric reports the following statuses:

  • Status 0 = Unknown

  • Status 1 = Serving

  • Status 2 = Degraded

  • Status 3 = Not Serving

Example:

robin_vol_serving{name="pvc-fad3245e-3a8c-42ab-a2ae-2b290bf450da",volid="5160"} 1
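If these metrics are scraped into Prometheus (an assumption; your monitoring stack may differ), a minimal alerting rule sketch that fires when a volume is degraded or not serving could look like the following; the group and alert names are illustrative:

groups:
  - name: robin-volume-health      # illustrative group name
    rules:
      - alert: RobinVolumeNotServing
        expr: robin_vol_serving >= 2   # 2 = Degraded, 3 = Not Serving
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Volume {{ $labels.name }} (volid {{ $labels.volid }}) is degraded or not serving"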

20.1.3.2. Improved Stability and Performance for Windows VMs

In Robin CNS, I/O operations can experience high latency during storage device initialization (for example, after a node restart or during recovery). This latency slows application response times and can cause Windows VMs to freeze.

The following improvements are made in Robin CNS to address this issue:

  • The volume mount and the volume slice leader are now placed on the same node for optimal I/O performance.

  • The block map cache assertion that caused IOMGR restarts is fixed.

  • The block map load time is improved.

  • Garbage collection (GC) is optimized for volumes with a 512-byte block size.

  • IOMGR restarts are now avoided during Robin master Pod failovers.

20.1.4. Fixed Issues

  • RSD-8821: The robin_vol_total_snapshot_count metric incorrectly displayed the snapshot count as 1 even when there were no snapshots. This issue is fixed.

  • RSD-8378: Several vulnerabilities related to the Apache server are fixed.

  • RSD-9247: Volume detach and attach operations could take longer than expected after a node disconnection event, such as a node power-off or network disconnect. This issue is fixed.

  • RSD-8083: To ensure quicker failover, tasks related to device slice leader changes have been optimized. This is especially beneficial during node reboots in environments where nodes contain many large devices, because subsequent slice operations now finish faster.

  • RSD-9886: The RPC client dropped pending I/O requests without processing the received response. Because of these pending I/O requests, the virtual machine instance (VMI) Pod could not reconcile its state and showed the following error:

    unknown error encountered sending command SyncVMI: rpc error: code = DeadlineExceeded desc = context deadline exceeded

    The issue of the VM instance hitting the SyncVMI reconciliation error is fixed.

  • RSD-9809, RSD-10087: The out-of-sync issue with Patroni Pods that led to a Robin service outage is fixed.

  • RSD-10021: In a rare scenario, the Patroni cluster could go down if some of the nodes were cordoned. This issue is fixed.

  • RSD-10248: When a node abruptly powered off, Pods that use persistent volumes were rescheduled to other nodes. However, in some cases, some of these Pods failed to start on the new nodes with a Multi-Attach error because the volume remained exclusively attached to the powered-off node. The kubectl get events command showed the following error:

    Multi-Attach error for volume … Volume is already exclusively attached to one node and can't be attached to another.

    This issue is fixed.

  • PP-38537: After deleting a backup, unregistering a storage repo failed with the following error message:

    Storage repo is associated with volume group

    This issue is fixed.

20.1.5. Known Issues


PP-40480

Symptom

In rare scenarios, you might observe that one of the Pods is stuck in the ContainerCreating state, and the kubectl describe pod command shows the following volume mount error:

Failed to mount volume pvc-d16fa6b1-5bcb-4c69-805d-ab4df9018cee: Node <default:vnode-87-237> has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are allowed.

Workaround

Bounce the worker Pod running on the affected node.
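For example, you can bounce the worker Pod by deleting it so that its controller recreates it. The namespace, node name, and Pod name below are placeholders; substitute the values used in your deployment:

# kubectl get pods -n <robin-namespace> -o wide | grep <affected-node>
# kubectl delete pod <robin-worker-pod> -n <robin-namespace>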

PP-40715

Symptom

When the cluster recovers from a network partition, the IOMGR service fails to retry the volume remount operation because of a flaky Kubernetes API service.

Steps to identify the issue:

  1. Check whether any Pod is stuck in the Terminating state and whether the kubectl describe pod command shows the following error:

    error killing pod: [failed to "KillContainer" for "compute" with KillContainerError: "rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container to be killed: wait container: context deadline exceeded", failed to "KillPodSandbox" for "6644d850-dcd3-4ee6-a66e-0950057fc711" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]

  2. Check whether the kubectl logs command shows the following error logs:

{"component":"virt-launcher","level":"error","msg":"Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats)", "pos":"virDomainObjBeginJobInternal:467","subcomponent":"libvirt","thread":"30"}
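The following commands can help with these checks; the Pod names and namespaces are placeholders to substitute with values from your cluster:

# kubectl get pods -A | grep Terminating
# kubectl describe pod <pod-name> -n <namespace>
# kubectl logs <virt-launcher-pod> -n <namespace> | grep 'state change lock'

The first command lists Pods stuck in the Terminating state across all namespaces; the second and third surface the errors shown in steps 1 and 2.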

Workaround

If you notice the above error, restart the IOMGR server on the node where the volume remount operation failed:

# supervisorctl restart iomgr-server

20.1.6. Technical Support

Contact Robin Technical Support for any assistance.