20. Release Notes

20.1. Robin Cloud Native Storage v5.4.16-163

The Robin CNS v5.4.16-163 Release Notes document provides information about the upgrade path, a new feature, improvements, fixed issues, and known issues.

Release Date: December 15, 2025

20.1.1. Upgrade Path

The following is the supported upgrade path for Robin CNS v5.4.16-163:

  • Robin CNS v5.4.16-105 to Robin CNS v5.4.16-163

Note

  • After upgrading to Robin CNS v5.4.16-163, if you are using the Robin Client outside the robincli Pod, you must upgrade to the latest version of the Robin Client.

  • If you have installed Robin CNS with the skip_postgres_operator parameter to use the Zalando PostgreSQL operator, then you must first upgrade the Zalando PostgreSQL operator to v1.11.0 or later before upgrading to Robin CNS v5.4.16-163.
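For example, you can verify the currently installed operator version by checking the image tag on the operator's Deployment. The following is a sketch; the Deployment name (postgres-operator) and the namespace placeholder are assumptions that may differ in your cluster:

# kubectl get deployment postgres-operator -n <operator-namespace> -o jsonpath='{.spec.template.spec.containers[0].image}'

The image tag (for example, v1.11.0) indicates the operator version.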

20.1.2. New Feature

20.1.2.1. Patroni with failsafe mode enabled

Starting with Robin CNS v5.4.16, the failsafe mode is enabled by default in Patroni 3.2.2.

The failsafe mode prevents the Patroni leader from demoting itself during temporary network disruptions or when it loses access to the Kubernetes control plane or etcd. As long as the leader maintains uninterrupted communication with the other Patroni replicas, it continues to serve, ensuring continuous database availability.

However, even when the failsafe mode is enabled, the Patroni leader still demotes itself in the following scenarios:

  • A network partition prevents the leader from reaching all replicas

  • The DCS is down and the leader cannot reach all replicas
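In Patroni, the failsafe mode is a dynamic configuration option. The following is a minimal sketch of how the setting appears in the output of the patronictl show-config command; the surrounding values shown here are illustrative Patroni defaults, not Robin CNS specifics:

failsafe_mode: true
loop_wait: 10
retry_timeout: 10
ttl: 30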

20.1.3. Improvements

20.1.3.1. New Volume Metrics

Starting with Robin CNS v5.4.16, the following two new metrics are introduced:

  • robin_vol_psize

  • robin_vol_serving

robin_vol_psize

The robin_vol_psize metric represents the physical (or raw) storage space (in bytes) used by a single replica of the volume. This metric provides further insight into storage consumption.

Example:

[robinmaster@master ~]# curl -k https://localhost:29446/metrics
robin_vol_rawused{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 134217728
robin_vol_size{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 1073741824
robin_vol_psize{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 67108864

In the above output, robin_vol_psize reports that a single replica of the volume uses 67108864 bytes (64 MiB) of physical (or raw) storage.
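To inspect these metrics for a specific volume, you can filter the output of the metrics endpoint shown above; the PVC name here is taken from the preceding example:

[robinmaster@master ~]# curl -sk https://localhost:29446/metrics | grep pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713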

robin_vol_serving

The robin_vol_serving metric reflects the health of a volume's replicas: whether the replicas are in sync and serving read and write operations.

The robin_vol_serving metric reports the following statuses:

  • Status 0 = Unknown

  • Status 1 = Serving

  • Status 2 = Degraded

  • Status 3 = Not Serving

Example:

robin_vol_serving{name="pvc-fad3245e-3a8c-42ab-a2ae-2b290bf450da",volid="5160"} 1
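If these metrics are scraped into Prometheus (an assumption; your monitoring stack may differ), a minimal alerting rule sketch that fires when a volume is degraded or not serving could look like the following; the group and alert names are illustrative:

groups:
  - name: robin-volume-health      # illustrative group name
    rules:
      - alert: RobinVolumeNotServing
        expr: robin_vol_serving >= 2   # 2 = Degraded, 3 = Not Serving
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Volume {{ $labels.name }} (volid {{ $labels.volid }}) is degraded or not serving"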

20.1.3.2. Improved Stability and Performance for Windows VMs

In Robin CNS, I/O operations can experience high latency during storage device initialization (for example, after a node restart or during recovery). This latency slows application response times and can cause Windows VMs to freeze.

The following improvements are made in Robin CNS to address this issue:

  • The volume mount and the volume slice leader are now placed on the same node for optimal I/O performance.

  • The block map cache assertion that caused IOMGR restarts is fixed.

  • The block map load time is improved.

  • Garbage collection (GC) is optimized for volumes with a 512-byte block size.

  • IOMGR restarts are now avoided during Robin master Pod failovers.

20.1.4. Fixed Issues

  • RSD-8821: The robin_vol_total_snapshot_count metric incorrectly displayed the snapshot count as 1 even when there were no snapshots. This issue is fixed.

  • RSD-8378: Several vulnerabilities related to the Apache server are fixed.

  • RSD-9247: Volume detach and attach operations could take longer than expected after a node disconnection event, such as a node power-off or network disconnect. This issue is fixed.

  • RSD-8083: To ensure quicker failover, tasks related to device slice leader changes have been optimized. This is especially beneficial during node reboots in environments where nodes contain many large devices, because subsequent slice operations now finish faster.

  • RSD-9886: The RPC client dropped pending I/O requests without processing the received response. Because of these pending I/O requests, the virtual machine instance (VMI) Pod could not reconcile its state and showed the following error:

    unknown error encountered sending command SyncVMI: rpc error: code = DeadlineExceeded desc = context deadline exceeded

    The issue of the VM instance hitting the SyncVMI reconciliation error is fixed.

  • RSD-9809, RSD-10087: The out-of-sync issue with Patroni Pods that led to a Robin service outage is fixed.

  • RSD-10021: In a rare scenario, the Patroni cluster could go down if some of the nodes were cordoned. This issue is fixed.

  • RSD-10248: When a node abruptly powered off, Pods that use persistent volumes were rescheduled to other nodes. However, in some cases, some of these Pods failed to start on the new nodes with a Multi-Attach error because the volume remained exclusively attached to the powered-off node. The kubectl get events command showed the following error:

    Multi-Attach error for volume … Volume is already exclusively attached to one node and can't be attached to another.

    This issue is fixed.

  • PP-38537: After deleting a backup, unregistering a storage repo failed with the following error message:

    Storage repo is associated with volume group

    This issue is fixed.

20.1.5. Known Issues


PP-40480

Symptom

In rare scenarios, you might observe that one of the Pods is stuck in the ContainerCreating state, and the kubectl describe pod command shows the following volume mount error:

Failed to mount volume pvc-d16fa6b1-5bcb-4c69-805d-ab4df9018cee: Node <default:vnode-87-237> has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are allowed.

Workaround

Bounce the worker Pod running on the affected node.
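For example, you can bounce the worker Pod by deleting it so that its controller recreates it. The namespace, node name, and Pod name below are placeholders; substitute the values used in your deployment:

# kubectl get pods -n <robin-namespace> -o wide | grep <affected-node>
# kubectl delete pod <robin-worker-pod> -n <robin-namespace>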

PP-40715

Symptom

When the cluster recovers from a network partition, the IOMGR service fails to retry the volume remount operation because of a flaky Kubernetes API service.

Steps to identify the issue:

  1. Check whether any Pod is stuck in the Terminating state and whether the kubectl describe pod command shows the following error:

    error killing pod: [failed to "KillContainer" for "compute" with KillContainerError: "rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container to be killed: wait container: context deadline exceeded", failed to "KillPodSandbox" for "6644d850-dcd3-4ee6-a66e-0950057fc711" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]

  2. Check whether the kubectl logs command shows the following error logs:

{"component":"virt-launcher","level":"error","msg":"Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetJobStats)", "pos":"virDomainObjBeginJobInternal:467","subcomponent":"libvirt","thread":"30"}
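The following commands can help with these checks; the Pod names and namespaces are placeholders to substitute with values from your cluster:

# kubectl get pods -A | grep Terminating
# kubectl describe pod <pod-name> -n <namespace>
# kubectl logs <virt-launcher-pod> -n <namespace> | grep 'state change lock'

The first command lists Pods stuck in the Terminating state across all namespaces; the second and third surface the errors shown in steps 1 and 2.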

Workaround

If you notice the above error, restart the IOMGR server on the node where the volume remount operation failed:

# supervisorctl restart iomgr-server

20.1.6. Technical Support

Contact Robin Technical Support for any assistance.