25. Release Notes

25.1. Robin Cloud Native Platform v5.5.0

The Robin Cloud Native Platform (CNP) v5.5.0 release notes cover pre- and post-upgrade steps, new features, improvements, fixed issues, and known issues.

Release Date: April 17, 2025

25.1.1. Infrastructure Versions

The following software applications are included in this CNP release:

  • Kubernetes: 1.31.6

  • Docker: 25.0.2

  • Prometheus: 2.39.1

  • Prometheus Adapter: 0.10.0

  • Node Exporter: 1.4.0

  • Calico: 3.28.2

  • HAProxy: 2.4.7

  • PostgreSQL: 14.12

  • Grafana: 9.2.3

  • CRI Tools: 1.31.1

25.1.2. Supported Operating System

The following is the supported operating system and kernel version for Robin CNP v5.5.0:

  • CentOS 7.9 (kernel version: 3.10.0-1160.71.1.el7.x86_64)

25.1.3. Upgrade Paths

The following are the supported upgrade paths for Robin CNP v5.5.0:

  • Robin CNP v5.5.0-1841 to Robin CNP v5.5.0-1852

  • Robin CNP v5.4.3 HF5 to Robin CNP v5.5.0-1852

  • Robin CNP v5.4.3 HF3+PP to Robin CNP v5.5.0-1852

25.1.3.1. Pre-upgrade consideration

  • For a successful upgrade, you must run the possible_job_stuck.py script before and after the upgrade. Contact the Robin Support team for the upgrade procedure using the script.

  • When upgrading from a supported Robin CNP version to Robin CNP v5.5.0, if your cluster already has cert-manager installed, you must uninstall it before the upgrade.

25.1.3.2. Post-upgrade considerations

  • After upgrading to Robin CNP v5.5.0, you must run the robin schedule update K8sResSync k8s_resource_sync 60000 command to update the robin schedule K8sResSync.

  • After upgrading to Robin CNP v5.5.0, you must run the robin-server validate-role-bindings command. To run this command, you need to log in to the robin-master Pod. This command verifies the roles assigned to each user in the cluster and corrects them if necessary.

  • After upgrading to Robin CNP v5.5.0, the k8s_auto_registration config parameter is disabled by default. The config setting is deactivated to prevent all Kubernetes apps from automatically registering and consuming resources. The following are the points you must be aware of with this change:

    • You can register the Kubernetes apps using the robin app register command manually and use Robin CNP for snapshots, clones, and backup operations of the Kubernetes app.

    • As this config parameter is disabled, when you run the robin app nfs-list command, the mappings between Kubernetes apps and NFS server Pods are not listed in the command output.

    • If you need the mapping between a Kubernetes app and its NFS server Pod while the k8s_auto_registration config parameter is disabled or the Kubernetes app is not manually registered, get the PVC name from the Pod YAML (kubectl get pod <pod_name> -n <namespace> -o yaml) and run the robin nfs export list | grep <pvc_name> command, as shown in the example after this list.

    • The robin nfs export list command output displays the PVC name and namespace.
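
      For example, the following minimal sketch (the Pod name my-app-0 and the namespace my-namespace are hypothetical placeholders) reads the PVC name from the Pod spec and matches it against the export list:

      # Hypothetical Pod and namespace; replace with your own values.
      PVC=$(kubectl get pod my-app-0 -n my-namespace -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')
      robin nfs export list | grep "$PVC"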

25.1.3.3. Pre-upgrade steps

  • Upgrading from Robin CNP v5.4.3 HF5 or Robin CNP v5.5.0-1841 to Robin CNP v5.5.0-1852

    Before upgrading from Robin CNP v5.4.3 HF5 or Robin CNP v5.5.0-1841 to Robin CNP v5.5.0-1852, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 1800:

      # robin config update agent suicide_threshold 1800
      
    2. Disable the NFS Server Monitor schedule:

      # robin schedule disable "NFS Server" Monitor
      
    3. Set the toleration seconds for all NFS server Pods to 86400 seconds. After the upgrade, you must change the toleration seconds back according to the post-upgrade steps.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationseconds to 86400"
          kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'
      done
      
  • Upgrading from Robin CNP v5.4.3 HF3+PP to Robin CNP v5.5.0

    Before upgrading from Robin CNP v5.4.3 HF3+PP to Robin CNP v5.5.0, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 1800:

      # robin config update agent suicide_threshold 1800
      
    2. Set the NFS Server schedule CronJob interval to more than six months (the following example sets it to run once a year):

      # rbash master
      # rsql
      # update schedule set kwargs='{"cron":"1 1 1 1 *"}' where callback='nfs_server_monitor';
      # \q
      # systemctl restart robin-server
      
    3. Set the toleration seconds for all NFS server Pods to 86400 seconds. After the upgrade, you must change the toleration seconds back according to the post-upgrade steps.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationseconds to 86400"
          kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'
      done
      

25.1.3.4. Post-upgrade steps

  • After upgrading from Robin CNP v5.4.3 HF5 or Robin CNP v5.5.0-1841 to Robin CNP v5.5.0-1852

    After upgrading from Robin CNP v5.4.3 HF5 or Robin CNP v5.5.0-1841 to Robin CNP v5.5.0-1852, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 40:

      # robin config update agent suicide_threshold 40
      
    2. Enable the NFS Server Monitor schedule:

      # robin schedule enable "NFS Server" Monitor
      
    3. Set the check_helm_apps config parameter to False:

      # robin config update cluster check_helm_apps False
      
    4. Set the chargeback_track_k8s_resusage config parameter to False:

      # robin config update server chargeback_track_k8s_resusage False
      
    5. Set the robin_k8s_extension config parameter to True:

      # robin config update manager robin_k8s_extension True
      
    6. Verify whether the following mutating webhooks are present:

      # kubectl get mutatingwebhookconfigurations -A | grep robin
      k8srobin-deployment-mutating-webhook   1          20d
      k8srobin-ds-mutating-webhook           1          20d
      k8srobin-pod-mutating-webhook          1          20d
      k8srobin-sts-mutating-webhook          1          20d
      robin-deployment-mutating-webhook      1          20d
      robin-ds-mutating-webhook              1          20d
      robin-pod-mutating-webhook             1          20d
      robin-sts-mutating-webhook             1          20d
      
    7. If the above k8srobin-* mutating webhooks are not present, bounce the robink8s-serverext Pods:

      # kubectl delete pod -n robinio -l app=robink8s-serverext
      
    8. Verify whether the following validating webhooks are present:

      # kubectl get validatingwebhookconfigurations
      NAME                             WEBHOOKS   AGE
      cert-manager-webhook             1          45h
      controllers-validating-webhook   1          31h
      ippoolcr-validating-webhook      1          31h
      namespaces-validating-webhook    1          31h
      pods-validating-webhook          1          31h
      pvcs-validating-webhook          1          31h
      
    9. If the robin-* mutating webhooks displayed in the step 6 output and the validating webhooks displayed in the step 8 output are not present on your setup, restart the robin-server-bg service:

      # rbash master
      # supervisorctl restart robin-server-bg
      
    10. Set the toleration seconds for all NFS server Pods to 60 seconds when the node is in the NotReady state and to 0 seconds when the node is unreachable.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationseconds"
          kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'
      done 2>/dev/null
      
  • After upgrading from Robin CNP v5.4.3 HF3+PP to Robin CNP v5.5.0

    After upgrading from Robin CNP v5.4.3 HF3+PP to Robin CNP v5.5.0, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 40:

      # robin config update agent suicide_threshold 40
      
    2. Enable the NFS Server Monitor schedule:

      # robin schedule enable "NFS Server" Monitor
      
    3. Set the check_helm_apps config parameter to False:

      # robin config update cluster check_helm_apps False
      
    4. Set the chargeback_track_k8s_resusage config parameter to False:

      # robin config update server chargeback_track_k8s_resusage False
      
    5. Set the robin_k8s_extension config parameter to True:

      # robin config update manager robin_k8s_extension True
      
    6. Delete the NFS Server schedule CronJob and restart the robin-server and robin-server-bg services:

      # rbash master
      # rsql
      # DELETE from schedule where callback='nfs_server_monitor';
      # \q
      # supervisorctl restart robin-server
      # supervisorctl restart robin-server-bg
      
    7. Verify whether the following mutating webhooks are present:

      # kubectl get mutatingwebhookconfigurations -A | grep robin
      k8srobin-deployment-mutating-webhook   1          20d
      k8srobin-ds-mutating-webhook           1          20d
      k8srobin-pod-mutating-webhook          1          20d
      k8srobin-sts-mutating-webhook          1          20d
      robin-deployment-mutating-webhook      1          20d
      robin-ds-mutating-webhook              1          20d
      robin-pod-mutating-webhook             1          20d
      robin-sts-mutating-webhook             1          20d
      
    8. If the above k8srobin-* mutating webhooks are not present, bounce the robink8s-serverext Pods:

      # kubectl delete pod -n robinio -l app=robink8s-serverext
      
    9. Verify whether the following validating webhooks are present:

      # kubectl get validatingwebhookconfigurations
      NAME                             WEBHOOKS   AGE
      cert-manager-webhook             1          45h
      controllers-validating-webhook   1          31h
      ippoolcr-validating-webhook      1          31h
      namespaces-validating-webhook    1          31h
      pods-validating-webhook          1          31h
      pvcs-validating-webhook          1          31h
      
    10. If the robin-* mutating webhooks displayed in the step 7 output and the validating webhooks displayed in the step 9 output are not present on your setup, restart the robin-server-bg service:

      # rbash master
      # supervisorctl restart robin-server-bg
      
    11. Set the toleration seconds for all NFS server Pods to 60 seconds when the node is in the NotReady state and to 0 seconds when the node is unreachable.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationseconds"
          kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'
      done 2>/dev/null
      

25.1.4. New Features

25.1.4.1. Robin Patroni Monitor

The Robin Patroni Monitor feature allows you to monitor the status of the Patroni instances (Pods) in a cluster. The Robin CNP architecture includes a highly available PostgreSQL cluster managed by Patroni, referred to as the Patroni Cluster.

To ensure high availability (HA), Patroni maintains three copies of its database, meaning a maximum of three Patroni instances are present in a cluster at any given time.

A Patroni cluster might become unavailable for a number of reasons. To monitor the status of the Patroni cluster, Robin CNP provides the Robin Patroni Monitor feature, which generates the events as required.

Note

After you upgrade from the previous Robin CNP versions to Robin CNP v5.5.0, the Robin Patroni Monitor feature is automatically enabled.

Also, in the Robin CNP v5.5.0 release, the robin event-type list command displays the following Patroni-related event types, which are generated when the status of the Patroni replicas changes (see the example at the end of this section):

  • EVENT_PATRONI_LEADER_CHANGE

  • EVENT_PATRONI_INSTANCE_NOT_READY

  • EVENT_PATRONI_INSTANCE_FAILED

  • EVENT_PATRONI_INSTANCE_READY

For more information, see Robin Patroni Monitor.
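
For example, to check whether any Patroni status changes have been recorded, you can query events by type; the robin event list --type option is used the same way in the Known Issues section of these notes, and the commands below are shown only as an illustration:

# List the Patroni-related event types and any leader-change events recorded so far.
robin event-type list | grep PATRONI
robin event list --type EVENT_PATRONI_LEADER_CHANGE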

25.1.4.2. Robin Certificate Management

Starting with Robin CNP v5.5.0, you can manage all certificates for your cluster without manual intervention using the Robin certificate management feature. Robin CNP uses cert-manager for this feature. cert-manager is a native Kubernetes certificate management controller that helps issue certificates from various certificate authorities, such as Let’s Encrypt, Entrust, DigiCert, HashiCorp Vault, and Venafi. It can also issue certificates from a local CA (self-signed).

cert-manager adds Certificate and Issuer resources in Kubernetes clusters, which simplifies the process of obtaining, generating, and renewing the certificates for the cluster. For more information, see cert-manager.

The Robin certificate management feature manages certificates only for Robin internal services deployed in the robinio namespace. It ensures that all certificates are valid and up to date, and it automatically renews certificates before they expire.

The Robin certificate management feature has the following certificate issuers:

  • cluster-issuer: Responsible for all certificates used internally by the various control plane services.

  • ident-issuer: Responsible for the Cluster Identity certificate used by all outward-facing services, such as the Kubernetes API server, the Robin client, and the GUI.

Points to consider for the Robin Certificate Management feature

  • When you install or upgrade to Robin CNP v5.5.0, cert-manager is deployed by default, and a new service named robin-cert-monitor is deployed to monitor the state of all certificates required by the various Pods and containers in the Robin CNP cluster, ensuring that all required certificates exist and are valid.

  • During installation or upgrade to Robin CNP v5.5.0, only the cert-manager option is supported. If you want to manage the certificates of your cluster using the local control mode, you can use the robin cert reset-cluster-certs command to enable local control mode.

  • You can have only one cert-manager instance in a cluster.

  • If your cluster is already installed with a Cluster Identity certificate signed by an external CA, you must reconfigure it using the robin cert reset-cluster-identity command after upgrading to Robin CNP v5.5.0.

  • If you want to utilize a Cluster Identity certificate signed by an external CA after installing Robin CNP v5.5.0, you can use the robin cert reset-cluster-identity command to configure it.

  • If you want to install Robin CNP v5.5.0 with both a Cluster Identity certificate signed by an external CA and cert-manager, you must pass the following options in the config.json file for one of the master nodes. For more information, see Installation with Custom Cluster Identity certificate.

    • ident-ca-path

    • ident-cert-path

    • ident-key-path

  • You cannot install your own cert-manager on a Robin CNP cluster. If you want to utilize the functionality of cert-manager, use the cert-manager instance deployed as part of the Robin certificate management feature to create Issuers and Certificates in other namespaces, as shown in the sketch at the end of this section.

For more information, see Robin Certificate Management.
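
As an illustration of the last point above, the following is a minimal sketch of creating an Issuer and a Certificate in another namespace with the cert-manager deployed by Robin CNP. The demo namespace, the resource names, and the self-signed Issuer are hypothetical choices for this example; adjust them for your environment:

kubectl create namespace demo
cat <<'EOF' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: demo-selfsigned           # hypothetical Issuer name
  namespace: demo
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: demo-tls
  namespace: demo
spec:
  secretName: demo-tls            # Secret where cert-manager stores the key pair
  duration: 2160h                 # 90-day certificate lifetime
  renewBefore: 360h               # renew 15 days before expiry
  dnsNames:
    - demo.demo.svc.cluster.local
  issuerRef:
    name: demo-selfsigned
    kind: Issuer
EOF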

25.1.4.3. Large cluster support

Starting with Robin CNP v5.5.0, support for large clusters is available. You can now have a Robin CNP cluster with up to 110 nodes.

25.1.5. Improvements

25.1.5.1. Support for SSL certificate-based authentication for Kafka Subscribers

Starting with Robin CNP v5.5.0, Robin CNP supports SSL certificate-based authentication for Kafka subscribers for alerts and events.

Prior to this release, Robin CNP only supported username and password-based authentication.

To set up SSL certificate-based authentication, you must specify the following certificates and key as part of the robin subscriber add command:

  • CA certificate

  • Client certificate

  • Client key

For more information, see Registering a Robin subscriber.

25.1.5.2. Support for MetalLB with BGP peering

Starting with Robin CNP v5.5.0, Robin CNP supports MetalLB layer 3 mode with Border Gateway Protocol (BGP) peering. In this mode, each node in the cluster establishes a BGP peering session with the upstream router and advertises the load balancer IP address assigned to a service over each peering. As a result, the upstream router has multiple routes for each load balancer IP address. When the router receives traffic, it selects one of the nodes that advertised the load balancer IP address and sends the traffic to that node.

You can set up MetalLB either during the Robin CNP installation or post-installation.

Note

Robin CNP supports MetalLB with BGP peering in FRR mode.

For more information, see Load Balancer Support using MetalLB.
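
For reference, BGP mode in upstream MetalLB is configured with IPAddressPool, BGPPeer, and BGPAdvertisement resources. The following is a generic sketch with hypothetical addresses, AS numbers, and names, not a Robin-specific procedure; follow the linked documentation for the supported setup steps:

cat <<'EOF' | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: bgp-pool                  # hypothetical pool name
  namespace: metallb-system       # namespace may differ in a Robin deployment
spec:
  addresses:
    - 192.0.2.10-192.0.2.50       # load balancer IP range to advertise
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: upstream-router
  namespace: metallb-system
spec:
  myASN: 64512                    # AS number used by the cluster nodes
  peerASN: 64513                  # AS number of the upstream router
  peerAddress: 192.0.2.1          # upstream router address
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - bgp-pool
EOF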

25.1.5.3. New Node Level Events

Robin CNP v5.5.0 provides the following new events to enhance the system’s ability to monitor and detect node readiness issues at both the Kubernetes and service/component levels:

  • EVENT_NODE_K8S_NOTREADY - This event is generated when a node is marked as down due to an issue with a Kubernetes component. It is a warning alert.

  • EVENT_NODE_K8S_READY - This event is generated when a node is up after being marked as down. It is an info alert.

  • EVENT_NODE_NOTREADY - This event is generated when a node is marked as not ready due to an unhealthy service or component. It is a warning alert.

  • EVENT_NODE_READY - This event is generated when a node is ready after being marked as not ready. It is an info alert.

25.1.5.4. PostgreSQL’s Archive mode is Disabled

Starting with Robin CNP 5.5.0, PostgreSQL’s archive mode is disabled. When archive mode is enabled, WAL files are closed and switched at regular intervals, even if minimal or no data is written during that time, resulting in unnecessary resource usage.

25.1.5.5. New Metrics

Robin CNP v5.5.0 provides new metrics in the following categories:

  • Manager services

    • robin_manager_services_robin_server
    • robin_manager_services_consul_server
    • robin_manager_services_robin_event_server
    • robin_manager_services_stormgr_server
    • robin_manager_services_pgsql
    • robin_manager_services_robin_master

  • Agent Services

    • robin_agent_services_robin_agent
    • robin_agent_services_iomgr_service
    • robin_agent_services_monitor_server
    • robin_agent_services_consul_client

  • Node Metrics

    • robin_node_state
    • robin_node_maintenance_mode

  • Disk Metrics

    • robin_disk_state
    • robin_disk_maintenance_mode

  • Volume Metrics

    • robin_vol_storstatus
    • robin_vol_status
    • robin_vol_mount_node_id
    • robin_vol_snapshot_space_used
    • robin_vol_snapshot_space_limit
    • robin_vol_total_snapshot_count
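
If the Metrics feature is enabled, these metrics can be queried from Prometheus like any other metric. A minimal sketch using the standard Prometheus HTTP API, where the endpoint address is a placeholder for your cluster's Prometheus service:

# Query one of the new node metrics; replace <prometheus_host> with your Prometheus address.
curl -s 'http://<prometheus_host>:9090/api/v1/query?query=robin_node_state'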

25.1.5.6. Superadmin with limited capabilities

Starting with Robin CNP v5.5.0, new user capabilities allow you to create a superadmin user with limited capabilities.

You can create a superadmin user with limited capabilities by disabling the following newly added user capabilities:

  • ManageUserCapabilities: When this capability is disabled, the user cannot create, edit, or delete custom user capabilities.

  • ManageAdministratorsTenant: When this capability is disabled, the user cannot manage resources and users in the Administrators tenant.

  • AddSelfToTenants: When this capability is disabled, the user cannot add themselves as a member of other tenants in the cluster.

Note

These user capabilities are enabled by default for the superadmin user. Disable them if you need to create a superadmin user with limited capabilities.

For more information, see Superadmin with limited capabilities.

25.1.6. Fixed Issues

The following issues are fixed in Robin CNP v5.5.0:

  • RSD-8104: The issue of the VolumeCreate job taking longer than expected is fixed.

  • RSD-8150: The issue of the device evacuation operation failing when a replica is allocated to a drive marked for evacuation is fixed.

  • PP-37627: The issue of volume expansion operations failing despite sufficient disk space being available in the pool is fixed.

  • PP-37694: The issue of Pods failing to start with the too many open files error because the system reached the file descriptor limit on CentOS with Dockershim is fixed. In this version, the limit on the number of open files for the Dockershim service is increased to a higher value.

  • RSD-8846: The issue of the Robin CAT profiles feature not working as expected on RHEL 8.10 is fixed.

  • RSD-8083: The IO hang issue observed on clusters with large disk sizes is fixed.

  • RSD-5711: In Robin CNP v5.4.3, the cpuset.mems parameter was incorrectly set to 0-1 for guaranteed QoS Pods, which meant resources could be allocated across NUMA boundaries. With this release, the cpuset.mems parameter is set to the desired NUMA node.

  • RSD-8083: The issue of dev slice leader change tasks delaying epoch update tasks and causing IO timeouts on the application side is fixed.

  • RSD-8854: The issue of the IOMGR service crashing on a node when it came back online after a reboot is fixed.

  • RSD-8083: The issue of insufficient resource (CPU and memory) limits for Patroni causing cluster performance issues is fixed by increasing the Patroni resource limits in Robin CNP v5.5.0. The new values are: cpu_limits=4 cores, memory_limits=8G.

  • PP-37695: The issue where Pods are restricted to accessing memory from a single NUMA node, limiting workloads that require larger memory pools, is fixed. Starting with Robin CNP v5.5.0, you can specify the "robin.runtime.skip_cpuset_mems": "true" annotation to bypass the NUMA restriction and allow Pods to access memory across all NUMA nodes.

  • PP-38088: The issue of volume evacuation not being supported is fixed, and the following error message no longer appears during the evacuation operation: Insufficient space given in dev limits to allocate vol.

  • PP-38008: The issue where, during a disk evacuation operation on a disk hosting volume leader slices, slices whose other replicas are marked as FAULTED might remain in the FAULTED state and fail to sync, leaving the volumes in the DEGRADED state, is fixed.

  • PP-38038: The issue of Robin CNP v5.5.0 not supporting scaling up and scaling down of resources (CPU and memory) for Robin Bundle apps is fixed.

25.1.7. Known Issues

The following are the known issues in Robin CNP v5.5.0, listed by reference ID:

PP-21916

Symptom

A Pod IP is not pingable from any other node in the cluster, apart from the node where it is running.

Workaround

Bounce the Calico Pod running on the node where the issue is seen.
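
A sketch of bouncing the Calico Pod, assuming the default calico-node DaemonSet label and namespace (these may differ on your cluster):

# Find the calico-node Pod on the affected node, then delete it so it is recreated.
kubectl get pod -n kube-system -l k8s-app=calico-node -o wide | grep <node_name>
kubectl delete pod <calico_node_pod> -n kube-system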

PP-30247

Symptom

After upgrading from Robin CNP v5.4.3HF5 to Robin CNP v5.5.0, the RWX apps might report the following error event type:

wrong fs type, bad option, bad superblock on /dev/sdj, missing codepage or helper program, or other error

Workaround

To resolve this issue, contact the Robin Customer Support team.

PP-30398

Symptom

After removing an offline master node from the cluster and power cycling it, the removed master node is automatically added back as a worker node.

Workaround

  1. Run the following command to remove the host:

    # robin host remove <hostname>
    
  2. Run the following command to remove the node:

    # kubectl delete node <node_name>
    
  3. Run k8s-script cleanup and host-script cleanup on the node that is being removed.

PP-34226

Symptom

When a PersistentVolumeClaim (PVC) is created, the CSI provisioner initiates a VolumeCreate job. If this job fails, the CSI provisioner calls a new VolumeCreate job again for the same PVC. However, if the PVC is deleted during this process, the CSI provisioner will continue to call the VolumeCreate job because it does not verify the existence of the PVC before calling the VolumeCreate job.

Workaround

Bounce the CSI provisioner Pod.

# kubectl delete pod <csi_provisioner_pod_name> -n robinio

PP-34414

Symptom

In rare scenarios, the IOMGR service might fail to open devices in the exclusive mode when it starts as other processes are using these disks. You might observe the following issue:

  • Some app Pods get stuck in the ContainerCreating state after restarting.

Steps to identify the issue:

  1. Check the following type of faulted error in the EVENT_DISK_FAULTED event type in the robin event list command:

    disk /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5 on node default:poch06 is faulted

    # robin event list --type EVENT_DISK_FAULTED
    
  2. If you see the disk is faulted error, check the IOMGR logs for dev_open() and Failed to exclusively open error messages on the node where disks are present.

    # cat iomgr.log.0 | grep scsi-SATA_Micron_M500_MTFD_1401096049D5 | grep "dev_open"
    
  3. If you see the Device or resource busy error message in the log file, use fuser command to confirm whether the device is in use:

    # fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5
    

Workaround

If the device is not in use, restart the IOMGR service on the respective node:

# supervisorctl restart iomgr

PP-34451

Symptom

In rare scenarios, the RWX Pod might be stuck in the ContainerCannotRun state and display the following error in the Pod’s event:

mount.nfs: mount system call failed

Perform the following steps to confirm the issue:

  1. Run the robin volume info command and check for the following details:

    1. Check the status of the volume. It should be in the ONLINE status.

    2. Check whether the respective volume mount path exists.

    3. Check the physical and logical sizes of the volume. If the physical size of the volume is greater than the logical size, then the volume is full.

  2. Run the following command to check whether any of the disks for the volume are running out of space:

    # robin disk info
    
  3. Run the lsblk and blkid commands to check whether the device mount path works fine on the nodes where the volume is mounted.

  4. Run the ls command to check if accessing the respective filesystem mount path gives any input and output errors.

If you notice any input and output errors in step 4, apply the following workaround:

Workaround

  1. Find all the Pods that are using the respective PVC:

    # kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc_name>
    
  2. Bounce all the Pods identified in step 1:

    # kubectl delete pod <pod_name> -n <namespace>
    

PP-34457

Symptom

If the Metrics feature is enabled on your Robin CNP cluster and you are using Grafana for monitoring, after upgrading the cluster from any supported Robin CNP versions to Robin CNP v5.4.3 HF5, the Grafana metrics will not work.

Note

You need to take a backup of the configmaps of the Prometheus and Grafana apps in the robinio namespace before you stop the Robin Metrics. The configmaps reset when you stop and start the Metrics feature.
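
A minimal sketch of the backup (the exact ConfigMap names vary by deployment, so identify them first):

# List the ConfigMaps for the Prometheus and Grafana apps in the robinio namespace.
kubectl get configmap -n robinio | grep -iE 'prometheus|grafana'
# Save each ConfigMap to a file before stopping the Metrics feature.
kubectl get configmap <configmap_name> -n robinio -o yaml > <configmap_name>-backup.yaml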

Workaround

You need to stop and restart the Metrics feature.

  1. To stop the Metrics feature, run the following command:

    # robin metrics stop
    
  2. To start the Metrics feature, run the following command:

    # robin metrics start
    

PP-34492

Symptom

When you run the robin host list command, you might notice a host in the NotReady and PROBE_PENDING states. Follow these workaround steps to diagnose and recover the host:

Workaround

  1. Run the following command to check which host is in the NotReady and PROBE_PENDING states:

    # robin host list
    
  2. Run the following command to check the current (Curr) and desired (Desired) states of the host in the Agent Process (AP) report:

    # robin ap report | grep <hostname>
    
  3. Run the following command to probe the host and recover it:

    # robin host probe <hostname> --wait
    

    This command forces a probe of the host and updates its state in the cluster.

  4. Run the following command to verify the host’s state:

    # robin host list
    

    The host should now transition to the Ready state.


PP-35478

Symptom

In rare scenarios, the kube-scheduler may not function as expected when many Pods are deployed in a cluster due to issues with the kube-scheduler lease.

Workaround

Complete the following workaround steps to resolve issues with the kube-scheduler lease:

  1. Run the following command to identify the node where the kube-scheduler Pod is running with the lease:

    # kubectl get lease -n kube-system
    
  2. Log in to the node identified in the previous step.

  3. Check if the kube-scheduler Pod is running using the following command:

    # docker ps | grep kube-scheduler
    
  4. As the kube-scheduler is a static Pod, move its configuration file to temporarily stop the Pod:

    # mv /etc/kubernetes/manifests/kube-scheduler.yaml /root
    
  5. Run the following command to confirm that the kube-scheduler Pod is deleted. This may take a few minutes.

    # docker ps | grep kube-scheduler
    
  6. Verify that the kube-scheduler lease is transferred to a different Pod:

    # kubectl get lease -n kube-system
    
  7. Copy the static Pod configuration file back to its original location to redeploy the kube-scheduler Pod:

    # mv /root/kube-scheduler.yaml /etc/kubernetes/manifests/
    
  8. Confirm that the kube-scheduler container is running:

    # docker ps | grep kube-scheduler
    

PP-36865

Symptom

After rebooting a node, the node might not come back online even after a long time, and the host BMC console displays the following message for RWX PVCs mounted on that node:

Remounting nfs rwx pic timed out, issugin SIGKILL

Workaround

Power cycle the host system.

PP-37330

Symptom

During or after upgrading to Robin CNP v5.5.0, the NFSAgentAddExport job might fail with an error message similar to the following:

/bin/mount /dev/sdn /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41 -o discard failed with return code 32: mount: /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41: wrong fs type, bad option, bad superblock on /dev/sdn, missing codepage or helper program, or other error.

Workaround

If you notice this issue, contact the Robin Customer Support team for assistance.

PP-37416

Symptom

In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.5.0, the upgrade might fail with the following error during the Kubernetes upgrade process on other master nodes:

Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed

Steps to identify the issue:

  1. Check the /var/log/robin-install.log file to know why the kubeadm upgrade failed.

    Example

    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-31-01-03-52/kube-scheduler.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
    static Pod hash for component kube-scheduler on Node sm-compute04 did not change after 5m0s: timed out waiting for the condition

    Note

    You can get the above error log for any static manifests of api-server, etcd, scheduler, and controller-manager.

  2. If you notice the above error, run the following command to inspect the Docker containers for the failed component. The containers will likely be in the Exited state.

    # docker ps -a | grep schedule
    

Workaround

If you notice the above error, restart the kubelet:

# systemctl restart kubelet

PP-37965

Symptom

In Robin CNP v5.5.0, when you scale up a Robin Bundle app, the operation does not consider the CPU cores and memory already in use by a vnode. As a result, Robin CNP cannot find a suitable host even though additional resources are available.

Workaround

If you notice this issue, apply the following workaround:

  1. Scale up the resources using the following command:

    # robin app computeqos <appname> --role <rolename> --cpus <newcnt> --memory <newmem> --wait
    
  2. If the scale-up operation fails, stop the app using the following command:

    # robin app stop <appname> --wait
    
  3. Try to scale up the resources again:

    # robin app computeqos <appname> --role <rolename> --cpus <newcnt> --memory <newmem> --wait

PP-38039

Symptom

During node reboot or power reset scenarios, application volumes may force shutdown due to I/O errors. As a result, application Pods might get stuck in the ContainerCreating state with the following mount failure error:

Context Deadline Exceeded.

On the affected node where the volume is mounted or the application Pod is scheduled, the following error might be observed in the dmesg output:

Log I/O Error Detected. Shutting down filesystem
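
A quick way to check for this message on the affected node (illustrative only):

# Search the kernel ring buffer for the filesystem shutdown message.
dmesg -T | grep -i "Shutting down filesystem"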

Workaround

If you notice this issue, contact the Robin Customer Support team for assistance

PP-38044

Symptom

When attempting to detach a repository from a hydrated Helm application, the operation might fail with the following error:

Can’t detach repo as the application is in IMPORTED state, hydrate it in order to detach the repo from it.

This issue occurs even if the application has already been hydrated. The system incorrectly marks the application in the IMPORTED state, preventing the repository from being detached.

Workaround

To detach the repository, manually rehydrate the application and then retry the detach operation:

  1. Run the following command to rehydrate the application.

    # robin app hydrate --wait
    
  2. Once the hydration is complete, detach the repository.

    # robin app detach-repo --wait -y
    

PP-38061

Symptom

In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.5.0, the upgrade may get stuck while executing Robin upgrade actions on the primary master node because some of the hosts are not in the Ready state.

Steps to identify the issue:

  1. Check the following error in the /var/log/robin-install.log file:

    Robin Host is not in READY state. Re-trying host status check in 10 seconds

  2. If you get the above error, run the following command to verify the status of hosts:

    # robin host list
    

Workaround

If any host in the cluster is in the Notready state, apply the following steps:

  1. Log in to the Robin worker Pod running on the host that is in the Notready state:

    # rbash robin
    
  2. Check the following error in the /var/log/robin/robin-worker-bootstrap.log file:

    • MainThread - robin.utils - INFO - Standard err: Error from server (NotFound): configmaps “robin-upgrade-config-5.4.3-564” not found

    • MainThread - robin.rcm.setup.robin_upgrade - INFO - get_host_upgrade_status: Skip Upgrade status check for host hypervvm-61-49.robinsystems.com. Configmap robin-upgrade-config-5.4.3-564 not configured

  3. If you see the above error, stop the robin-bootstrap service:

    # supervisorctl stop robin-bootstrap
    
  4. Create the bootstrap_done file manually, if it does not exist:

    # touch /etc/robin/bootstrap_done
    
  5. Start the robin-bootstrap service again:

    # supervisorctl start robin-bootstrap
    

PP-38071

Symptom

Application creation might fail with the following error:

Failed to mount volume : Node has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are allowed.

This issue occurs when a node enters a mount-blocked state (STORMGR_NODE_BLOCK_MOUNT), preventing new volume mounts from being processed.

Workaround

Try to create the application after 15 minutes.

PP-38078

Symptom

After a network partition, the robin-agent and iomgr-server services may not restart automatically, and stale devices may not be cleaned up. This issue occurs because the consulwatch thread responsible for monitoring Consul and triggering restarts may fail to detect the network partition. As a result, stale devices may not be cleaned up, potentially leading to resource contention and other issues.

Workaround

Manually restart the robin-agent and iomgr-server using supervisorctl:

# supervisorctl restart robin-agent iomgr-server

PP-38087

Symptom

In certain cases, the snapshot size allocated to a volume could be less than what is requested. This occurs when the volume is allocated from multiple disks.

PP-38397

Symptom

When upgrading from supported Robin CNP versions to Robin CNP v5.5.0, the Robin upgrade process might fail due to a Docker installation failure caused by missing dependencies. This issue occurs when the cluster is missing the fuse-overlayfs and slirp4netns packages, which are required by the new Docker version. The upgrade process removes the existing Docker version but fails to install the new version, and the Docker service file gets masked, preventing Docker from starting.

Workaround

  1. Check the host installation logs for errors related to the Docker installation and identify the packages listed as missing dependencies; these are the packages that must be installed manually.

  2. Install the missing dependencies manually.

    # yum install -y slirp4netns fuse-overlayfs
    
  3. Retry the upgrade process.

PP-38471

Symptom

When StatefulSet Pods restart, the Pods might get stuck in the ContainerCreating state with the error CSINode <node_name> does not contain driver robin. This is caused by stale NFS mount points and by the csi-nodeplugin-robin Pod failing with a CrashLoopBackOff state.

Workaround

If you notice this issue, restart the csi-nodeplugin Pod.

# kubectl delete pod <csi-nodeplugin> -n robinio

25.1.8. Technical Support

Contact the Robin Technical Support team for any assistance.