26. Release Notes¶
26.1. Robin Cloud Native Platform v5.7.2¶
The Robin Cloud Native Platform (CNP) v5.7.2 release notes cover pre- and post-upgrade considerations, improvements, fixed issues, and known issues for this release.
Release Date: April 22, 2026
26.1.1. Infrastructure Versions¶
The following software applications are included in this CNP release:
| Software Application | Version |
|---|---|
| Kubernetes | 1.33.5 |
| Docker | 25.0.2 (RHEL 8.10 or Rocky Linux 8.10) |
| Podman | 5.4.0 (RHEL 9.6) |
| Prometheus | 2.39.1 |
| Prometheus Adapter | 0.10.0 |
| Node Exporter | 1.4.0 |
| Calico | 3.28.2 |
| HAProxy | 2.4.7 |
| PostgreSQL | 14.12 |
| Grafana | 9.2.3 |
| CRI Tools | 1.33.0 |
| cert-manager | 1.19.1 |
26.1.2. Supported Operating Systems¶
The following are the supported operating systems and kernel versions for Robin CNP v5.7.2:
| OS Version | Kernel Version |
|---|---|
| Red Hat Enterprise Linux 8.10 | 4.18.0-553.el8_10.x86_64 |
| Rocky Linux 8.10 | 4.18.0-553.el8_10.x86_64 |
| Red Hat Enterprise Linux 9.6 | 5.14.0-570.24.1.el9_6.x86_64+rt |
Note
Robin CNP supports both RT and non-RT kernels on the above supported operating systems.
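Before installing or upgrading, you can confirm that each node runs one of the kernels listed above. The following is a minimal sketch; it checks only the exact versions shown in the table, so adjust the list for the RT or non-RT variant you run, since both are supported per the note above.

```bash
#!/bin/bash
# Minimal sketch: check the running kernel against the versions in the support table above.
# Adjust the list for RT or non-RT variants as needed.
supported_kernels=(
  "4.18.0-553.el8_10.x86_64"
  "5.14.0-570.24.1.el9_6.x86_64+rt"
)

current=$(uname -r)
echo "Running kernel: $current"

for k in "${supported_kernels[@]}"; do
  if [[ "$current" == "$k" ]]; then
    echo "Kernel matches a supported version."
    exit 0
  fi
done
echo "WARNING: $current is not in the supported kernel list." >&2
```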
26.1.3. Upgrade Paths¶
The following are the supported upgrade paths for Robin CNP v5.7.2:
Robin CNP v5.4.3 HF5+PP to Robin CNP v5.7.2-330
Robin CNP v5.4.3 HF6 to Robin CNP v5.7.2-330
Robin CNP v5.4.3 HF7 to Robin CNP v5.7.2-330
Robin CNP v5.5.1-1950 to Robin CNP v5.7.2-330 (For CNO v4.1.0 support only)
26.1.3.1. Pre-upgrade considerations¶
For a successful upgrade, you must run the possible_job_stuck.py script before and after the upgrade. Contact the Robin Support team for the upgrade procedure that uses this script.
When upgrading from supported Robin CNP versions to Robin CNP v5.7.2, if your cluster already has cert-manager installed, you must uninstall it before upgrading.
Before upgrading to Robin CNP v5.7.2, if the robin-certs-check job or CronJob is running, you must stop it. To stop the robin-certs-check job, run the kubectl delete job robin-certs-check -n robinio command; to stop the robin-certs-check CronJob, run the robin cert check --stop-cronjob command. A combined sketch follows this list.
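The following is a minimal sketch that strings the two stop commands above together, deleting the robin-certs-check job only if it exists. The kubectl and robin commands are taken from the consideration above; the existence check is only a convenience.

```bash
# Minimal sketch: stop the robin-certs-check job and CronJob before upgrading.
# The delete and --stop-cronjob commands come from the consideration above.
if kubectl get job robin-certs-check -n robinio >/dev/null 2>&1; then
    echo "Stopping the robin-certs-check job"
    kubectl delete job robin-certs-check -n robinio
fi

# Stop the robin-certs-check CronJob.
robin cert check --stop-cronjob
```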
26.1.3.2. Post-upgrade considerations¶
After upgrading to Robin CNP v5.7.2, verify that the value of the k8s_resource_sync config parameter is set to 60000 using the robin schedule list | grep -i K8sResSync command. If it is not set, run the robin schedule update K8sResSync k8s_resource_sync 60000 command to update the value.
After upgrading to Robin CNP v5.7.2, you must run the robin-server validate-role-bindings command. To run this command, log in to the robin-master Pod. This command verifies the roles assigned to each user in the cluster and corrects them if necessary.
After upgrading to Robin CNP v5.7.2, the k8s_auto_registration config parameter is disabled by default. The setting is deactivated to prevent all Kubernetes apps from automatically registering and consuming resources. Be aware of the following points with this change:
You can manually register Kubernetes apps using the robin app register command and use Robin CNP for snapshot, clone, and backup operations of the Kubernetes app.
Because this config parameter is disabled, when you run the robin app nfs-list command, the mappings between Kubernetes apps and NFS server Pods are not listed in the command output.
If you need the mapping between a Kubernetes app and an NFS server Pod while the k8s_auto_registration parameter is disabled or the Kubernetes app is not manually registered, get the PVC name from the Pod YAML (kubectl get pod -n <namespace> <pod> -o yaml) and run the robin nfs export list | grep <pvc name> command. The robin nfs export list command output displays the PVC name and namespace.
After upgrading to Robin CNP v5.7.2, you must start the robin-certs-check CronJob using the robin cert check --start-cronjob command. A combined verification sketch follows this list.
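The following is a minimal sketch of the post-upgrade checks above: it verifies the K8sResSync schedule value, updates it if needed, and starts the robin-certs-check CronJob again. The commands are those given above; the conditional logic and greps are only a convenience, and the exact robin schedule list output format may differ.

```bash
# Minimal sketch: post-upgrade checks described above.
# Verify the K8sResSync schedule; update it if k8s_resource_sync is not 60000.
if ! robin schedule list | grep -i K8sResSync | grep -q 60000; then
    echo "Updating K8sResSync k8s_resource_sync to 60000"
    robin schedule update K8sResSync k8s_resource_sync 60000
fi

# Start the robin-certs-check CronJob again.
robin cert check --start-cronjob
```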
26.1.3.3. Pre-upgrade steps¶
Upgrading from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2-330
Before upgrading from Robin CNP v5.4.3 to Robin CNP v5.7.2, perform the following steps:
Update the value of the suicide_threshold config parameter to 1800:
# robin config update agent suicide_threshold 1800
Set the toleration seconds for all NFS server Pods to 86400 seconds. After the upgrade, you must change the toleration seconds according to the post-upgrade steps.
# for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationSeconds to 86400"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'; done
Check the toleration seconds for Robin master Pods:
# kubectl get pod -n robinio robin-master-xxxx-xxx -o yaml | grep -i tolera
Edit the Pod YAML to update the toleration seconds for nodes in the NotReady state from 60 seconds to 600 seconds:
# kubectl edit pod -n robinio robin-master-xxx-xxx
Verify that the webhooks are enabled by running the robin config list | grep -i robin_k8s_extension command. The value should be true. If it is disabled, enable it:
# robin config update manager robin_k8s_extension True
Copy the new robin-server binary to the /usr/local/robin/patch directory.
Create a preentry.sh file in the /usr/local/robin directory on all master nodes and add the following:
#!/bin/bash
ver=$(cd /opt/robin/ && readlink current)
if [[ $ver == "5.4.3-<current pre upgrade version>" ]]; then
    if [ -f /usr/local/robin/patch/robin-server ]; then
        if [[ ! -f /opt/robin/current/bin/robin-server.backup ]]; then
            /usr/bin/cp /opt/robin/current/bin/robin-server /opt/robin/current/bin/robin-server.backup
        fi
        /usr/bin/cp /usr/local/robin/patch/robin-server /opt/robin/current/bin/robin-server
    fi
fi
Change permissions for the preentry.sh file and the robin-server binary:
# chmod 755 preentry.sh
# chmod 755 robin-server
Bounce the Robin master Pod to apply the new robin-server binary:
# kubectl delete pod -n robinio -l app=robin-master
Verify the md5sum of the new Robin master Pod:
# rbash master
# md5sum /opt/robin/current/bin/robin-server
The output of the above command should match one of the following checksums, depending on your pre-upgrade version (a verification sketch follows this list):
Robin CNP v5.4.3 HF5+PP - b1b7fb80c5e14b17c4d26a345efe8a3f
Robin CNP v5.4.3 HF6 - 54604d7f3f06584711d4fa864ee8b787
Robin CNP v5.4.3 HF7 - 17312566c198b3732857ba50b8d60b8f
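The following is a minimal sketch that compares the patched binary's checksum against the expected values listed above. Run it inside the robin-master Pod (after rbash master); the checksums come from the list above.

```bash
# Minimal sketch: verify the patched robin-server binary inside the master Pod.
# Expected md5sums are taken from the list above.
expected_hf5pp="b1b7fb80c5e14b17c4d26a345efe8a3f"   # Robin CNP v5.4.3 HF5+PP
expected_hf6="54604d7f3f06584711d4fa864ee8b787"     # Robin CNP v5.4.3 HF6
expected_hf7="17312566c198b3732857ba50b8d60b8f"     # Robin CNP v5.4.3 HF7

actual=$(md5sum /opt/robin/current/bin/robin-server | awk '{print $1}')
echo "robin-server md5sum: $actual"

case "$actual" in
    "$expected_hf5pp"|"$expected_hf6"|"$expected_hf7")
        echo "Checksum matches an expected value." ;;
    *)
        echo "WARNING: checksum does not match any expected value." >&2 ;;
esac
```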
26.1.3.4. Post-upgrade steps¶
Upgrading from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2-330
After upgrading from Robin CNP v5.4.3 to Robin CNP v5.7.2, perform the following steps:
Verify the state of the hosts, Pods, services, apps, and nodes.
Update the value of the suicide_threshold config parameter to 40:
# robin config update agent suicide_threshold 40
Set the check_helm_apps config parameter to False:
# robin config update cluster check_helm_apps False
Verify that the robin_k8s_extension config parameter is set to True. If not, set it to True:
# robin config update manager robin_k8s_extension True
Set the toleration seconds for all NFS server Pods to 60 seconds when the node is in the NotReady state and to 0 seconds when the node is in the Unreachable state (a verification sketch follows these steps):
# for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationSeconds"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'; done 2>/dev/null
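The following is a minimal sketch that re-checks the values set in the steps above. The parameter names and expected values come from those steps; the greps and the toleration listing are only a convenience, and the exact robin config list output format may differ.

```bash
# Minimal sketch: confirm the post-upgrade configuration values set above.
robin config list | grep -i suicide_threshold      # expect 40
robin config list | grep -i check_helm_apps        # expect False
robin config list | grep -i robin_k8s_extension    # expect True

# Confirm the NFS server Pod tolerations were patched (60s not-ready, 0s unreachable).
for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs \
             --output=jsonpath={.items..metadata.name}); do
    echo "--- $pod"
    kubectl get pod "$pod" -n robinio -o jsonpath='{.spec.tolerations}'
    echo
done
```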
26.1.4. Improvements¶
26.1.4.1. Memory Manager support during upgrades¶
Robin CNP v5.7.2 now supports and validates enabling the Memory Manager during cluster upgrades from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2.
When you provide a configuration JSON file with memory-manager-policy set to Static using the --config-json flag, the upgrade process automatically initializes the Memory Manager on the specified nodes.
Post-upgrade validation confirms that the policy is correctly reflected in the robin config list output and the Kubelet state file located at /var/lib/kubelet/memory_manager_state.
Enable the Memory Manager during cluster upgrade
To enable the Memory Manager when upgrading a cluster from supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2, follow these steps:
Create a configuration JSON file (for example, config-upg.json) that specifies the policy for each host:
{ "<host-ip>": { "memory-manager-policy": "Static" } }
Run the upgrade-cluster command and include the --config-json flag:
# ./gorobin upgrade-cluster --config-json config-upg.json
Example:
# ./gorobin_5.7.2-328 onprem upgrade-cluster --hosts-json host.json --gorobintar gorobintar-5.7.2-328-bin.tar --ignore-warnings --config-json config-upg.json --robin-admin-user admin --robin-admin-passwd Robin123
After the upgrade completes, verify the configuration:
Run robin config list | grep policy and confirm that mem_manager_policy is set to Static.
Verify the Kubelet state by checking the following file: /var/lib/kubelet/memory_manager_state (a small verification sketch follows).
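The following is a minimal sketch of the verification described above, run on an upgraded node. The config check and the state-file path come from the steps above; piping through python3 -m json.tool is only a convenience to pretty-print the JSON and can be omitted.

```bash
# Minimal sketch: verify the Memory Manager policy after the upgrade.
# Expect mem_manager_policy to be Static in the Robin config.
robin config list | grep policy

# Inspect the Kubelet Memory Manager state file on the node.
# The json.tool pipe only pretty-prints the JSON; omit it if python3 is unavailable.
sudo cat /var/lib/kubelet/memory_manager_state | python3 -m json.tool | head -n 20
```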
26.1.4.2. Secure LDAP (LDAPS) Support¶
Robin CNP v5.7.2 supports Lightweight Directory Access Protocol Secure (LDAPS) connections on port 636. LDAPS uses TLS or SSL to encrypt communication between clients and servers. This feature protects sensitive user credentials and directory information by encrypting communication between Robin CNP and your LDAP server.
To configure LDAPS, specify port 636 when you add your LDAP server and ensure proper certificate validation. For more information, see Add an LDAP server.
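Before adding the LDAP server, you may want to confirm that it accepts TLS connections on port 636 and presents a certificate your hosts trust. The following is a minimal sketch using openssl; ldap.example.com is a placeholder for your directory server.

```bash
# Minimal sketch: confirm the directory server accepts LDAPS (TLS) on port 636.
# Replace ldap.example.com with your LDAP server's hostname or IP address.
openssl s_client -connect ldap.example.com:636 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```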
26.1.5. Fixed Issues¶
| Reference ID | Description |
|---|---|
| RSD-11580 | The Robin KVM manifest prevents KVMs from booting in UEFI (Unified Extensible Firmware Interface) mode with multiple qcow2 disks. This issue is fixed; Robin CNP v5.7.2 supports multiple qcow2 disks in the KVM manifest file. |
| RSD-11238, RSD-11474 | For a multi-attach volume, when one of the nodes became faulty, the volume was incorrectly unmounted on other nodes in addition to the faulty node. This issue is fixed. |
| RSD-11346 | When you deploy KVM applications with Burstable QoS, the deployment fails with the error "CPU min and max values for vnode test1_server01 do not match" due to an incorrect internal validation. This issue is fixed. |
| RSD-11007 | In rare scenarios, after you perform an application upgrade on Robin CNP, the application Pod might enter an error state. This issue is fixed. |
| RSD-10166 | Burstable Pod deployment fails with Insufficient CPU or No host found errors if the total resource requests (CPU or huge pages) exceed the capacity of a single NUMA node while using a Robin IP pool. This issue is fixed; Robin CNP now allows Burstable Pods to span multiple NUMA nodes, even when a Robin IP pool is used. |
| RSD-11284, RSD-11374 | When upgrading to Robin CNP v5.7.1, the upgrade fails due to SSH key authentication. This occurs because the cluster is enabled with global cryptography policies, which force it to use only strong SSH keys created using elliptic curve cryptography (ECC) algorithms such as Ed25519. This issue is fixed. |
| RSD-11584 | This issue is fixed. |
| RSD-11015 | The issue of Pod-related events not being displayed for users is fixed. |
| RSD-11050 | An issue related to Open vSwitch is fixed. |
| RSD-11543 | The issue of KVM application deployments failing with an error is fixed. |
26.1.6. Known Issues¶
The following known issues are listed by reference ID, each with a symptom and, where available, a workaround.
PP-42237

Symptom: When you try to deploy a KVM application with multiple vnodes that consume all available vfio-pci resources on a host, the application fails to restart. If you attempt to restart the application, it might display the error Failed to allocate resources for App, leaving the application in a FAULTED or NOTREADY state.

Workaround: Stop and start the application.
PP-41172

Symptom: After upgrading from a supported Robin CNP version to Robin CNP v5.7.2, NFS mounts on client nodes can become unresponsive, leading to critical issues such as the following:

This problem is primarily observed when the NFS client (the node where the PVC is mounted) experiences prolonged unresponsiveness from the NFS server.

Workaround: If an NFS mount is hung, you can recover the system by forcing new NFS sessions:
PP-41195

Symptom: After you perform a force unmount operation for an RWX (ReadWriteMany) volume on a host where it was previously mounted, or during certain failover scenarios involving RWX volumes, the associated Robin NFS server Pod might transition into an ASSIGNED_ERR state. When the Pod is in this state, the NFS server Pod is unable to export the volume, rendering the volume inaccessible via NFS.

Workaround: Contact the Robin Customer Support team to resolve this issue.
PP-41192

Symptom: When creating a KVM using the Robin bundle, the CPU core count must be specified in even numbers. If an odd number is specified, the KVM will not be deployed; however, the respective Pod might remain in an unexpected state.
PP-41159

Symptom: After upgrading to Robin CNP v5.7.2, some Pods might get stuck and report the following error:

Target /var/lib/robin/nfs/robin-nfs-shared-107/ganesha/pvc-b84ed376-2a58-484f-8031-4530c1899b2c is busy, please retry later. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))

Workaround: Apply the following workaround steps:
PP-42238

Symptom: You might observe a mismatch of Helm versions between the host and the robin-client or robin-master Pod.

Workaround: Download the correct Helm binary directly from the Robin master Pod:
PP-35015

Symptom: After renewing the expired Robin license successfully, Robin CNP incorrectly displays the License Violation error when you try to add a new user to the cluster.

Workaround: Restart the robin-server-bg service.
PP-39901

Symptom: After rebooting a worker node that is hosting Pods with Robin RWX volumes, one or more application Pods using these volumes might get stuck in the ContainerCreating state indefinitely.

Workaround: If you notice this issue, contact the Robin Customer Support team.
PP-39645

Symptom: Robin CNP v5.7.2 may rarely fail to honor soft Pod anti-affinity, resulting in uneven Pod distribution on labeled nodes, even when you deploy an application with the recommended soft anti-affinity settings.

Workaround: Bounce the Pod that has not honored the soft affinity.
PP-34226

Symptom: When a PersistentVolumeClaim (PVC) is created, the volume creation request initiated by the CSI provisioner might not complete.

Workaround: Bounce the CSI provisioner Pod:
# kubectl delete pod -n robinio <csi-provisioner-robin>
PP-34414

Symptom: In rare scenarios, the IOMGR service might fail to open devices in exclusive mode when it starts because other processes are using these disks. You might observe that some app Pods get stuck.

Steps to identify the issue:

Check for the following type of faulted-disk error in the event list:
# robin event list --type EVENT_DISK_FAULTED
disk /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5 on node default:poch06 is faulted

If you see the disk-is-faulted error, check the IOMGR logs for dev_open() and Failed to exclusively open error messages on the node where the disks are present:
# cat iomgr.log.0 | grep scsi-SATA_Micron_M500_MTFD_1401096049D5 | grep "dev_open"

If you see the Device or resource busy error message in the log file, check which processes are using the device:
# fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5

Workaround: If the device is not in use, restart the IOMGR service (a conditional sketch follows this entry):
# supervisorctl restart iomgr
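The check-then-restart flow in the PP-34414 workaround above can be expressed as a short conditional; the device path is the example from the symptom, and the fuser and supervisorctl commands are those given above.

```bash
# Minimal sketch of the PP-34414 workaround: restart IOMGR only if the device is free.
dev="/dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5"   # example device from above
if fuser "$dev"; then
    echo "Device is in use; investigate the processes reported above before restarting." >&2
else
    echo "Device is not in use; restarting the IOMGR service."
    supervisorctl restart iomgr
fi
```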
PP-39632

Symptom: After upgrading to Robin CNP v5.7.2, an NFS client might hang even when there is no pending IO. For the no-pending-IO case, refer to log messages such as the following:

CsiServer_9 - robin.utils - INFO - Executing command /usr/bin/nc -z -w 6 172.19.149.161 2049 with timeout 60 seconds
CsiServer_9 - robin.utils - INFO - Command /usr/bin/nc -z -w 6 172.19.149.161 2049 completed with return code 0.
CsiServer_9 - robin.utils - INFO - Standard out:

Also, you can find the following messages in the system logs:

nfs: server 172.19.131.218 not responding, timed out
nfs: server 172.19.131.218 not responding, timed out
nfs: server 172.19.131.218 not responding, timed out

Workaround:
PP-34492

Symptom: When you run the robin host list command, a host might remain stuck in an unexpected state.

Workaround: Complete the following steps (a combined sketch follows this entry):

Run the following command to check the state of the hosts:
# robin host list

Run the following command to check the current report for the host:
# robin ap report | grep <hostname>

Run the following command to probe the host and recover it:
# robin host probe <hostname> --wait

This command forces a probe of the host and updates its state in the cluster.

Run the following command to verify the host's state:
# robin host list

The host should now transition to a healthy state.
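The PP-34492 recovery flow above can be run as a short sequence for a single host; <hostname> is the host you identified with robin host list, and the commands are those given in the workaround.

```bash
# Minimal sketch of the PP-34492 workaround for one host.
host="<hostname>"                 # host identified from "robin host list"
robin ap report | grep "$host"    # check the current report for the host
robin host probe "$host" --wait   # force a probe and wait for it to finish
robin host list | grep "$host"    # verify the host's state after the probe
```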
PP-35478

Symptom: In rare scenarios, the kube-scheduler may not function as expected when many Pods are deployed in a cluster due to issues with the kube-scheduler lease.

Workaround: Complete the following workaround steps to resolve issues with the kube-scheduler lease:

Run the following command to identify the node where the kube-scheduler lease is currently held:
# kubectl get lease -n kube-system

Log in to the node identified in the previous step. Check if the kube-scheduler container is running:
# docker ps | grep kube-scheduler

As the kube-scheduler runs as a static Pod, move its manifest out of the manifests directory to stop it:
# mv /etc/kubernetes/manifests/kube-scheduler.yaml /root

Run the following command to confirm that the kube-scheduler container is no longer running:
# docker ps | grep kube-scheduler

Verify that the kube-scheduler lease has moved to another node:
# kubectl get lease -n kube-system

Copy the static Pod configuration file back to its original location to redeploy the kube-scheduler:
# mv /root/kube-scheduler.yaml /etc/kubernetes/manifests/

Confirm that the kube-scheduler container is running again:
# docker ps | grep kube-scheduler
PP-36865

Symptom: After rebooting a node, the node might not come back online for a long time, and the host BMC console displays the following message for RWX PVCs mounted on that node:

Remounting nfs rwx pvc timed out, issuing SIGKILL

Workaround: Power cycle the host system.
PP-37330

Symptom: During or after upgrading to Robin CNP v5.7.2, the following mount failure might be reported:

/bin/mount /dev/sdn /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41 -o discard failed with return code 32: mount: /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41: wrong fs type, bad option, bad superblock on /dev/sdn, missing codepage or helper program, or other error.

Workaround: If you notice this issue, contact the Robin Customer Support team for assistance.
PP-37416

Symptom: In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.7.0, the upgrade might fail with the following error during the Kubernetes upgrade process on other master nodes:

Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed

Steps to identify the issue:

Example:
etcd container: {etcd_container_id} and exited status: {is_exited}
Killing progress PID 4168272
Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed
Install logs can be found at /var/log/robin-install.log
Caught EXIT signal. exit_code: 1

Note: You can get the above error logs for any of the static manifests of api-server, etcd, scheduler, and controller-manager.

# docker ps -a | grep schedule

Workaround: If you notice the above error, restart the kubelet:
# systemctl restart kubelet
PP-38044

Symptom: When attempting to detach a repository from a hydrated Helm application, the operation might fail with an error similar to the following:

Can't detach repo as the application is in

This issue occurs even if the application has already been hydrated; the system incorrectly reports the application's hydration state.

Workaround: To detach the repository, manually rehydrate the application and then retry the detach operation:
PP-38251

Symptom: When evacuating a disk from an offline node in a large cluster, the robin drive evacuate command fails with the following error message:

Json deserialize error: invalid value: integer -10, expected u64 at line 1 column 2440

Workaround: If you notice the above issue, contact the Robin Customer Support team.
PP-38471

Symptom: When StatefulSet Pods restart, the Pods might get stuck.

Workaround: If you notice this issue, restart the CSI node plugin Pod:
# kubectl delete pod <csi-nodeplugin> -n robinio
PP-38087

Symptom: In certain cases, the snapshot size allocated to a volume could be less than what is requested. This occurs when the volume is allocated from multiple disks.
PP-38924

Symptom: After you delete multiple Helm applications, one of the Pods might get stuck in the Error state.

Workaround: On the node where the Pod is stuck in the Error state, restart Docker and Kubelet.
PP-34451

Symptom: In rare scenarios, an RWX Pod might be stuck with the following error:

mount.nfs: mount system call failed

Perform the following steps to confirm the issue:

If you notice any input and output errors in step 4, apply the following workaround:

Workaround:
PP-21916

Symptom: A Pod IP is not pingable from any node in the cluster other than the node where the Pod is running.

Workaround: Bounce the Calico Pod running on the node where the issue is seen.
PP-40819

Symptom: From the Robin CNP UI, when you try to deploy an application by cloning from a snapshot, the operation might fail with an error message similar to the following, indicating an invalid negative CPU value:

Invalid value: "-200m": must be greater than or equal to 0.

You might observe this issue specifically when the application has sidecar containers configured with CPU requests/limits. This is a CNP UI issue; you can use the CNP CLI to perform the same operation successfully.

Workaround: Use the following Robin CLI command to clone the snapshot and create an app:
# robin app create from-snapshot <new_app_name> <snapshot_id> --rpool default --wait
PP-41022

Symptom:

Workaround: If you notice this issue, restart kubelet on the affected node:
# systemctl restart kubelet
PP-40993

Symptom: During large cluster upgrades, the upgrade might fail during Robin pre-upgrade actions if Robin Auto Pilot creates active jobs. This occurs when multiple Robin Auto Pilot watchers are configured for a single Pod, resulting in lingering jobs (for example, VnodeDeploy) that block the upgrade process.

Workaround: Restart the
PP-39467

Symptom: When deploying applications with RWX PVCs, application Pods fail to mount volumes and get stuck.

Workaround: Reboot the affected host.
26.1.7. Technical Support¶
Contact the Robin Technical Support team for any assistance.