*************
Release Notes
*************

===================================
Robin Cloud Native Platform v5.7.2
===================================

The Robin Cloud Native Platform (CNP) v5.7.2 release notes document has pre- and post-upgrade
considerations, improvements, fixed issues, and known issues.

**Release Date:** April 22, 2026

Infrastructure Versions
=======================

The following software applications are included in this CNP release:

==================== ======================================
Software Application Version
==================== ======================================
Kubernetes           1.33.5
Docker               25.0.2 (RHEL 8.10 or Rocky Linux 8.10)
Podman               5.4.0 (RHEL 9.6)
Prometheus           2.39.1
Prometheus Adapter   0.10.0
Node Exporter        1.4.0
Calico               3.28.2
HAProxy              2.4.7
PostgreSQL           14.12
Grafana              9.2.3
CRI Tools            1.33.0
cert-manager         1.19.1
==================== ======================================

Supported Operating Systems
===========================

The following are the supported operating systems and kernel versions for Robin CNP v5.7.2:

============================= =======================================
OS Version                    Kernel Version
============================= =======================================
Red Hat Enterprise Linux 8.10 4.18.0-553.el8_10.x86_64
Rocky Linux 8.10              4.18.0-553.el8_10.x86_64
Red Hat Enterprise Linux 9.6  5.14.0-570.24.1.el9_6.x86_64+rt
============================= =======================================

.. Note:: Robin CNP supports both RT and non-RT kernels on the above supported operating systems.

Upgrade Paths
=============

The following are the supported upgrade paths for Robin CNP v5.7.2:

* Robin CNP v5.4.3 HF5+PP to Robin CNP v5.7.2-330
* Robin CNP v5.4.3 HF6 to Robin CNP v5.7.2-330
* Robin CNP v5.4.3 HF7 to Robin CNP v5.7.2-330

Pre-upgrade considerations
--------------------------

* For a successful upgrade, you must run the ``possible_job_stuck.py`` script before and after the
  upgrade. Contact the Robin Support team for the upgrade procedure that uses the script.
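One of the considerations in this section is that an existing cert-manager install must be removed
before upgrading. A minimal sketch of a precheck for this, assuming cert-manager is deployed in its
conventional ``cert-manager`` namespace (adjust if your install differs):

```shell
#!/bin/bash
# Sketch: pre-upgrade helper that detects an existing cert-manager install,
# which must be uninstalled before upgrading to Robin CNP v5.7.2.
# ASSUMPTION: cert-manager lives in the usual "cert-manager" namespace.

has_cert_manager() {
    # $1 is a newline-separated list of namespace names
    echo "$1" | grep -qx "cert-manager"
}

# Gather namespaces; tolerate kubectl being unavailable in a dry run.
namespaces=$(kubectl get ns --no-headers -o custom-columns=:metadata.name 2>/dev/null || true)
if has_cert_manager "$namespaces"; then
    echo "cert-manager detected: uninstall it before upgrading"
fi
```

This only flags the namespace; the actual uninstall depends on how cert-manager was deployed
(Helm release or static manifests).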
* When upgrading from supported Robin CNP versions to Robin CNP v5.7.2, if your cluster already has
  cert-manager installed, you must uninstall it before upgrading to Robin CNP v5.7.2.

* Before upgrading to Robin CNP v5.7.2, if the ``robin-certs-check`` job or CronJob is running, you
  must stop it. To stop the ``robin-certs-check`` job, run the
  ``kubectl delete job robin-certs-check -n robinio`` command, and to stop the
  ``robin-certs-check`` CronJob, run the ``robin cert check --stop-cronjob`` command.

Post-upgrade considerations
---------------------------

* After upgrading to Robin CNP v5.7.2, verify that the value of the ``k8s_resource_sync`` config
  parameter is set to ``60000`` using the ``robin schedule list | grep -i K8sResSync`` command. If
  it is not set, run the ``robin schedule update K8sResSync k8s_resource_sync 60000`` command to
  update the value of the ``K8sResSync`` schedule.

* After upgrading to Robin CNP v5.7.2, you must run the ``robin-server validate-role-bindings``
  command. To run this command, log in to the ``robin-master`` Pod. This command verifies the roles
  assigned to each user in the cluster and corrects them if necessary.

* After upgrading to Robin CNP v5.7.2, the ``k8s_auto_registration`` config parameter is
  **disabled** by default. The config setting is deactivated to prevent all Kubernetes apps from
  automatically registering and consuming resources. Be aware of the following points with this
  change:

  - You can manually register Kubernetes apps using the ``robin app register`` command and use
    Robin CNP for snapshots, clones, and backup operations of the Kubernetes app.

  - As this config parameter is disabled, when you run the ``robin app nfs-list`` command, the
    mappings between Kubernetes apps and NFS server Pods are not listed in the command output.
  - If you need the mapping between a Kubernetes app and an NFS server Pod when the
    ``k8s_auto_registration`` config parameter is disabled or the Kubernetes app is not manually
    registered, get the PVC name from the Pod YAML
    (``kubectl get pod <pod-name> -n <namespace> -o yaml``) and run the
    ``robin nfs export list | grep <pvc-name>`` command.

  - The ``robin nfs export list`` command output displays the PVC name and namespace.

* After upgrading to Robin CNP v5.7.2, you must start the ``robin-certs-check`` CronJob using the
  ``robin cert check --start-cronjob`` command.

Pre-upgrade steps
-----------------

**Upgrading from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2-330**

Before upgrading from Robin CNP v5.4.3 to Robin CNP v5.7.2, perform the following steps:

#. Update the value of the ``suicide_threshold`` config parameter to ``1800``:

   .. code-block:: text

      # robin config update agent suicide_threshold 1800

#. Set the toleration seconds for all NFS server Pods to ``86400`` seconds. After the upgrade, you
   must change the toleration seconds according to the post-upgrade steps.

   .. code-block:: text

      for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationseconds to 86400"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'; done

#. Check the toleration seconds for Robin master Pods:

   .. code-block:: text

      # kubectl get pod -n robinio robin-master-xxxx-xxx -o yaml | grep -i tolera

#. Edit the Pod YAML to update the toleration seconds for nodes that are in the ``NotReady`` state
   from ``60`` seconds to ``600`` seconds:

   .. code-block:: text

      # kubectl edit pod -n robinio robin-master-xxx-xxx

#. Verify the webhooks are enabled by running the
   ``robin config list | grep -i robin_k8s_extension`` command. It should be ``true``.
   If it is disabled, then enable it:

   .. code-block:: text

      # robin config update manager robin_k8s_extension True

#. Copy the new ``robin-server`` binary to the ``/usr/local/robin/patch`` directory.

#. Create a ``preentry.sh`` file in the ``/usr/local/robin`` directory on all master nodes and add
   the following:

   .. code-block:: text

      #!/bin/bash
      ver=$(cd /opt/robin/ && readlink current)
      if [[ $ver == "5.4.3-"* ]]; then
          if [ -f /usr/local/robin/patch/robin-server ]; then
              if [[ ! -f /opt/robin/current/bin/robin-server.backup ]]; then
                  /usr/bin/cp /opt/robin/current/bin/robin-server /opt/robin/current/bin/robin-server.backup
              fi
              /usr/bin/cp /usr/local/robin/patch/robin-server /opt/robin/current/bin/robin-server
          fi
      fi

#. Change permissions for the ``preentry.sh`` file and the ``robin-server`` binary:

   .. code-block:: text

      chmod 755 preentry.sh
      chmod 755 robin-server

#. Bounce the Robin master Pod to apply the ``robin-server`` binary:

   .. code-block:: text

      # kubectl delete pod -n robinio -l app=robin-master

#. Verify the md5sum of the new Robin master Pod:

   .. code-block:: text

      rbash master
      md5sum /opt/robin/current/bin/robin-server

   The output of the above command should match the following:

   * Robin CNP v5.4.3 HF5+PP - b1b7fb80c5e14b17c4d26a345efe8a3f
   * Robin CNP v5.4.3 HF6 - 54604d7f3f06584711d4fa864ee8b787
   * Robin CNP v5.4.3 HF7 - 17312566c198b3732857ba50b8d60b8f

Post-upgrade steps
------------------

**Upgrading from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2-330**

After upgrading from Robin CNP v5.4.3 to Robin CNP v5.7.2, perform the following steps:

#. Verify the state of the hosts, Pods, services, apps, and nodes.

#. Update the value of the ``suicide_threshold`` config parameter to ``40``:

   .. code-block:: text

      # robin config update agent suicide_threshold 40

#. Set the ``check_helm_apps`` config parameter to ``False``:

   .. code-block:: text

      # robin config update cluster check_helm_apps False

#. Verify the ``robin_k8s_extension`` config parameter is set to ``True``.
   If not, set it to ``True``.

   .. code-block:: text

      # robin config update manager robin_k8s_extension True

#. Set the toleration seconds for all NFS server Pods to ``60`` seconds when the node is in the
   ``notready`` state and to ``0`` seconds when the node is in the ``unreachable`` state.

   .. code-block:: text

      # for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationseconds"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'; done 2>/dev/null

Improvements
============

Memory Manager support during upgrades
--------------------------------------

Robin CNP v5.7.2 now supports and validates enabling the Memory Manager during cluster upgrades
from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2. When you provide a configuration
JSON file with the ``memory-manager-policy`` set to ``Static`` using the ``--config-json`` flag,
the upgrade process automatically initializes the Memory Manager on the specified nodes.
Post-upgrade validation confirms that the policy is correctly reflected in the
``robin config list`` output and in the Kubelet state file located at
``/var/lib/kubelet/memory_manager_state``.

**Enable the Memory Manager during cluster upgrade**

To enable the Memory Manager when upgrading a cluster from supported Robin CNP v5.4.3 versions to
Robin CNP v5.7.2, follow these steps:

1. Create a configuration JSON file (for example, ``config-upg.json``) that specifies the policy
   for each host:

   .. code-block:: json

      {
          "<hostname>": {
              "memory-manager-policy": "Static"
          }
      }

2. Run the upgrade-cluster command and include the ``--config-json`` flag:

   .. code-block:: text

      # ./gorobin upgrade-cluster --config-json config-upg.json [additional-flags]

   **Example**

   .. code-block:: text

      # ./gorobin_5.7.2-328 onprem upgrade-cluster --hosts-json host.json --gorobintar gorobintar-5.7.2-328-bin.tar --ignore-warnings --config-json config-upg.json --robin-admin-user admin --robin-admin-passwd Robin123

3. After the upgrade completes, verify the configuration:

   - Run ``robin config list | grep policy`` and confirm ``mem_manager_policy`` is set to
     ``Static``.
   - Verify the Kubelet state by checking the following file:
     ``/var/lib/kubelet/memory_manager_state``.

Secure LDAP (LDAPS) Support
---------------------------

Robin CNP v5.7.2 supports Lightweight Directory Access Protocol Secure (LDAPS) connections on port
``636``. LDAPS uses TLS or SSL to encrypt communication between clients and servers. This feature
protects sensitive user credentials and directory information by encrypting communication between
Robin CNP and your LDAP server. To configure LDAPS, specify port ``636`` when you add your LDAP
server and ensure proper certificate validation. For more information, see
`Add an LDAP server `__.
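The Memory Manager verification described in the Improvements section above can be scripted. The
following is a minimal sketch, assuming the kubelet state file is JSON with a top-level
``policyName`` field (as written by recent kubelets); confirm the field name on your kubelet
version before relying on it:

```shell
#!/bin/bash
# Sketch: read the Memory Manager policy recorded in a kubelet state file.
# ASSUMPTION: the state file is JSON with a top-level "policyName" field.

mm_policy() {
    # $1 is the path to a memory_manager_state file
    python3 -c 'import json,sys; print(json.load(open(sys.argv[1])).get("policyName",""))' "$1"
}

# Example (hypothetical invocation on a node):
# mm_policy /var/lib/kubelet/memory_manager_state
```

Combine this with ``robin config list | grep policy`` to cross-check that both Robin CNP and the
kubelet agree on the ``Static`` policy.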
Fixed Issues
============

======================= ============================================================
Reference ID            Description
======================= ============================================================
RSD-11580               Robin KVM manifest prevents KVMs from booting in UEFI
                        (Unified Extensible Firmware Interface) mode with multiple
                        qcow2 disks. This issue is fixed.

                        With Robin CNP v5.7.2, support for the ``boot: uefi``
                        parameter in the ``Image`` section and the ability to define
                        both image and empty qcow2 disks in the ``storage`` section
                        of the manifest file are added. The following qcow2 disks
                        are supported in the KVM manifest file:

                        - **qcow2 disk with pre-loaded image**: specify an ``image``
                          parameter with ``name`` and ``version`` with a storage
                          entry to attach a qcow2 disk with a pre-loaded image. The
                          disk is automatically resized to match the Robin volume
                          size.

                        - **Empty qcow2 disk**: specify ``blank: true`` and
                          ``format: qcow2`` with a storage entry to attach an empty
                          unpartitioned disk. The disk is sized to match the Robin
                          volume and is ready for guest OS initialization during the
                          first boot.
RSD-11238, RSD-11474    For a multi-attach volume, when one of the nodes became
                        faulty, the volume was incorrectly unmounted on other nodes
                        in addition to the faulty node. This issue is fixed.

RSD-11346               When you deploy KVM applications with Burstable QoS, the
                        application deployment fails with the following error due to
                        an incorrect internal validation:

                        *CPU min and max values for vnode test1_server01 do not
                        match*

                        This issue is fixed.

RSD-11007               In rare scenarios, after you perform an application upgrade
                        on Robin CNP, the application Pod might enter a flapping
                        state where it repeatedly restarts. However, after a few
                        restarts, the Pod becomes stable on its own without any
                        intervention. This issue is fixed.

RSD-10166               Burstable Pods deployment fails with *Insufficient CPU* or
                        *No host found* errors if their total resource requests (CPU
                        or huge pages) exceed the capacity of a single NUMA node
                        while using a Robin IP pool. This issue is fixed. Robin CNP
                        now allows Burstable Pods to span multiple NUMA nodes, even
                        when the ``topology_manager_policy`` is set to
                        ``restricted``.

RSD-11284, RSD-11374    When upgrading to Robin CNP v5.7.1, the upgrade fails due to
                        SSH key authentication. This occurs because the cluster is
                        enabled with global cryptography policies, which force it to
                        use only strong SSH keys created using elliptic curve
                        cryptography (ECC) algorithms such as Ed25519. This issue is
                        fixed.

RSD-11584               The issue of the ``robin cert list`` command displaying
                        incorrect certificate expiry dates for Robin control plane
                        components is fixed.

RSD-11015               The issue of Pod-related events not being displayed for
                        users when running the ``robin event list`` command is
                        fixed.

RSD-11050               The issue of Open vSwitch (OVS) retaining stale ports for a
                        KVM app after an abrupt node shutdown or crash is fixed.
                        These stale ports caused connectivity loss by associating
                        MAC addresses with incorrect VLAN tags.
RSD-11543               The issue of KVM application deployments failing with a
                        ``CreateContainerError`` when using generic OS variant
                        names, such as rhel7, is fixed.
======================= ============================================================

Known Issues
============

============= ============================================================
Reference ID  Description
============= ============================================================
PP-42237      **Symptom**

              When you try to deploy a KVM application with multiple vnodes that
              consume all available vfio-pci resources on a host, the application
              fails to restart. If you attempt to restart the application, it might
              fail with the error *Failed to allocate resources for App*, leaving
              the application in a FAULTED or NOTREADY state.

              **Workaround**

              You need to stop and start the application.

              1. Run the following command to stop the app:

                 ``# robin app stop <app-name>``

              2. Run the following command to start the app:

                 ``# robin app start <app-name>``

PP-41172      **Symptom**

              After upgrading from a supported Robin CNP version to Robin CNP
              v5.7.2, NFS mounts on client nodes can become unresponsive, leading to
              critical issues such as the following:

              * kubelet instability (frequent restarts)
              * ``patronictl`` command failures (connection refused)
              * ``kubectl exec`` operations failing for Pods on affected nodes

              This problem is primarily observed when the NFS client (the node where
              the PVC is mounted) experiences prolonged unresponsiveness from the
              NFS server.

              **Workaround**

              If an NFS mount is hung, you can recover the system by forcing new NFS
              sessions:

              1. Identify hung NFS mounts:

                 * Attempt to access the NFS mount path, for example,
                   ``/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pvc-name>/mount``.

                 * If the command hangs (for example, ``ls /path/to/mount`` with no
                   output and requiring Ctrl+C to exit), the mount is hung.

                 **Example**

                 .. code-block:: text

                    $ ls /var/lib/pods/0cab5468-b43f-4afd-bad3/volumes/kubernetes.io~csi/pvc-7f31e2fc-b5b7-4991-ab97/mount^C

              2. Confirm that the NFS export for the hung PVC is in a ``READY``
                 state using ``robin nfs export-list``.

                 **Example**

                 .. code-block:: text

                    $ robin nfs export-list|grep pvc-7f31e2fc-b5b7-4991-ab97
                    |READY|19|pvc-7f31e2fc-b5b7-4991-ab97|robin-nfs-shared-23|["sm-compute02"]|192.02.204.31:/pvc-7f31e2fc-b5b7-4991-ab97|loaddbehg-fio|sachin|

              3. From the ``robin nfs export-list`` output, note the NFS server Pod
                 name serving the hung export. For example, in the above output, the
                 NFS server Pod is ``robin-nfs-shared-23``.

              4. Delete the identified NFS server Pod. This action forces new NFS
                 sessions and typically resolves the hung mount issue:

                 .. code-block:: text

                    # kubectl delete pod <nfs-server-pod> -n robinio

                 **Example**

                 .. code-block:: text

                    # kubectl delete pod robin-nfs-shared-23 -n robinio

PP-41195      **Symptom**

              After you perform a force unmount operation for an RWX (ReadWriteMany)
              volume on a host where it was previously mounted, or during certain
              failover scenarios involving RWX volumes, the associated Robin NFS
              server Pod might transition into an ASSIGNED_ERR state. When the Pod
              is in this state, the NFS server Pod is unable to export the volume,
              rendering the volume inaccessible via NFS.

              **Workaround**

              Contact the Robin Customer Support team to resolve this issue.

PP-41192      **Symptom**

              When creating a KVM using the Robin bundle, the CPU core count must be
              specified in even numbers. If odd numbers are specified, the KVM will
              not be deployed; however, the respective Pod might be in the
              ``Running`` state.

PP-41159      **Symptom**

              After upgrading to Robin CNP v5.7.2, some Pods might get stuck in the
              ``ContainerCreating`` state because the ``VolumeUnmount`` job holds a
              lock on the volume, and the ``VolumeUnmount`` job shows the following
              error:

              *Target /var/lib/robin/nfs/robin-nfs-shared-107/ganesha/pvc-b84ed376-2a58-484f-8031-4530c1899b2c
              is busy, please retry later. (In some cases useful info about
              processes that use the device is found by lsof(8) or fuser(1))*

              **Workaround**

              Apply the following workaround steps:

              1. Verify no pending I/Os at the RIO layer:

                 .. code-block:: text

                    # rio snapshot iolist

              2. Verify no in-flight I/Os on the relevant block device:

                 .. code-block:: text

                    # cat /sys/block/<device>/inflight

              3. If there are no pending and in-flight I/Os, do a lazy unmount with
                 the path shown in the ``VolumeUnmount`` job error message:

                 .. code-block:: text

                    # umount -l <mount-path>

PP-42238      **Symptom**

              You might observe a mismatch of Helm versions between the host and the
              robin-client or the robin-master Pod.

PP-35015      **Symptom**

              After renewing the expired Robin license successfully, Robin CNP
              incorrectly displays the License Violation error when you try to add a
              new user to the cluster.
              If you notice this issue, apply the following workaround.

              **Workaround**

              You need to restart the ``robin-server-bg`` service.

              .. code-block:: text

                 # rbash master
                 # supervisorctl restart robin-server-bg

PP-39901      **Symptom**

              After rebooting a worker node that is hosting Pods with Robin RWX
              volumes, one or more application Pods using these volumes might get
              stuck in the ``ContainerCreating`` state indefinitely.

              **Workaround**

              If you notice the above issue, contact the Robin CS team.

PP-39645      **Symptom**

              Robin CNP v5.7.2 may rarely fail to honor soft Pod anti-affinity,
              resulting in uneven Pod distribution on labeled nodes. When you deploy
              an application with the recommended
              ``preferredDuringSchedulingIgnoredDuringExecution`` soft Pod
              anti-affinity, Pods may not be uniformly distributed across the
              available labeled nodes as expected. Kubernetes passes candidate nodes
              to Robin CNP for Pod scheduling. In some situations, a scheduling
              request from Kubernetes to Robin CNP may not include the node required
              to honor soft affinity.

              **Workaround**

              Bounce the Pod that has not honored soft affinity.

PP-34226      **Symptom**

              When a PersistentVolumeClaim (PVC) is created, the CSI provisioner
              initiates a ``VolumeCreate`` job. If this job fails, the CSI
              provisioner calls a new ``VolumeCreate`` job again for the same PVC.
              However, if the PVC is deleted during this process, the CSI
              provisioner continues to call the ``VolumeCreate`` job because it does
              not verify the existence of the PVC before calling the job.

              **Workaround**

              Bounce the CSI provisioner Pod.

              .. code-block:: text

                 # kubectl delete pod <csi-provisioner-pod> -n robinio

PP-34414      **Symptom**

              In rare scenarios, the IOMGR service might fail to open devices in
              exclusive mode when it starts because other processes are using these
              disks. You might observe the following issue: some app Pods get stuck
              in the ``ContainerCreating`` state after restarting.
              Steps to identify the issue:

              1. Run the following command and check for the following type of
                 faulted error in the ``EVENT_DISK_FAULTED`` event type:

                 .. code-block:: text

                    # robin event list --type EVENT_DISK_FAULTED
                    disk /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5 on node default:poch06 is faulted

              2. If you see the disk is faulted error, check the IOMGR logs for
                 **dev_open()** and **Failed to exclusively open** error messages on
                 the node where the disks are present.

                 .. code-block:: text

                    # cat iomgr.log.0 | grep scsi-SATA_Micron_M500_MTFD_1401096049D5 | grep "dev_open"

              3. If you see the Device or resource busy error message in the log
                 file, use the ``fuser`` command to confirm whether the device is in
                 use:

                 .. code-block:: text

                    # fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5

              **Workaround**

              If the device is not in use, restart the IOMGR service on the
              respective node:

              .. code-block:: text

                 # supervisorctl restart iomgr

PP-39632      **Symptom**

              After upgrading to Robin CNP v5.7.2, an NFS client might hang even
              with no pending IO. For no pending IO, refer to the
              ``/var/log/robin/nodeplugin/robin-csi.log`` file with the following
              messages:

              .. code-block:: text

                 CsiServer_9 - robin.utils - INFO - Executing command /usr/bin/nc -z -w 6 172.19.149.161 2049 with timeout 60 seconds
                 CsiServer_9 - robin.utils - INFO - Command /usr/bin/nc -z -w 6 172.19.149.161 2049 completed with return code 0.
                 CsiServer_9 - robin.utils - INFO - Standard out:

              Also, you can find the following messages in ``dmesg``:

              .. code-block:: text

                 nfs: server 172.19.131.218 not responding, timed out
                 nfs: server 172.19.131.218 not responding, timed out
                 nfs: server 172.19.131.218 not responding, timed out

              **Workaround**

              1. Check the node provisioner logs where the PVC is checking for the
                 path and is hung.
              2. For the Deployment or StatefulSet that is using the problematic
                 PVC, scale down the replica count to ``0``.
              3. Ensure all Pods associated with the application have terminated.
              4. Scale the replica count back up to the original value.

PP-34492      **Symptom**

              When you run the ``robin host list`` command, if you notice a host is
              in the ``NotReady`` and ``PROBE_PENDING`` states, follow these
              workaround steps to diagnose and recover the host:

              **Workaround**

              Run the following command to check which host is in the ``NotReady``
              and ``PROBE_PENDING`` states:

              .. code-block:: text

                 # robin host list

              Run the following command to check the current (``Curr``) and desired
              (``Desired``) states of the host in the Agent Process (AP) report:

              .. code-block:: text

                 # robin ap report | grep <hostname>

              Run the following command to probe the host and recover it:

              .. code-block:: text

                 # robin host probe <hostname> --wait

              This command forces a probe of the host and updates its state in the
              cluster.

              Run the following command to verify the host's state:

              .. code-block:: text

                 # robin host list

              The host should now transition to the ``Ready`` state.

PP-35478      **Symptom**

              In rare scenarios, the kube-scheduler may not function as expected
              when many Pods are deployed in a cluster due to issues with the
              ``kube-scheduler`` lease.

              **Workaround**

              Complete the following workaround steps to resolve issues with the
              ``kube-scheduler`` lease:

              1. Run the following command to identify the node where the
                 ``kube-scheduler`` Pod is running with the lease:

                 .. code-block:: text

                    # kubectl get lease -n kube-system

              2. Log in to the node identified in the previous step.

              3. Check if the ``kube-scheduler`` Pod is running using the following
                 command:

                 .. code-block:: text

                    # docker ps | grep kube-scheduler

              4. As the ``kube-scheduler`` is a static Pod, move its configuration
                 file to temporarily stop the Pod:

                 .. code-block:: text

                    # mv /etc/kubernetes/manifests/kube-scheduler.yaml /root

              5. Run the following command to confirm that the ``kube-scheduler``
                 Pod is deleted. This may take a few minutes.

                 .. code-block:: text

                    # docker ps | grep kube-scheduler

              6. Verify that the ``kube-scheduler`` lease is transferred to a
                 different Pod:

                 .. code-block:: text

                    # kubectl get lease -n kube-system

              7. Copy the static Pod configuration file back to its original
                 location to redeploy the ``kube-scheduler`` Pod:

                 .. code-block:: text

                    # mv /root/kube-scheduler.yaml /etc/kubernetes/manifests/

              8. Confirm that the ``kube-scheduler`` container is running:

                 .. code-block:: text

                    # docker ps | grep kube-scheduler

PP-36865      **Symptom**

              After rebooting a node, the node might not come back online for a long
              time, and the host BMC console displays the following message for RWX
              PVCs mounted on that node:

              *Remounting nfs rwx pvc timed out, issuing SIGKILL*

              **Workaround**

              Power cycle the host system.

PP-37330      **Symptom**

              During or after upgrading to Robin CNP v5.7.2, the
              ``NFSAgentAddExport`` job might fail with an error message similar to
              the following:

              .. code-block:: text

                 /bin/mount /dev/sdn /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41 -o discard failed with return code 32: mount: /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41: wrong fs type, bad option, bad superblock on /dev/sdn, missing codepage or helper program, or other error.

              **Workaround**

              If you notice this issue, contact the Robin Customer Support team for
              assistance.

PP-37416      **Symptom**

              In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin
              CNP v5.7.0, the upgrade might fail with the following error during the
              Kubernetes upgrade process on other master nodes:

              *Failed to execute kubeadm upgrade command for K8S upgrade. Please
              make sure you have the correct version of kubeadm rpm binary
              installed*

              Steps to identify the issue:

              1. Check the ``/var/log/robin-install.log`` file to know why the
                 upgrade failed.

                 **Example**

                 .. code-block:: text

                    etcd container: {etcd_container_id} and exited status: {is_exited}
                    Killing progress PID 4168272
                    Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed
                    Install logs can be found at /var/log/robin-install.log
                    Caught EXIT signal. exit_code: 1

                 .. Note:: You can get the above error logs for any static manifests
                    of api-server, etcd, scheduler, and controller-manager.

              2. If you notice the above error, run the following command to inspect
                 the Docker containers for the failed component. The containers will
                 likely be in the ``Exited`` state.

                 .. code-block:: text

                    # docker ps -a | grep schedule

              **Workaround**

              If you notice the above error, restart the kubelet:

              .. code-block:: text

                 # systemctl restart kubelet

PP-38044      **Symptom**

              When attempting to detach a repository from a hydrated Helm
              application, the operation might fail with the following error:

              *Can't detach repo as the application is in IMPORTED state, hydrate it
              in order to detach the repo from it.*

              This issue occurs even if the application has already been hydrated.
              The system incorrectly marks the application in the ``IMPORTED``
              state, preventing the repository from being detached.

              **Workaround**

              To detach the repository, manually rehydrate the application and then
              retry the detach operation:

              1. Run the following command to rehydrate the application:

                 .. code-block:: text

                    # robin app hydrate <app-name> --wait

              2. Once the hydration is complete, detach the repository:

                 .. code-block:: text

                    # robin app detach-repo <app-name> --wait -y

PP-38251      **Symptom**

              When evacuating a disk from an offline node in a large cluster, the
              ``robin drive evacuate`` command fails with the following error
              message:

              *Json deserialize error: invalid value: integer -10, expected u64 at
              line 1 column 2440*

              **Workaround**

              If you notice the above issue, contact the Robin CS team.
PP-38471      **Symptom**

              When StatefulSet Pods restart, the Pods might get stuck in the
              ``ContainerCreating`` state with the error *CSINode does not contain
              driver robin* due to stale NFS mount points and failure of the
              ``csi-nodeplugin-robin`` Pod with a ``CrashLoopBackOff`` state.

              **Workaround**

              If you notice this issue, restart the ``csi-nodeplugin`` Pod.

              .. code-block:: text

                 # kubectl delete pod <csi-nodeplugin-pod> -n robinio

PP-38087      **Symptom**

              In certain cases, the snapshot size allocated to a volume could be
              less than what is requested. This occurs when the volume is allocated
              from multiple disks.

PP-38924      **Symptom**

              After you delete multiple Helm applications, one of the Pods might get
              stuck in the ``Error`` state, and one or more ReadWriteMany (RWX)
              volumes might get stuck in the ``Terminating`` state.

              **Workaround**

              On the node where the Pod is stuck in the ``Error`` state, restart
              Docker and the kubelet.

PP-34451      **Symptom**

              In rare scenarios, an RWX Pod might be stuck in the
              ``ContainerCannotRun`` state and display the following error in the
              Pod's events:

              *mount.nfs: mount system call failed*

              Perform the following steps to confirm the issue:

              1. Run the ``robin volume info`` command and check for the following
                 details:

                 a. Check the status of the volume. It should be in the ONLINE
                    status.
                 b. Check whether the respective volume mount path exists.
                 c. Check the physical and logical sizes of the volume. If the
                    physical size of the volume is greater than the logical size,
                    then the volume is full.

              2. Run the following command to check whether any of the disks for the
                 volume are running out of space:

                 .. code-block:: text

                    # robin disk info <disk-name>

              3. Run the ``lsblk`` and ``blkid`` commands to check whether the
                 device mount path works fine on the nodes where the volume is
                 mounted.

              4. Run the ``ls`` command to check if accessing the respective
                 filesystem mount path gives any input and output errors.

              If you notice any input and output errors in step 4, apply the
              following workaround:

              **Workaround**

              1. Find all the Pods that are using the respective PVC:

                 .. code-block:: text

                    # kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc-name>

              2. Bounce all the Pods identified in step 1:

                 .. code-block:: text

                    # kubectl delete pod <pod-name> -n <namespace>

PP-21916      **Symptom**

              A Pod IP is not pingable from any other node in the cluster, apart
              from the node where it is running.

              **Workaround**

              Bounce the Calico Pod running on the node where the issue is seen.

PP-40819      **Symptom**

              From the Robin CNP UI, when you try to deploy an application by
              cloning from a snapshot, the operation might fail with an error
              message similar to the following, indicating an invalid negative CPU
              value:

              *Invalid value: "-200m": must be greater than or equal to 0.*

              You might observe this issue specifically when the application has
              sidecar containers configured with CPU requests/limits. This is a CNP
              UI issue. You can use the CNP CLI to perform the same operation
              successfully.

              **Workaround**

              Use the following Robin CLI command to clone the snapshot and create
              an app:

              .. code-block:: text

                 # robin app create from-snapshot <snapshot-name> <app-name> --rpool default --wait

PP-41022      **Symptom**

              The ``robin host list`` command might incorrectly display negative
              values for CPU resources (specifically **Free** or **Allocated** CPU
              cores) on certain nodes. This occurs even when there are no user
              applications consuming significant CPU, suggesting a miscalculation or
              misreporting of available resources. The issue impacts the ability to
              accurately assess node capacity and schedule new workloads.

              **Workaround**

              If you notice this issue, restart the kubelet on the affected node:

              .. code-block:: text

                 # systemctl restart kubelet

PP-40993      **Symptom**

              During large cluster upgrades, the upgrade might fail during Robin
              pre-upgrade actions if **Robin Auto Pilot** creates active jobs.
              This occurs when multiple Robin Auto Pilot watchers are configured for
              a single Pod, resulting in lingering jobs (for example, VnodeDeploy)
              that block the upgrade process.

              **Workaround**

              Restart the ``robin-master-bg`` service on the master node to clear
              active Auto Pilot jobs, then retry the upgrade.

PP-39467      **Symptom**

              When deploying applications with RWX PVCs, application Pods fail to
              mount volumes and get stuck in the ``ContainerCreating`` state because
              RPC requests are stuck in IO operations on the volumes, leading to
              degraded volumes and faulted storage drives.

              **Workaround**

              Reboot the host that is in the ``NotReady`` state.
============= ============================================================

Technical Support
=================

Contact `Robin Technical support `_ for any assistance.