26. Release Notes

26.1. Robin Cloud Native Platform v5.7.2

The Robin Cloud Native Platform (CNP) v5.7.2 release notes describe pre- and post-upgrade considerations, improvements, fixed issues, and known issues for this release.

Release Date: April 22, 2026

26.1.1. Infrastructure Versions

The following software applications are included in this CNP release:

Software Application     Version
Kubernetes               1.33.5
Docker                   25.0.2 (RHEL 8.10 or Rocky Linux 8.10)
Podman                   5.4.0 (RHEL 9.6)
Prometheus               2.39.1
Prometheus Adapter       0.10.0
Node Exporter            1.4.0
Calico                   3.28.2
HAProxy                  2.4.7
PostgreSQL               14.12
Grafana                  9.2.3
CRI Tools                1.33.0
cert-manager             1.19.1

26.1.2. Supported Operating Systems

The following are the supported operating systems and kernel versions for Robin CNP v5.7.2:

OS Version                       Kernel Version
Red Hat Enterprise Linux 8.10    4.18.0-553.el8_10.x86_64
Rocky Linux 8.10                 4.18.0-553.el8_10.x86_64
Red Hat Enterprise Linux 9.6     5.14.0-570.24.1.el9_6.x86_64+rt

Note

Robin CNP supports both RT and non-RT kernels on the above supported operating systems.

26.1.3. Upgrade Paths

The following are the supported upgrade paths for Robin CNP v5.7.2:

  • Robin CNP v5.4.3 HF5+PP to Robin CNP v5.7.2-330

  • Robin CNP v5.4.3 HF6 to Robin CNP v5.7.2-330

  • Robin CNP v5.4.3 HF7 to Robin CNP v5.7.2-330

  • Robin CNP v5.5.1-1950 to Robin CNP v5.7.2-330 (For CNO v4.1.0 support only)

26.1.3.1. Pre-upgrade considerations

  • For a successful upgrade, you must run the possible_job_stuck.py script before and after the upgrade. Contact the Robin Support team for the upgrade procedure using the script.

  • If your cluster already has cert-manager installed, you must uninstall it before upgrading from a supported Robin CNP version to Robin CNP v5.7.2.

  • Before upgrading to Robin CNP v5.7.2, if the robin-certs-check job or CronJob is running, you must stop it. To stop the robin-certs-check job, run the kubectl delete job robin-certs-check -n robinio command, and to stop the robin-certs-check CronJob, run the robin cert check --stop-cronjob command.
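
For reference, the two stop commands mentioned above can be run from a master node. If cert-manager was installed through Helm, it can usually be removed with helm uninstall; the release name and namespace in the last command are assumptions and must match your installation:

# kubectl delete job robin-certs-check -n robinio
# robin cert check --stop-cronjob
# helm uninstall cert-manager -n cert-manager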

26.1.3.2. Post-upgrade considerations

  • After upgrading to Robin CNP v5.7.2, verify that the k8s_resource_sync config parameter is set to 60000 using the robin schedule list | grep -i K8sResSync command. If it is not set, run the robin schedule update K8sResSync k8s_resource_sync 60000 command to update it.

  • After upgrading to Robin CNP v5.7.2, you must run the robin-server validate-role-bindings command. To run this command, you need to log in to the robin-master Pod. This command verifies the roles assigned to each user in the cluster and corrects them if necessary.

  • After upgrading to Robin CNP v5.7.2, the k8s_auto_registration config parameter is disabled by default. This setting is disabled to prevent all Kubernetes apps from registering automatically and consuming resources. Be aware of the following points with this change:

    • You can manually register Kubernetes apps using the robin app register command and then use Robin CNP for snapshot, clone, and backup operations on those apps.

    • As this config parameter is disabled, when you run the robin app nfs-list command, the mappings between Kubernetes apps and NFS server Pods are not listed in the command output.

    • If you need the mapping between a Kubernetes app and an NFS server Pod while the k8s_auto_registration config parameter is disabled or the Kubernetes app is not manually registered, get the PVC name from the Pod YAML (kubectl get pod -n <namespace> -o yaml) and run the robin nfs export list | grep <pvc_name> command.

    • The robin nfs export list command output displays the PVC name and namespace.

  • After upgrading to Robin CNP v5.7.2, you must start the robin-certs-check CronJob using the robin cert check --start-cronjob command, as shown below.
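
The commands referenced in the considerations above can be run from a master node; the following is a minimal sequence using only the commands and values stated in this list (robin-server validate-role-bindings runs inside the robin-master Pod, as noted):

# robin schedule list | grep -i K8sResSync
# robin schedule update K8sResSync k8s_resource_sync 60000
# robin cert check --start-cronjob
# rbash master
# robin-server validate-role-bindings
# exit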

26.1.3.3. Pre-upgrade steps

Upgrading from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2-330

Before upgrading from Robin CNP v5.4.3 to Robin CNP v5.7.2, perform the following steps:

  1. Update the value of the suicide_threshold config parameter to 1800:

    # robin config update agent suicide_threshold 1800
    
  2. Set the toleration seconds for all NFS server Pods to 86400 seconds. After the upgrade, you must change the toleration seconds according to the post-upgrade steps.

    for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do
        echo "Updating $pod tolerationseconds to 86400"
        kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'
    done
    
  3. Check the toleration seconds for Robin master Pods:

    # kubectl get pod -n robinio robin-master-xxxx-xxx -o yaml | grep -i tolera
    
  4. Edit the Pod YAML to update the toleration seconds for nodes that are in the NotReady state from 60 seconds to 600 seconds:

    # kubectl edit pod -n robinio robin-master-xxx-xxx
    
  5. Verify that the webhooks are enabled by running the robin config list | grep -i robin_k8s_extension command. The value should be true. If it is disabled, enable it:

    # robin config update manager robin_k8s_extension True
    
  6. Copy the new robin-server binary to the /usr/local/robin/patch directory.

  7. Create a preentry.sh file in the /usr/local/robin directory on all master nodes and add the following:

    #!/bin/bash
    ver=$(cd /opt/robin/ && readlink current)
    if [[ $ver == "5.4.3-<current pre upgrade version>" ]]; then
        if [ -f /usr/local/robin/patch/robin-server ]; then
            if [[ ! -f /opt/robin/current/bin/robin-server.backup  ]]; then
                 /usr/bin/cp /opt/robin/current/bin/robin-server /opt/robin/current/bin/robin-server.backup
            fi
            /usr/bin/cp /usr/local/robin/patch/robin-server /opt/robin/current/bin/robin-server
        fi
    fi
    
  8. Change permissions for the preentry.sh and robin-server binary:

    # chmod 755 preentry.sh
    # chmod 755 robin-server
    
  9. Bounce the Robin master Pod to apply the robin-server binary:

    # kubectl delete pod -n robinio -l app=robin-master
    
  10. Verify the md5sum of the new Robin master Pod:

    # rbash master
    # md5sum /opt/robin/current/bin/robin-server
    

    The output of the above command should match with the following:

    • Robin CNP v5.4.3 HF5+PP - b1b7fb80c5e14b17c4d26a345efe8a3f

    • Robin CNP v5.4.3 HF6 - 54604d7f3f06584711d4fa864ee8b787

    • Robin CNP v5.4.3 HF7 - 17312566c198b3732857ba50b8d60b8f

26.1.3.4. Post-upgrade steps

Upgrading from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2-330

After upgrading from Robin CNP v5.4.3 to Robin CNP v5.7.2, perform the following steps:

  1. Verify the state of the hosts, Pods, services, apps, and nodes (a verification sketch is provided after these steps).

  2. Update the value of the suicide_threshold config parameter to 40:

    # robin config update agent suicide_threshold 40
    
  3. Set the check_helm_apps config parameter to False:

    # robin config update cluster check_helm_apps False
    
  4. Verify the robin_k8s_extension config parameter is set to True. If not, set it to True.

    # robin config update manager robin_k8s_extension True
    
  5. Set the toleration seconds for all NFS server Pods to 60 seconds when the node is in the notready state and set it to 0 seconds when the node is in the unreachable state.

    # for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do
        echo "Updating $pod tolerationseconds"
        kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'
    done 2>/dev/null
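
For step 1, a minimal verification pass can be done with the Robin CLI and standard kubectl commands; any Pods that are not in the Running or Completed state warrant further investigation:

# robin host list
# kubectl get nodes
# kubectl get pods --all-namespaces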
    

26.1.4. Improvements

26.1.4.1. Memory Manager support during upgrades

Robin CNP v5.7.2 now supports and validates enabling the Memory Manager during cluster upgrades from the supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2.

When you provide a configuration JSON file with the memory-manager-policy set to Static using the --config-json flag, the upgrade process automatically initializes the Memory Manager on the specified nodes.

Post-upgrade validation confirms that the policy is correctly reflected in the robin config list output and the Kubelet state file located at /var/lib/kubelet/memory_manager_state.

Enable the Memory Manager during cluster upgrade

To enable the Memory Manager when upgrading a cluster from supported Robin CNP v5.4.3 versions to Robin CNP v5.7.2, follow these steps:

  1. Create a configuration JSON file (for example, config-upg.json) that specifies the policy for each host:

    {
      "<host-ip>": {
        "memory-manager-policy": "Static"
      }
    }
    
  2. Run the upgrade-cluster command and include the --config-json flag:

    # ./gorobin upgrade-cluster --config-json config-upg.json
    

    Example

    # ./gorobin_5.7.2-328 onprem upgrade-cluster --hosts-json host.json --gorobintar gorobintar-5.7.2-328-bin.tar --ignore-warnings --config-json config-upg.json --robin-admin-user admin --robin-admin-passwd Robin123
    
  3. After the upgrade completes, verify the configuration:

    • Run robin config list | grep policy and confirm mem_manager_policy is set to Static.

    • Verify the Kubelet state by checking the following file: /var/lib/kubelet/memory_manager_state.
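
For example, both checks in step 3 can be run on a node that was included in the configuration JSON file:

# robin config list | grep policy
# cat /var/lib/kubelet/memory_manager_state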

26.1.4.2. Secure LDAP (LDAPS) Support

Robin CNP v5.7.2 supports Lightweight Directory Access Protocol Secure (LDAPS) connections on port 636. LDAPS uses TLS or SSL to encrypt communication between clients and servers. This feature protects sensitive user credentials and directory information by encrypting communication between Robin CNP and your LDAP server.

To configure LDAPS, specify port 636 when you add your LDAP server and ensure proper certificate validation. For more information, see Add an LDAP server.
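
Before adding the LDAP server, you can optionally verify that it accepts TLS connections on port 636. The following check uses the standard openssl client; the server hostname is a placeholder:

# openssl s_client -connect <ldap_server_hostname>:636 -showcerts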

26.1.5. Fixed Issues

Reference ID

Description

RSD-11580

Robin KVM manifest prevents KVMs from booting in UEFI (Unified Extensible Firmware Interface) mode with multiple qcow2 disks.

This issue is fixed.

With Robin CNP v5.7.2, support is added for the boot: uefi parameter in the Image section and for defining both image-backed and empty qcow2 disks in the storage section of the manifest file.

The following qcow2 disks are supported in the KVM manifest file:

  • qcow2 disk with a pre-loaded image: specify an image parameter with name and version in a storage entry to attach a qcow2 disk with a pre-loaded image. The disk is automatically resized to match the Robin volume size.

  • Empty qcow2 disk: specify blank: true and format: qcow2 in a storage entry to attach an empty, unpartitioned disk. The disk is sized to match the Robin volume and is ready for guest OS initialization during the first boot.

RSD-11238, RSD-11474

For a multi-attach volume, when one of the nodes became faulty, the volume was incorrectly unmounted on other nodes in addition to the faulty node. This issue is fixed.

RSD-11346

When you deploy KVM applications with Burstable QoS, the application deployment fails with the following error due to an incorrect internal validation:

CPU min and max values for vnode test1_server01 do not match

This issue is fixed.

RSD-11007

In rare scenarios, after you perform an application upgrade on Robin CNP, the application Pod might enter into a flapping state where it repeatedly restarts. However, after a few restarts, the Pod becomes stable on its own without any intervention. This issue is fixed.

RSD-10166

Burstable Pods deployment fails with Insufficient CPU or No host found errors if their total resource requests (CPU or huge pages) exceed the capacity of a single NUMA node while using a Robin IP pool. This issue is fixed.

Robin CNP now allows burstable pods to span multiple NUMA nodes, even when the topology_manager_policy is set to restricted.

RSD-11284, RSD-11374

When upgrading to Robin CNP v5.7.1, the upgrade fails due to SSH key authentication. This occurs because global cryptographic policies are enabled on the cluster, which force it to use only strong SSH keys created using elliptic curve cryptography (ECC) algorithms such as Ed25519. This issue is fixed.

RSD-11584

The issue of the robin cert list command displaying incorrect certificate expiry dates for Robin control plane components is fixed.

RSD-11015

The issue of pod-related events not being displayed for users when running the robin event list command is fixed.

RSD-11050

The issue of Open vSwitch (OVS) retaining stale ports for a KVM app after an abrupt node shutdown or crash is fixed. These stale ports caused connectivity loss by associating MAC addresses with incorrect VLAN tags.

RSD-11543

The issue of KVM application deployments failing with a CreateContainerError when using generic OS variant names such as rhel7 is fixed.

26.1.6. Known Issues

Reference ID

Description

PP-42237

Symptom

When you try to deploy a KVM application with multiple vnodes that consume all available vfio-pci resources on a host, the application fails to restart. If you attempt to restart the application, it might fail with the error Failed to allocate resources for App, leaving the application in a FAULTED or NOTREADY state.

Workaround

You need to stop and start the application.

  1. Run the following command to stop the app:

    # robin app stop <app_name>

  2. Run the following command to start the app:

    # robin app start <app_name>

PP-41172

Symptom

After upgrading from a supported Robin CNP version to Robin CNP v5.7.2, NFS mounts on client nodes can become unresponsive, leading to critical issues such as the following:

  • kubelet instability (frequent restarts)

  • patronictl command failures (connection refused)

  • kubectl exec operations failing for pods on affected nodes.

This problem is primarily observed when the NFS client (the node where the PVC is mounted) experiences prolonged unresponsiveness from the NFS server.

Workaround

If an NFS mount is hung, you can recover the system by forcing new NFS sessions:

  1. Identify hung NFS Mounts:

    • Attempt to access the NFS mount path.

      Example

      /var/lib/kubelet/pods/<pod_uid>/volumes/kubernetes.io~csi/<pvc_name>/mount.

    • If the command hangs (for example, ls /path/to/mount with no output and requiring Ctrl+C to exit), the mount is hung.

      Example

      $ ls /var/lib/pods/0cab5468-b43f-4afd-bad3/volumes/kubernetes.io~csi/pvc-7f31e2fc-b5b7-4991-ab97/mount
      
  2. Confirm that the NFS export for the hung PVC is in a READY state using robin nfs export-list.

    Example

    $ robin nfs export-list | grep pvc-7f31e2fc-b5b7-4991-ab97
    |READY|19|pvc-7f31e2fc-b5b7-4991-ab97|robin-nfs-shared-23|["sm-compute02"]|192.02.204.31:/pvc-7f31e2fc-b5b7-4991-ab97|loaddbehg-fio|sachin|
    
  3. From the robin nfs export-list output, note the NFS Server Pod name serving the hung export. For example, in the above output, the NFS server Pod is robin-nfs-shared-23.

  4. Delete the identified NFS server pod. This action forces new NFS sessions and typically resolves the hung mount issue:

    # kubectl delete pod -n robinio <nfs_shared_server_pod_name>
    

    Example

    # kubectl delete pod robin-nfs-shared-23 -n robinio
    

PP-41195

Symptom

After you perform a force unmount operation for an RWX (ReadWriteMany) volume on a host where it was previously mounted, or during certain failover scenarios involving RWX volumes, the associated Robin NFS server Pod might transition into an ASSIGNED_ERR state.

When the Pod is in this state, the NFS server Pod is unable to export the volume, rendering the volume inaccessible via NFS.

Workaround

Contact the Robin Customer Support team to resolve this issue.

PP-41192

Symptom

When creating a KVM using the Robin bundle, the CPU core count must be specified in even numbers. If odd numbers are specified, the KVM will not be deployed; however, the respective Pod might be in the Running state.

PP-41159

Symptom

After upgrading to Robin CNP v5.7.2, some Pods might get stuck in the ContainerCreating state because the VolumeUnmount job holds a lock on the volume, and the VolumeUnmount job shows the following error:

Target /var/lib/robin/nfs/robin-nfs-shared-107/ganesha/pvc-b84ed376-2a58-484f-8031-4530c1899b2c is busy, please retry later. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))

Workaround

Apply the following workaround steps:

  1. Verify no pending I/Os at the RIO layer:

    # rio snapshot iolist
    
  2. Verify no in-flight I/Os on the relevant block device:

    # cat /sys/block/<dev>/inflight
    
  3. If there are no pending and in-flight I/Os, do a lazy unmount with the path shown in the VolumeUnmount job error message:

    # umount -l <mountpath>
    

PP-42238

Symptom

You might observe a mismatch of Helm versions between the host and the robin-client or the robin-master Pod.

Workaround

Download the correct Helm binary directly from the Robin master pod:

# rbash master
# hostname -i          (note the returned IP address)
# exit
# curl -k 'https://[captured_ip]:29442/api/v3/robin_server/download?file=helm&os=linux' -o helm
# chmod +x helm
# ./helm version

Example with the captured IP address:

# curl -k 'https://[fd74:ca9b:3a09:868c:172:18:0:7c44]:29442/api/v3/robin_server/download?file=helm&os=linux' -o helm

PP-35015

Symptom

After renewing the expired Robin license successfully, Robin CNP incorrectly displays the License Violation error when you try to add a new user to the cluster. If you notice this issue, apply the following workaround.

Workaround

You need to restart the robin-server-bg service.

# rbash master
# supervisorctl restart robin-server-bg

PP-39901

Symptom

After rebooting a worker node that is hosting Pods with Robin RWX volumes, one or more application Pods using these volumes might get stuck in the ContainerCreating state indefinitely.

Workaround

If you notice the above issue, contact the Robin CS team.

PP-39645

Symptom

Robin CNP v5.7.2 may rarely fail to honor soft Pod anti-affinity, resulting in uneven Pod distribution on labeled nodes.

When you deploy an application with the recommended preferredDuringSchedulingIgnoredDuringExecution soft Pod anti-affinity, Pods may not be uniformly distributed across the available labeled nodes as expected. Kubernetes passes candidate nodes to Robin CNP for Pod scheduling, and in some situations the request from Kubernetes to Robin CNP may not include the node required to honor soft anti-affinity.

Workaround

Bounce the Pod that has not honored soft affinity.
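
For example, deleting the Pod causes it to be rescheduled; the Pod and namespace names are placeholders:

# kubectl delete pod <pod_name> -n <namespace>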

PP-34226

Symptom

When a PersistentVolumeClaim (PVC) is created, the CSI provisioner initiates a VolumeCreate job. If this job fails, the CSI provisioner calls a new VolumeCreate job again for the same PVC. However, if the PVC is deleted during this process, the CSI provisioner will continue to call the VolumeCreate job because it does not verify the existence of the PVC before calling the VolumeCreate job.

Workaround

Bounce the CSI provisioner Pod.

# kubectl delete pod -n robinio <csi-provisioner-robin>

PP-34414

Symptom

In rare scenarios, the IOMGR service might fail to open devices in exclusive mode when it starts because other processes are using these disks. You might observe the following issue:

Some app Pods get stuck in the ContainerCreating state after restarting.

Steps to identify the issue:

Check for the following type of disk faulted error under the EVENT_DISK_FAULTED event type in the robin event list command output:

disk /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5 on node default:poch06 is faulted

# robin event list --type EVENT_DISK_FAULTED

If you see the disk is faulted error, check the IOMGR logs for dev_open() and Failed to exclusively open error messages on the node where disks are present.

# cat iomgr.log.0 | grep scsi-SATA_Micron_M500_MTFD_1401096049D5 | grep "dev_open"

If you see the Device or resource busy error message in the log file, use the fuser command to confirm whether the device is in use:

# fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5

Workaround

If the device is not in use, restart the IOMGR service on the respective node:

# supervisorctl restart iomgr

PP-39632

Symptom

After upgrading to Robin CNP v5.7.2, the NFS client might hang even though the logs show no pending I/O.

To confirm that there is no pending I/O, check the /var/log/robin/nodeplugin/robin-csi.log file for messages similar to the following:

CsiServer_9 - robin.utils - INFO - Executing command /usr/bin/nc -z -w 6 172.19.149.161 2049 with timeout 60 seconds
CsiServer_9 - robin.utils - INFO - Command /usr/bin/nc -z -w 6 172.19.149.161 2049 completed with return code 0.
CsiServer_9 - robin.utils - INFO - Standard out:

Also, you can find the following message in the dmesg:

nfs: server 172.19.131.218 not responding, timed out
nfs: server 172.19.131.218 not responding, timed out
nfs: server 172.19.131.218 not responding, timed out

Workaround

  1. Check the node plugin logs to identify the PVC whose mount path check is hung.

  2. For the deployment/statefulset that is using the problematic PVC, scale down the replica count to 0.

  3. Ensure all Pods associated with the application have terminated.

  4. Scale up the replica count back to the original value.
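
For steps 2 through 4, assuming the application is a Deployment, the scale operations look like the following (for a StatefulSet, replace deployment with statefulset; names and the original replica count are placeholders):

# kubectl scale deployment <deployment_name> -n <namespace> --replicas=0
# kubectl get pods -n <namespace>
# kubectl scale deployment <deployment_name> -n <namespace> --replicas=<original_count>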

PP-34492

Symptom

When you run the robin host list command, you might notice a host in the NotReady and PROBE_PENDING states. Follow the workaround steps below to diagnose and recover the host.

Workaround

  1. Run the following command to check which host is in the NotReady and PROBE_PENDING states:

    # robin host list

  2. Run the following command to check the current (Curr) and desired (Desired) states of the host in the Agent Process (AP) report:

    # robin ap report | grep <hostname>

  3. Run the following command to probe the host and recover it:

    # robin host probe <hostname> --wait

    This command forces a probe of the host and updates its state in the cluster.

  4. Run the following command to verify the host's state:

    # robin host list

    The host should now transition to the Ready state.

PP-35478

Symptom

In rare scenarios, the kube-scheduler may not function as expected when many Pods are deployed in a cluster due to issues with the kube-scheduler lease.

Workaround

Complete the following workaround steps to resolve issues with the kube-scheduler lease:

  1. Run the following command to identify the node where the kube-scheduler Pod is running with the lease:

    # kubectl get lease -n kube-system

  2. Log in to the node identified in the previous step.

  3. Check if the kube-scheduler Pod is running using the following command:

    # docker ps | grep kube-scheduler

  4. As the kube-scheduler is a static Pod, move its configuration file to temporarily stop the Pod:

    # mv /etc/kubernetes/manifests/kube-scheduler.yaml /root

  5. Run the following command to confirm that the kube-scheduler Pod is deleted. This may take a few minutes.

    # docker ps | grep kube-scheduler

  6. Verify that the kube-scheduler lease is transferred to a different Pod:

    # kubectl get lease -n kube-system

  7. Copy the static Pod configuration file back to its original location to redeploy the kube-scheduler Pod:

    # mv /root/kube-scheduler.yaml /etc/kubernetes/manifests/

  8. Confirm that the kube-scheduler container is running:

    # docker ps | grep kube-scheduler

PP-36865

Symptom

After rebooting a node, the node might not come back online even after a long time, and the host BMC console displays the following message for RWX PVCs mounted on that node:

Remounting nfs rwx pvc timed out, issuing SIGKILL

Workaround

Power cycle the host system.

PP-37330

Symptom

During or after upgrading to Robin CNP v5.7.2, the NFSAgentAddExport job might fail with an error message similar to the following:

/bin/mount /dev/sdn /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41 -o discard failed with return code 32: mount: /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41: wrong fs type, bad option, bad superblock on /dev/sdn, missing codepage or helper program, or other error.

Workaround

If you notice this issue, contact the Robin Customer Support team for assistance.

PP-37416

Symptom

In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.7.0, the upgrade might fail with the following error during the Kubernetes upgrade process on other master nodes:

Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed

Steps to identify the issue:

  1. Check the /var/log/robin-install.log file to know why the upgrade failed.

Example

etcd container: {etcd_container_id} and exited status: {is_exited}

Killing progress PID 4168272

Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed

Install logs can be found at /var/log/robin-install.log

Caught EXIT signal. exit_code: 1

Note

You can get the above error logs for any static manifests of api-server, etcd, scheduler, and controller-manager.

  2. If you notice the above error, run the following command to inspect the Docker containers for the failed component. The containers will likely be in the Exited state.

# docker ps -a | grep schedule

Workaround

If you notice the above error, restart the kubelet:

# systemctl restart kubelet

PP-38044

Symptom

When attempting to detach a repository from a hydrated Helm application, the operation might fail with the following error:

Can’t detach repo as the application is in IMPORTED state, hydrate it in order to detach the repo from it.

This issue occurs even if the application has already been hydrated. The system incorrectly marks the application in the IMPORTED state, preventing the repository from being detached.

Workaround

To detach the repository, manually rehydrate the application and then retry the detach operation:

  1. Run the following command to rehydrate the application.

    # robin app hydrate <app_name> --wait
    
  2. Once the hydration is complete, detach the repository.

    # robin app detach-repo <app_name> --wait -y
    

PP-38251

Symptom

When evacuating a disk from an offline node in a large cluster, the robin drive evacuate command fails with the following error message: "Json deserialize error: invalid value: integer -10, expected u64 at line 1 column 2440".

Workaround

If you notice the above issue, contact the Robin CS team.

PP-38471

Symptom

When StatefulSet Pods restart, the Pods might get stuck in the ContainerCreating state with the error CSINode <node_name> does not contain driver robin. This occurs due to stale NFS mount points and the csi-nodeplugin-robin Pod failing with a CrashLoopBackOff state.

Workaround

If you notice this issue, restart the csi-nodeplugin Pod.

# kubectl delete pod <csi-nodeplugin> -n robinio

PP-38087

Symptom

In certain cases, the snapshot size allocated to a volume could be less than what is requested. This occurs when the volume is allocated from multiple disks.

PP-38924

Symptom

After you delete multiple Helm applications, one of the Pods might get stuck in the Error state, and one or more ReadWriteMany (RWX) volumes might get stuck in the Terminating state.

Workaround

On the node where the Pod is stuck in the Error state, restart Docker and Kubelet.
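
For example, on a node that uses Docker as the container runtime:

# systemctl restart docker
# systemctl restart kubelet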

PP-34451

Symptom

In rare scenarios, the RWX Pod might be stuck in the ContainerCannotRun state and display the following error in the Pod’s event:

mount.nfs: mount system call failed

Perform the following steps to confirm the issue:

  1. Run the robin volume info command and check for the following details:

    1. Check the status of the volume. It should be in the ONLINE status.

    2. Check whether the respective volume mount path exists.

    3. Check the physical and logical sizes of the volume. If the physical size of the volume is greater than the logical size, then the volume is full.

  2. Run the following command to check whether any of the disks for the volume are running out of space:

    # robin disk info
    
  3. Run the lsblk and blkid commands to check whether the device mount path works fine on the nodes where the volume is mounted.

  4. Run the ls command to check if accessing the respective filesystem mount path gives any input and output errors.

If you notice any input and output errors in step 4, apply the following workaround:

Workaround

  1. Find all the Pods that are using the respective PVC:

    # kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc_name>
    
  2. Bounce all the Pods identified in step 1:

    # kubectl delete pod <pod> -n <namespace>
    

PP-21916

Symptom

A pod IP is not pingable from any other node in the cluster, apart from the node where it is running.

Workaround

Bounce the Calico pod running on the node where the issue is seen.
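
A typical way to bounce the Calico Pod, assuming Calico runs as the calico-node DaemonSet in the kube-system namespace (Pod and node names are placeholders):

# kubectl get pods -n kube-system -o wide | grep calico-node | grep <node_name>
# kubectl delete pod <calico_node_pod_name> -n kube-system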

PP-40819

Symptom

From the Robin CNP UI, when you try to deploy an application by cloning from a snapshot, the operation might fail with an error message similar to the following, indicating an invalid negative CPU value: Invalid value: "-200m": must be greater than or equal to 0.

You might observe this issue specifically when the application has sidecar containers configured with CPU requests/limits. This is a CNP UI issue. You can use the CNP CLI to perform the same operation successfully.

Workaround

Use the following Robin CLI command to clone the snapshot and create an app:

# robin app create from-snapshot <new_app_name> <snapshot_id> --rpool default --wait

PP-41022

Symptom

The robin host list command might incorrectly display negative values for CPU cores (specifically "Free" or "Allocated" CPU) on certain nodes. This occurs even when there are no user applications consuming significant CPU, suggesting a miscalculation or misreporting of available resources. The issue impacts the ability to accurately assess node capacity and schedule new workloads.

Workaround

If you notice this issue, restart the kubelet on the affected node:

# systemctl restart kubelet

PP-40993

Symptom

During large cluster upgrades, the upgrade might fail during Robin pre-upgrade actions if Robin Auto Pilot creates active jobs. This occurs when multiple Robin Auto Pilot watchers are configured for a single pod, resulting in lingering jobs (for example, VnodeDeploy) that block the upgrade process.

Workaround

Restart the robin-master-bg service on the master node to clear active Auto Pilot jobs, then retry the upgrade.
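
Assuming the service is managed by supervisord on the master node, as in the PP-35015 workaround, the restart looks like the following; the service name is taken from the description above:

# rbash master
# supervisorctl restart robin-master-bg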

PP-39467

Symptom

When deploying applications with RWX PVCs, application Pods fail to mount volumes and get stuck in the ContainerCreating state because RPC requests are stuck in I/O operations on the volumes, leading to degraded volumes and faulted storage drives.

Workaround

Reboot the host that is in the NotReady state.
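
The affected host can be identified with standard commands before it is rebooted:

# kubectl get nodes | grep -i NotReady
# robin host list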

26.1.7. Technical Support

Contact Robin Technical support for any assistance.