25. Release Notes

25.1. Robin Cloud Native Platform v5.6.0

The Robin Cloud Native Platform (CNP) v5.6.0 release notes cover pre- and post-upgrade steps, new features, improvements, fixed issues, and known issues.

Release Date: June 25, 2025

25.1.1. Infrastructure Versions

The following software applications are included in this CNP release:

Software Application    Version
Kubernetes              1.32.4
Docker                  25.0.2
Prometheus              2.39.1
Prometheus Adapter      0.10.0
Node Exporter           1.4.0
Calico                  3.28.2
HAProxy                 2.4.7
PostgreSQL              14.12
Grafana                 9.2.3
CRI Tools               1.32.0

25.1.2. Supported Operating Systems

The following are the supported operating systems and kernel versions for Robin CNP v5.6.0:

OS Version          Kernel Version
RHEL 8.10           4.18.0-553.el8_10.x86_64
Rocky Linux 8.10    4.18.0-553.el8_10.x86_64

25.1.3. Upgrade Paths

The following are the supported upgrade paths for Robin CNP v5.6.0:

  • Robin CNP v5.4.3 HF4 to Robin CNP v5.6.0-128

  • Robin CNP v5.4.3 HF4 PP2 to Robin CNP v5.6.0-128

  • Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0-128

  • Robin CNP v5.4.3 HF5 PP1 to Robin CNP v5.6.0-128

25.1.3.1. Pre-upgrade considerations

  • For a successful upgrade, you must run the possible_job_stuck.py script before and after the upgrade. Contact the Robin Support team for the upgrade procedure using the script.

  • When upgrading from supported Robin CNP versions to Robin CNP v5.6.0, if your cluster already has cert-manager installed, you must uninstall it before upgrading to Robin CNP v5.6.0.

25.1.3.2. Post-upgrade considerations

  • After upgrading to Robin CNP v5.6.0, you must run the robin schedule update K8sResSync k8s_resource_sync 60000 command to update the robin schedule K8sResSync.

  • After upgrading to Robin CNP v5.6.0, you must run the robin-server validate-role-bindings command. To run this command, you need to log in to the robin-master Pod. This command verifies the roles assigned to each user in the cluster and corrects them if necessary.

  • After upgrading to Robin CNP v5.6.0, the k8s_auto_registration config parameter is disabled by default. The config setting is deactivated to prevent all Kubernetes apps from automatically registering and consuming resources. The following are the points you must be aware of with this change:

    • You can register the Kubernetes apps using the robin app register command manually and use Robin CNP for snapshots, clones, and backup operations of the Kubernetes app.

    • As this config parameter is disabled, when you run the robin app nfs-list command, the mappings between Kubernetes apps and NFS server Pods are not listed in the command output.

    • If you need the mapping between a Kubernetes app and an NFS server Pod when the k8s_auto_registration config parameter is disabled or the Kubernetes app is not manually registered, get the PVC name from the Pod YAML (kubectl get pod <pod name> -n <namespace> -o yaml) and run the robin nfs export list | grep <pvc name> command, as shown in the example after this list.

    • The robin nfs export list command output displays the PVC name and namespace.
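
    For example, the following sketch chains the two lookups described above; the Pod and namespace names (mypod, myns) are hypothetical placeholders:

      # PVC=$(kubectl get pod mypod -n myns -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')
      # robin nfs export list | grep "$PVC"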

25.1.3.3. Pre-upgrade steps

  • Upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0-128

    Before upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 1800:

      # robin config update agent suicide_threshold 1800
      
    2. Disable the NFS Server Monitor schedule:

      # robin schedule disable "NFS Server" Monitor
      
    3. Set the toleration seconds for all NFS server Pods to 86400 seconds. After the upgrade, you must change the toleration seconds as described in the post-upgrade steps.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationSeconds to 86400"
          kubectl patch pod $pod -n robinio --type='json' \
            -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'
      done
      
  • Upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0-128

    Before upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 1800:

      # robin config update agent suicide_threshold 1800
      
    2. Set the NFS Server schedule CronJob interval to more than six months (the following cron expression runs it once a year):

      # rbash master
      # rsql
      # update schedule set kwargs='{"cron":"1 1 1 1 *"}' where callback='nfs_server_monitor';
      # \q
      # systemctl restart robin-server
      
    3. Set the toleration seconds for all NFS server Pods to 86400 seconds. After the upgrade, you must change the toleration seconds as described in the post-upgrade steps.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationSeconds to 86400"
          kubectl patch pod $pod -n robinio --type='json' \
            -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'
      done
      

25.1.3.4. Post-upgrade steps

  • After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0-128

    After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 40:

      # robin config update agent suicide_threshold 40
      
    2. Enable the NFS Server Monitor schedule:

      # robin schedule enable "NFS Server" Monitor
      
    3. Set the check_helm_apps config parameter to False:

      # robin config update cluster check_helm_apps False
      
    4. Set the chargeback_track_k8s_resusage config parameter to False:

      # robin config update server chargeback_track_k8s_resusage False
      
    5. Set the robin_k8s_extension config parameter to True:

      # robin config update manager robin_k8s_extension True
      
    6. Verify whether the following mutating webhooks are present:

      # kubectl get mutatingwebhookconfigurations -A | grep robin
      k8srobin-deployment-mutating-webhook   1          20d
      k8srobin-ds-mutating-webhook           1          20d
      k8srobin-pod-mutating-webhook          1          20d
      k8srobin-sts-mutating-webhook          1          20d
      robin-deployment-mutating-webhook      1          20d
      robin-ds-mutating-webhook              1          20d
      robin-pod-mutating-webhook             1          20d
      robin-sts-mutating-webhook             1          20d
      
    7. If the above k8srobin-* mutating webhooks are not present, bounce the robink8s-serverext Pods:

      # kubectl delete pod -n robinio -l app=robink8s-serverext
      
    8. Verify whether the following validating webhooks are present:

      # kubectl get validatingwebhookconfigurations
      NAME                             WEBHOOKS   AGE
      cert-manager-webhook             1          45h
      controllers-validating-webhook   1          31h
      ippoolcr-validating-webhook      1          31h
      namespaces-validating-webhook    1          31h
      pods-validating-webhook          1          31h
      pvcs-validating-webhook          1          31h
      
    9. If the robin-* mutating webhooks displayed in the step 6 output and the validating webhooks displayed in the step 8 output are not present on your setup, restart the robin-server-bg service:

      # rbash master
      # supervisorctl restart robin-server-bg
      
    10. Set the toleration seconds for all NFS server Pods to 60 seconds when the node is in the NotReady state, and to 0 seconds when the node is in the Unreachable state.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationSeconds"
          kubectl patch pod $pod -n robinio --type='json' \
            -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'
      done 2>/dev/null
      
  • After upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0-128

    After upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0, perform the following steps:

    1. Update the value of the suicide_threshold config parameter to 40:

      # robin config update agent suicide_threshold 40
      
    2. Enable the NFS Server Monitor schedule:

      # robin schedule enable "NFS Server" Monitor
      
    3. Set the check_helm_apps config parameter to False:

      # robin config update cluster check_helm_apps False
      
    4. Set the chargeback_track_k8s_resusage config parameter to False:

      # robin config update server chargeback_track_k8s_resusage False
      
    5. Set the robin_k8s_extension config parameter to True:

      # robin config update manager robin_k8s_extension True
      
    6. Delete the NFS Server schedule CronJob and restart the robin-server and robin-server-bg services:

      # rbash master
      # rsql
      # DELETE from schedule where callback='nfs_server_monitor';
      # \q
      # supervisorctl restart robin-server
      # supervisorctl restart robin-server-bg
      
    7. Verify whether the following mutating webhooks are present:

      # kubectl get mutatingwebhookconfigurations -A | grep robin
      k8srobin-deployment-mutating-webhook   1          20d
      k8srobin-ds-mutating-webhook           1          20d
      k8srobin-pod-mutating-webhook          1          20d
      k8srobin-sts-mutating-webhook          1          20d
      robin-deployment-mutating-webhook      1          20d
      robin-ds-mutating-webhook              1          20d
      robin-pod-mutating-webhook             1          20d
      robin-sts-mutating-webhook             1          20d
      
    8. If the above k8srobin-* mutating webhooks are not present, bounce the robink8s-serverext Pods:

      # kubectl delete pod -n robinio -l app=robink8s-serverext
      
    9. Verify whether the following validating webhooks are present:

      # kubectl get validatingwebhookconfigurations
      NAME                             WEBHOOKS   AGE
      cert-manager-webhook             1          45h
      controllers-validating-webhook   1          31h
      ippoolcr-validating-webhook      1          31h
      namespaces-validating-webhook    1          31h
      pods-validating-webhook          1          31h
      pvcs-validating-webhook          1          31h
      
    10. If the robin-* mutating webhooks displayed in the step 7 output and the validating webhooks displayed in the step 9 output are not present on your setup, restart the robin-server-bg service:

      # rbash master
      # supervisorctl restart robin-server-bg
      
    11. Set the toleration seconds for all NFS server Pods to 60 seconds when the node is in the NotReady state, and to 0 seconds when the node is in the Unreachable state.

      for pod in $(kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}); do
          echo "Updating $pod tolerationSeconds"
          kubectl patch pod $pod -n robinio --type='json' \
            -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'
      done 2>/dev/null
      

25.1.4. New Features

25.1.4.1. Robin Certificate Management

Starting with Robin CNP v5.6.0, you can manage all certificates for your cluster without manual intervention using the Robin certificate management feature. Robin CNP uses cert-manager for this feature. cert-manager is a native Kubernetes certificate management controller. It helps in issuing certificates from various certificate authorities, such as Let’s Encrypt, Entrust, DigiCert, HashiCorp Vault, and Venafi. It can also issue certificates from a local CA (self-signed).

cert-manager adds Certificate and Issuer resources in Kubernetes clusters, which simplifies the process of obtaining, generating, and renewing the certificates for the cluster. For more information, see cert-manager.

The Robin certificate management feature manages certificates only for Robin internal services deployed in the robinio namespace. It also ensures that all certificates are valid and up-to-date. It automatically renews certificates before they expire.

The Robin certificate management feature has the following certificate issuers:

  • cluster-issuer - Responsible for all certificates used internally by the various control plane services.

  • ident-issuer - Responsible for the Cluster Identity certificate used by all outward-facing services, such as the Kubernetes API Server, Robin client, and GUI.
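
Because these issuers are implemented on top of cert-manager, you can inspect the underlying resources with standard kubectl queries. This is a generic cert-manager sketch; whether the issuers are namespaced Issuers or ClusterIssuers depends on the deployment:

# kubectl get clusterissuers,issuers,certificates -A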

Points to consider for Robin Certificate Management

  • When you install or upgrade to Robin CNP v5.6.0, cert-manager is deployed by default. A new service named robin-cert-monitor is also deployed to monitor the state of all certificates required by various Pods and containers in the Robin CNP cluster, ensuring that all required certificates exist and are valid.

  • During installation or upgrade to Robin CNP v5.6.0, only the cert-manager option is supported. If you want to manage certificates of your cluster using the local control mode, you can use the robin cert reset-cluster-certs command to enable local control mode.

  • You can have only one cert-manager instance in a cluster.

  • If your cluster is already installed with a Cluster Identity certificate signed by an external CA, you must reconfigure it using the robin cert reset-cluster-identity command after updating to Robin CNP v5.6.0.

  • If you want to utilize a Cluster Identity certificate signed by an external CA after installing Robin CNP v5.6.0, you can use the robin cert reset-cluster-identity command to configure it.

  • If you want to install Robin CNP v5.6.0 with both a Cluster Identity certificate signed by an external CA and cert-manager, you must pass the following options in the config.json file for one of the master nodes. For more information, see Installation with Custom Cluster Identity certificate.

    • ident-ca-path

    • ident-cert-path

    • ident-key-path

  • You cannot install your own cert-manager on a Robin CNP cluster. If you want to utilize the functionality of cert-manager, you must use the cert-manager deployed as part of the Robin certificate management feature to create Issuers and Certificates in other namespaces.

For more information, see Robin Certificate Management.

25.1.4.2. Recreate a Faulted Volume for Helm Apps

Robin CNP v5.6.0 enables you to recreate a volume that is in the Faulted status using the same configuration as that of the faulted one. The feature is only supported for volumes used by Helm applications. To support this feature, the following new command is made available:

# robin volume recreate --name <faulted volume name> --force
# robin volume recreate --pvc-name <PVC name of a faulted volume> --force

Note

You must use the --force command option along with the command.

When you recreate a volume in place of a faulted volume, all data on the volume is permanently lost. For more information, see Recreate a Faulted Volume for Helm Apps.

25.1.4.3. Memory Manager Integration

Robin CNP integrates the Kubernetes Memory Manager plugin starting with Robin CNP v5.6.0.

The Memory Manager plugin allocates guaranteed memory and hugepages for guaranteed QoS Pods at the NUMA level.

The Memory Manager plugin works along with the CPU Manager and Topology Manager. It provides hints to the Topology manager and enables resource allocations. The Memory Manager plugin ensures that the memory requested by a Pod is allocated from a minimum number of Non-Uniform Memory Access (NUMA) nodes.

Note

Robin CNP supports only the Static policy for Memory Manager and supports only Pod as the scope for Topology Manager (topology-manager-scope=Pod).

You can enable this plugin by using the "memory-manager-policy":"Static" parameter in the config.json file during Robin CNP installation or when upgrading to Robin CNP v5.6.0 from a supported version. For more information, see Memory Manager.
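
As a quick sanity check after enabling the policy, you can confirm that the kubelet on a node has picked it up. The path below assumes the standard kubeadm kubelet configuration location, which may differ on your nodes:

# grep -iE 'memoryManagerPolicy|topologyManagerScope' /var/lib/kubelet/config.yaml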

25.1.4.4. Integrating Helm Support

Robin CNP v5.6.0 introduces native support for Helm chart management. The feature allows you to easily deploy, manage, and upgrade applications packaged as Helm charts within the CNP environment. A new CLI (robin helm) is available to support this feature. For more information, see Helm Operations.

25.1.4.5. Istio Integration

Robin CNP supports integration of Istio 1.23. You can install Istio after installing or upgrading to Robin CNP v5.6.0.

Istio is a service mesh that helps in managing the communications between microservices in distributed applications. For more information, see Istio.

After installing the Istio control plane, you must install Ingress and Egress gateways to manage the incoming and outgoing traffic. For more information, see Integrate Istio with Robin CNP.

25.1.4.6. Dual Stack (IPv4 & IPv6) Support

Starting with Robin CNP v5.6.0, Robin CNP supports dual-stack networking on the Calico interface for a cluster, allowing it to accept traffic from both Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6) devices. For more information, see IPv4/IPv6 dual-stack.

Dual-stack Pod networking assigns both IPv4 and IPv6 Calico addresses to Pods. A service can utilize an IPv4 address, an IPv6 address, or both. For more information, see Services. Pod Egress routing works through both IPv4 and IPv6 interfaces.

You can enable the dual-stack networking feature during the Robin CNP installation only, not during the upgrade of an existing Robin CNP cluster. To enable this feature, you must specify the following option in the Config JSON file for one of the master nodes:

  • "ip-protocol":"dualstack"

Note

Hosts must have dual-stack (IPv4 and IPv6) network interfaces.

For more information, see Dual-stack (IPv4 and IPv6) installation.
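
After a dual-stack installation, you can verify that Pods received both address families by inspecting the podIPs field of a Pod; the Pod and namespace names below are placeholders:

# kubectl get pod <pod name> -n <namespace> -o jsonpath='{.status.podIPs}'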

25.1.4.7. Auto Release Static IP of Terminating Pod

Starting with Robin CNP v5.6.0, Robin CNP supports automatically releasing the static IP address of a Pod that is stuck in the terminating state on a node with the NotReady status. If a Pod with a static IP address is stuck in the terminating state, Kubernetes cannot assign this static IP address to a new Pod because the IP address remains in use by the terminating Pod. The IP address must be released before it can be reassigned to any Pod.

To address this, Robin CNP deploys a system service named robin-kubelet-watcher. This service monitors the health and connectivity of the kubelet, CRI, and Docker services with the API server on NotReady nodes every 10 seconds. If any of these services are unhealthy for 60 seconds, the robin-kubelet-watcher terminates all Pods running on that node, releasing their IP addresses.

For more information, see Auto Release Static IP address of Terminating Pod.

25.1.4.8. Secure communication between Kubelet and Kube apiserver

Starting with Robin CNP v5.6.0, Robin CNP supports secure communication between kubelet and kube-apiserver. In a Kubernetes cluster, the kubelet and kube-apiserver communicate with each other securely using TLS certificates. This communication is secured through mutual TLS, meaning both the kubelet and kube-apiserver present their certificates to verify each other’s identity. This ensures that only authorized kubelets connect to the kube-apiserver and communication between them is secure.

By default, the kubelet’s server certificate is self-signed, meaning it is signed by a temporary Certificate Authority (CA) that is created on the fly and then discarded. To enable secure communication between the kubelet and kube-apiserver, you must configure the kubelet to obtain its server certificate by issuing a Certificate Signing Request (CSR), rather than using a server certificate signed by a self-signed CA. After configuring the kubelet, you must also configure the kube-apiserver to process and approve the CSR.

For more information, see Secure communication between kubelet and kube-apiserver.
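
In generic upstream Kubernetes terms (not necessarily the exact Robin CNP procedure, which is covered in the linked documentation), this flow corresponds to enabling serverTLSBootstrap: true in the kubelet configuration so the kubelet requests a serving certificate through a CSR, and then approving that CSR:

# kubectl get csr
# kubectl certificate approve <csr name>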

25.1.4.9. Large cluster support

Starting with Robin CNP v5.6.0, support for large clusters is available. You can now have a Robin CNP cluster with up to 110 nodes.

25.1.5. Improvements

25.1.5.1. Persistent Prometheus Configuration

Robin CNP v5.6.0 provides an improvement to keep the Prometheus configuration persistent when you stop and start metrics.

With this improvement, when you update any of the following Prometheus-related configuration parameters, the values persist when you stop and start the metrics feature.

  • node_exporter_ds_cpu_limit

  • node_exporter_ds_memory_limit

  • prom_evaluation_interval

  • prom_scrape_interval

  • prom_scrape_timeout

25.1.5.2. New Volume Metrics

Starting with Robin CNP v5.6.0, the robin_vol_psize metric is introduced.

  • robin_vol_psize

It represents the physical (or raw) storage space (in bytes) used by a single replica of the volume. This metric provides further insight into storage consumption.

Example:

# curl -k https://localhost:29446/metrics
robin_vol_rawused{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 134217728
robin_vol_size{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 1073741824
robin_vol_psize{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 67108864

In the above example, the value 67108864 for robin_vol_psize represents the physical (or raw) storage space (in bytes) used by a single replica of the volume.

25.1.5.3. Helm Version Upgrade

Starting with Robin CNP v5.6.0, the Helm version is upgraded from v3.6.3 to v3.16.1.

25.1.5.4. New Node Level Events

Robin CNP v5.6.0 provides the following new events to enhance the system’s ability to monitor and detect node readiness issues at both the Kubernetes and service/component levels:

  • EVENT_NODE_K8S_NOTREADY - This event is generated when a node is marked as down due to an issue with a Kubernetes component. It is a warning alert.

  • EVENT_NODE_K8S_READY - This event is generated when a node is up after being marked as down. It is an info alert.

  • EVENT_NODE_NOTREADY - This event is generated when a node is marked as not ready due to an unhealthy service or component. It is a warning alert.

  • EVENT_NODE_READY - This event is generated when a node is ready after being marked as not ready. It is an info alert.
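
For example, you can filter for one of these event types using the robin event list command used elsewhere in these notes:

# robin event list --type EVENT_NODE_NOTREADY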

25.1.5.5. Updated the Default Reclaim Policy for robin-patroni PVs

Starting with Robin CNP v5.6.0, the reclaim policy for robin-patroni PVs is now set to Retain by default.
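
You can verify the policy on a running cluster with a standard kubectl query; the grep assumes the backing PVC names contain "patroni", which may not hold on every setup:

# kubectl get pv -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name,RECLAIM:.spec.persistentVolumeReclaimPolicy | grep patroni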

25.1.5.6. HTTPS support for license proxy server

Starting from Robin CNP v5.6.0, Robin CNP supports Hypertext Transfer Protocol Secure (HTTPS) for the license proxy server to activate and renew Robin CNP cluster’s licenses.

25.1.5.7. VDI access support for Windows VMs

Starting with Robin CNP v5.6.0, you can access Windows-based VMs using the RDP console from the Robin UI.

25.1.5.8. KVM console access for tenant users

Starting with Robin CNP v5.6.0, tenant admins and tenant users can access the KVM application console from the Robin UI.

25.1.5.9. Events for certificates add and remove

Robin CNP generates an event when you add or remove a certificate. The following new Info events are added as part of this release:

  • EVENT_CERT_ADDED - This is generated when a certificate is added.

  • EVENT_CERT_REMOVED - This is generated when a certificate is removed.

25.1.5.10. Archive failed job logs

Starting with Robin CNP v5.6.0, Robin CNP automatically archives failed job logs. A new config parameter failed_job_archive_age is added to archive failed job logs. The default value of this parameter is 3 days, which means failed job logs older than 3 days will be automatically archived.
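
If you need a different retention period, the parameter can presumably be adjusted with robin config update, as with the other parameters in these notes. The configuration section and value format below are assumptions; confirm them for your cluster before applying:

# robin config update server failed_job_archive_age 7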

25.1.5.11. Relaxation in NIC bonding policy

Starting with Robin CNP v5.6.0, Robin CNP considers a NIC bond interface operational and up when at least one of the two interfaces used to create the bond interface is up.

25.1.5.12. Resume upgrade after a failure

The Robin CNP upgrade process is idempotent starting with Robin CNP v5.6.0 and allows you to resume it after a failure.

25.1.5.13. Support to provide static IP when creating an app from backup

Starting with Robin CNP v5.6.0, you can provide static IPs from an IP pool when creating an app from a backup.

The following new option is added to the existing robin app create from-backup command:

  • --static-ips

Note

You must use the --ip-pools option along with the --static-ips option.

The following is the format for this new option:

  • <ippool1>@<ip1/ip2>

Note

You can only provide multiple IPs from the same IP pool by separating the list of IPs using the “/” symbol.

Example

--static-ips ovs-2@192.0.2.14/192.0.2.15/192.0.2.16
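
Putting the options together, a hypothetical invocation might look like the following; the application name and backup ID are placeholders, and the exact positional arguments may differ on your cluster:

# robin app create from-backup <appname> <backup id> --ip-pools ovs-2 --static-ips ovs-2@192.0.2.14/192.0.2.15/192.0.2.16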

25.1.5.14. MetalLB new install options

Starting with Robin CNP v5.6.0, the following new install options are added for MetalLB:

  • metallb-skip-nodes - Skip nodes from deploying MetalLB speaker Pods.

  • metallb-skip-controlplane - Skip master nodes from deploying MetalLB controller Pods.

  • metallb-k8sfrr-mode - Deploy MetalLB using the K8s-FRR mode instead of the default FRR mode.

25.1.5.15. Patroni and Robin Manager Services metrics

Robin CNP v5.6.0 provides support for Patroni metrics and Robin manager service metrics. For more information, see Patroni and service metrics.

25.1.6. Fixed Issues

Reference ID

Description

RSD-8287

Under specific conditions, volumes are unable to recover from a fault, leading them to enter a DEGRADED state. This issue is fixed.

RSD-3885

The robin host remove-vlans command returns an error when attempting to remove VLANs by specifying “ALL” with the --vlans option. This issue is fixed.

RSD-4634

When Robin CNP is running on SuperMicro nodes, the IPMI tool incorrectly displays the BMC IPv6 address as ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff instead of the actual BMC IPv6 address. This issue is fixed.

RSD-4584

If you added a range of blacklisted IPs in unexpanded form, Robin CNP does not allow you to remove that range of blacklisted IPs from the IP pool. This issue is fixed.

RSD-5771

IPv6 IP pool creation fails when the gateway is the same as the broadcast address for the IP pool subnet. This issue is fixed.

RSD-8104

The issue of the VolumeCreate job taking longer than expected is fixed.

RSD-7814

The issue of the application creation operation failing with the following error is now fixed.

Failed to mount volume <volume-name>: Node <node-name> has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are allowed.

RSD-7499

There is a mismatch in the storage creation request calculation between Robin CNP and Kubernetes. Due to this mismatch, some application Pods fail to deploy as desired. This issue is fixed.

RSD-9323

When you try to restore an application from a backup that previously had a static IP address, the restore process fails to honor the --ip-pool value provided during deployment. Instead, the restore process attempts to allocate a non-static IP from a historic IP pool, resulting in the following type of error:

Non static IP allocations cannot be done from non-range(network) IP pools -> ‘nc-bss-ov-internal-mgmt-int-v6’.

This issue is fixed.

PP-34457

When the Metrics feature is enabled, the Grafana metrics application does not display. This issue is fixed.

PP-38087

In certain cases, the snapshot size allocated to a volume could be less than what is requested. This occurs when the volume is allocated from multiple disks. This issue is fixed.

PP-38397

Robin CNP upgrade fails due to a Docker installation failure. The failure is caused by missing fuse-overlayfs and slirp4netns dependencies required by the updated Docker version. This issue is fixed.

PP-38071

The issue where application creation might fail with the following error is fixed:

Failed to mount volume : Node has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are allowed.

25.1.7. Known Issues

Reference ID

Description

PP-35015

Symptom

After renewing the expired Robin license successfully, Robin CNP incorrectly displays the License Violation error when you try to add a new user to the cluster. If you notice this issue, apply the following workaround.

Workaround

You need to restart the robin-server-bg service.

# rbash master
# supervisorctl restart robin-server-bg

PP-21916

Symptom

A Pod IP is not pingable from any other node in the cluster, apart from the node where it is running.

Workaround

Bounce the Calico Pod running on the node where the issue is seen.

PP-30247

Symptom

After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the RWX apps might report the following error event type:

wrong fs type, bad option, bad superblock on /dev/sdj, missing codepage or helper program, or other error

Workaround

To resolve this issue, contact the Robin Customer Support team.

PP-30398

Symptom

After removing an offline master node from the cluster and power cycling it, the removed master node is automatically added back as a worker node.

Workaround

  1. Run the following command to remove the host:

    # robin host remove <hostname>
    
  2. Run the following command to remove the node:

    # kubectl delete node <node name>
    
  3. Run k8s-script cleanup and host-script cleanup on the to-be-removed node.

PP-34226

Symptom

When a PersistentVolumeClaim (PVC) is created, the CSI provisioner initiates a VolumeCreate job. If this job fails, the CSI provisioner calls a new VolumeCreate job again for the same PVC. However, if the PVC is deleted during this process, the CSI provisioner will continue to call the VolumeCreate job because it does not verify the existence of the PVC before calling the VolumeCreate job.

Workaround

Bounce the CSI provisioner Pod.

# kubectl delete pod <csi-provisioner pod name> -n robinio

PP-34414

Symptom

In rare scenarios, the IOMGR service might fail to open devices in exclusive mode when it starts because other processes are using these disks. You might observe the following issue:

  • Some app Pods get stuck in the ContainerCreating state after restarting.

Steps to identify the issue:

  1. Check for the following type of faulted error in the EVENT_DISK_FAULTED events listed by the robin event list command:

    disk /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5 on node default:poch06 is faulted

    # robin event list --type EVENT_DISK_FAULTED
    
  2. If you see the disk is faulted error, check the IOMGR logs for dev_open() and Failed to exclusively open error messages on the node where disks are present.

    # cat iomgr.log.0 | grep scsi-SATA_Micron_M500_MTFD_1401096049D5 | grep "dev_open"
    
  3. If you see the Device or resource busy error message in the log file, use the fuser command to confirm whether the device is in use:

    # fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5
    

Workaround

If the device is not in use, restart the IOMGR service on the respective node:

# supervisorctl restart iomgr

PP-34451

Symptom

In rare scenarios, the RWX Pod might be stuck in the ContainerCannotRun state and display the following error in the Pod’s event:

mount.nfs: mount system call failed

Perform the following steps to confirm the issue:

  1. Run the robin volume info command and check for the following details:

    1. Check the status of the volume. It should be in the ONLINE status.

    2. Check whether the respective volume mount path exists.

    3. Check the physical and logical sizes of the volume. If the physical size of the volume is greater than the logical size, then the volume is full.

  2. Run the following command to check whether any of the disks for the volume are running out of space:

    # robin disk info
    
  3. Run the lsblk and blkid commands to check whether the device mount path works fine on the nodes where the volume is mounted.

  4. Run the ls command to check if accessing the respective filesystem mount path gives any input and output errors.

If you notice any input and output errors in step 4, apply the following workaround:

Workaround

  1. Find all the Pods that are using the respective PVC:

    # kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc name>
    
  2. Bounce all the Pods identified in step 1:

    # kubectl delete pod <pod name> -n <namespace>
    

PP-34492

Symptom

When you run the robin host list command, a host might appear in the NotReady and PROBE_PENDING states. If you notice this, follow these workaround steps to diagnose and recover the host:

Workaround

  1. Run the following command to check which host is in the NotReady and PROBE_PENDING states:

    # robin host list
    
  2. Run the following command to check the current (Curr) and desired (Desired) states of the host in the Agent Process (AP) report:

    # robin ap report | grep <hostname>
    
  3. Run the following command to probe the host and recover it:

    # robin host probe <hostname> --wait
    
    This command forces a probe of the host and updates its state in the cluster.

  4. Run the following command to verify the host’s state:

    # robin host list
    

    The host should now transition to the Ready state.


PP-35478

Symptom

In rare scenarios, the kube-scheduler may not function as expected when many Pods are deployed in a cluster due to issues with the kube-scheduler lease.

Workaround

Complete the following workaround steps to resolve issues with the kube-scheduler lease:

  1. Run the following command to identify the node where the kube-scheduler Pod is running with the lease:

    # kubectl get lease -n kube-system
    
  2. Log in to the node identified in the previous step.

  3. Check if the kube-scheduler Pod is running using the following command:

    # docker ps | grep kube-scheduler
    
  4. As the kube-scheduler is a static Pod, move its configuration file to temporarily stop the Pod:

    # mv /etc/kubernetes/manifests/kube-scheduler.yaml /root
    
  5. Run the following command to confirm that the kube-scheduler Pod is deleted. This may take a few minutes.

    # docker ps | grep kube-scheduler
    
  6. Verify that the kube-scheduler lease is transferred to a different Pod:

    # kubectl get lease -n kube-system
    
  7. Copy the static Pod configuration file back to its original location to redeploy the kube-scheduler Pod:

    # mv /root/kube-scheduler.yaml /etc/kubernetes/manifests/
    
  8. Confirm that the kube-scheduler container is running:

    # docker ps | grep kube-scheduler
    

PP-36865

Symptom

After rebooting a node, the node might not come back online for a long time, and the host BMC console displays the following message for RWX PVCs mounted on that node:

Remounting nfs rwx pvc timed out, issuing SIGKILL

Workaround

Power cycle the host system.

PP-37330

Symptom

During or after upgrading to Robin CNP v5.6.0, the NFSAgentAddExport job might fail with an error message similar to the following:

/bin/mount /dev/sdn /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41 -o discard failed with return code 32: mount: /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41: wrong fs type, bad option, bad superblock on /dev/sdn, missing codepage or helper program, or other error.

Workaround

If you notice this issue, contact the Robin Customer Support team for assistance.

PP-37416

Symptom

In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the upgrade might fail with the following error during the Kubernetes upgrade process on other master nodes:

Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct version of kubeadm rpm binary installed

Steps to identify the issue:

  1. Check the /var/log/robin-install.log file to know why the kubeadm upgrade failed.

    Example

    [upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-31-01-03-52/kube-scheduler.yaml"
    [upgrade/staticpods] Waiting for the kubelet to restart the component
    [upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
    static Pod hash for component kube-scheduler on Node sm-compute04 did not change after 5m0s: timed out waiting for the condition

    Note

    You can get the above error log for any static manifests of api-server, etcd, scheduler, and controller-manager.

  2. If you notice the above error, run the following command to inspect the Docker containers for the failed component. The containers will likely be in the Exited state.

    # docker ps -a | grep schedule
    

Workaround

If you notice the above error, restart the kubelet:

# systemctl restart kubelet

PP-37965

Symptom

In Robin CNP v5.6.0, when you scale up a Robin Bundle app, the scale-up does not consider the existing CPU cores and memory already in use by a vnode. As a result, Robin CNP cannot find a suitable host, even though additional resources are available.

Workaround

If you notice this issue, apply the following workaround:

  1. Scale up the resources using the following command:

    # robin app computeqos <appname> --role <rolename> --cpus <newcnt> --memory <newmem> --wait
    
  2. If the scale-up operation fails, stop the app using the following command:

    # robin app stop <appname> --wait
    
  3. Try to scale up the resources again:

    # robin app computeqos <appname> --role <rolename> --cpus <newcnt> --memory <newmem> --wait

PP-38039

Symptom

During node reboot or power reset scenarios, application volumes may be forcibly shut down due to I/O errors. As a result, application Pods might get stuck in the ContainerCreating state with the following mount failure error:

Context Deadline Exceeded.

On the affected node where the volume is mounted or the application Pod is scheduled, the following error might be observed in the dmesg output:

Log I/O Error Detected. Shutting down filesystem

Workaround

If you notice this issue, contact the Robin Customer Support team for assistance.

PP-38044

Symptom

When attempting to detach a repository from a hydrated Helm application, the operation might fail with the following error:

Can’t detach repo as the application is in IMPORTED state, hydrate it in order to detach the repo from it.

This issue occurs even if the application has already been hydrated. The system incorrectly marks the application in the IMPORTED state, preventing the repository from being detached.

Workaround

To detach the repository, manually rehydrate the application and then retry the detach operation:

  1. Run the following command to rehydrate the application.

    # robin app hydrate <appname> --wait
    
  2. Once the hydration is complete, detach the repository.

    # robin app detach-repo <appname> --wait -y
    

PP-38078

Symptom

After a network partition, the robin-agent and iomgr-server may not restart automatically, and stale devices may not be cleaned up. This issue occurs because the consulwatch thread responsible for monitoring Consul and triggering restarts may fail to detect the network partition. As a result, stale devices may not be cleaned up, potentially leading to resource contention and other issues.

Workaround

Manually restart the robin-agent and iomgr-server using supervisorctl:

# supervisorctl restart robin-agent iomgr-server

PP-38471

Symptom

When StatefulSet Pods restart, the Pods might get stuck in the ContainerCreating state with the error: CSINode <node_name> does not contain driver robin due to stale NFS mount points and failure of the csi-nodeplugin-robin Pod due to CrashLoopBackOff state.

Workaround

If you notice this issue, restart the csi-nodeplugin Pod.

# kubectl delete pod <csi-nodeplugin> -n robinio

PP-39098

Symptom

When you create a Robin bundle app with an affinity rule, the bundle app Pod might get stuck in the ContainerCreating and Terminating states in a continuous loop after a node reboot.

If you notice this issue, apply the following workaround.

Workaround

You need to restart the robin-server-bg service.

# rbash master
# supervisorctl restart robin-server-bg

PP-38924

After you delete multiple Helm applications, one of the Pods might get stuck in the “Error” state, and one or more ReadWriteMany (RWX) volumes might get stuck in the “Terminating” state.

Workaround

On the node where the Pod is stuck in the Error state, restart the Docker and kubelet services.

PP-38524

When you upgrade your cluster from any supported Robin CNP version to Robin CNP v5.6.0, the upgrade process might get stuck while upgrading Kubernetes and display this error: ERROR: Failed to execute K8S upgrade actions, and Calico Pods might be stuck in the Terminating or ContainerCreating state.

Workaround

Restart the Calico Pods by performing a rolling restart of the calico-node DaemonSet:

# kubectl rollout restart ds -n kube-system calico-node

PP-39200

After upgrading a non-HA (single-node) Robin cluster from a supported version to Robin CNP v5.6.0, application deployments and scaling operations might fail with the following error:

Failed to download file_object, not accessible at this point.

PP-38411

Symptom

After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the robin ip-pool delete command may fail with the following error message:

ERROR - ippoolcr-validating-webhook not found. Please wait for Robin Server Start up to complete.

This issue occurs because the necessary validating webhooks for Robin’s IP Pool Custom Resource Definition (CRD) are not properly created during the upgrade process.

Workaround

To resolve this issue, enable the robin_k8s_extension configuration variable after the upgrade. This will trigger the creation of the missing validating webhooks.

  1. Verify the existence of Robin’s validating webhooks:

    # kubectl get validatingwebhookconfigurations -A
    NAME                   WEBHOOKS   AGE
    cert-manager-webhook   1          11h
    

    If the output does not list any webhooks related to robin, proceed to the next step.

  2. Enable the robin_k8s_extension variable:

    # robin config update manager robin_k8s_extension True
    
    This will add the register_webhook schedule task, which creates the missing webhooks.

  3. Verify that the register_webhook task has been scheduled:

    # robin schedule list | grep -i webhook
    

PP-39087

Symptom

In a scenario where there are multiple placement constraints with Pod-level anti-affinity for each role and role affinity (co-locate the roles) with explicit tags limiting the placement of Pods and Roles, the application deployment fails.

Workaround

Use tags, maintenance mode, taints, and tolerations to manage the placement of Pods.

PP-39188

Symptom

After a Pod using an RWX volume is bounced (deleted and recreated), the new Pod may become stuck in the ContainerCreating state. The PersistentVolumeClaim (PVC) describe command output shows that VolumeFailoverAddNFSExport and VolumeAddNFSExport jobs are stuck in the WAITING state.

Workaround

  1. Identify the Pod in the ContainerCreating state.

    # kubectl get pod -n <namespace>
    
  2. Identify the stuck job ID.

    # kubectl describe pod -n <namespace> <pod name>
    

    From the output, identify the VolumeFailoverAddNFSExport job ID that is holding the lock.

  3. Identify the AGENT_WAIT sub-job.

    # robin job info <VolumeFailoverAddNFSExport_job_ID>
    

    From the output, identify the sub-job in the AGENT_WAIT state.

  4. Cancel the stuck sub-job.

    # robin job cancel <AGENT_WAIT_job_ID>
    

After canceling the job, the Pod should eventually transition to the Running state.

PP-37652

Symptom

When you deploy a multi-container application using Helm with static IPs assigned from an IP pool, only a subset of the Pods appear on the Robin CNP UI.

Workaround

Run the following CLI command to view all the Pods:

# robin app info <appname> --status

PP-39260

Symptom

Backup operations for applications with sidecar containers are not supported. Contact the Robin Customer Support team for further queries.

PP-39263

Symptom

When you try to create a volume using the robin volume create command with the GiB unit, the volume creation fails with this error message: ERROR - Invalid unit GI.

Workaround

Use the unit G or GI when creating a volume.

PP-39264

Symptom

In the Robin UI, when you have an empty Helm chart, the Helm Charts UI page displays the following error.

Failed to fetch the helm charts

Workaround

You can ignore the error message.

PP-39265

Symptom

When you try to share a Helm app using the Robin UI, the Share button in the UI does not respond.

Workaround

Use the following CLI command to share the Helm app.

# robin app share <name> <user name> --all-tenant-users

25.1.8. Technical Support

Contact Robin Technical Support for any assistance.