25. Release Notes

25.1. Robin Cloud Native Platform v5.4.1

The Robin Cloud Native Platform (CNP) v5.4.1 release has new features, improvements, fixed issues, and known issues.

Release Date: 12 October 2022

25.1.1. Infrastructure Versions

The following software applications are included in this CNP release.

Software Application	Version
Kubernetes	1.23.8
Docker	19.03.9
Prometheus	2.38.0
Prometheus Adapter	0.10.0
Node Exporter	1.3.1
Calico	3.21.5
HAProxy	2.4.7
PostgreSQL	14.5
Grafana	9.1.3

25.1.2. Upgrade Path

The following are the supported upgrade paths for Robin CNP v5.4.1:

Robin v5.3.11-217 HF2 to Robin v5.4.1 GA
Robin v5.3.11-450 HF4 to Robin v5.4.1 GA
Robin v5.3.13-92 GA to Robin v5.4.1 GA

25.1.3. New Features

25.1.3.1. Volume Snapshots

Robin CNP v5.4.1 provides the Volume Snapshot feature to create crash-consistent snapshots containing one or more Robin volumes (Regular or PDVs).

You can back up volume snapshots to any supported external storage repository in the cloud.

The volume snapshots feature enables you to create a new volume using volume snapshots.

Note

You can use the volume snapshot and volume backup features only for volumes that are not part of Robin bundle applications.

25.1.3.2. Support for Volume Backup

A volume backup is a volume snapshot of one or more volumes that you can push to an external cloud storage repository. Robin CNP v5.4.1 lets you push existing volume snapshots to a registered external cloud storage repository, or it creates the snapshots for you when you create a new volume backup.

Note

You can use the volume snapshot and volume backup features only for volumes that are not part of Robin bundle applications.

25.1.3.3. Support for Rocky Linux 8.6

Robin supports the Rocky Linux Kernel version 4.18.0-372.9.1.el8.x86_64 and lower.

25.1.3.4. Blacklist an individual IP, a set of IPs, or a range of IP addresses

Robin CNP v5.4.1 allows you to blacklist an individual IP address, a set of IP addresses, or a range of IP addresses in an IP-Pool when you do not want to use these IP addresses. Robin CNP does not assign the blacklisted IP address to any application. You can blacklist both IPv4 and IPv6 addresses.

You can blacklist IP addresses when creating an IP-Pool or when updating an existing IP-Pool.

25.1.3.5. Reserve a single IP, set of IPs, or a range of IP addresses

Robin CNP v5.4.1 allows you to reserve an individual IP address, a set of IP addresses, or a range of IP addresses in an IP-Pool. You can reserve an IP address to set it aside for a specific purpose. Robin CNP does not assign the reserved IP address to any application. If you want to assign the reserved IP address, you need to assign it as a static IP address. You can reserve both IPv4 and IPv6 addresses.

You can reserve IP addresses when creating an IP-Pool or when updating an existing IP-Pool.
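The effect of blacklisting and reserving addresses can be illustrated with a short Python sketch. It is not Robin CNP code and does not call the Robin CLI. The `expand` and `assignable` helpers, the `start-end` range notation, and the sample addresses are all assumptions made for illustration only. The sketch shows how an individual address, a set, or a range removes addresses from what a pool could hand out dynamically.

```python
import ipaddress


def expand(spec):
    """Expand 'a', 'a,b,c', or 'start-end' (hypothetical notation) into a set of addresses."""
    addresses = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = ipaddress.ip_address(start_text.strip())
            end = ipaddress.ip_address(end_text.strip())
            if start.version != end.version or int(start) > int(end):
                raise ValueError(f"invalid range: {part}")
            # Adding an integer offset keeps the address family (IPv4 or IPv6).
            addresses.update(start + offset for offset in range(int(end) - int(start) + 1))
        else:
            addresses.add(ipaddress.ip_address(part))
    return addresses


def assignable(pool_cidr, blacklisted=(), reserved=()):
    """Return the host addresses that remain available for dynamic assignment."""
    network = ipaddress.ip_network(pool_cidr)
    excluded = set()
    for spec in (*blacklisted, *reserved):
        excluded |= expand(spec)
    outside = sorted(str(address) for address in excluded if address not in network)
    if outside:
        raise ValueError(f"addresses outside the pool: {outside}")
    return [host for host in network.hosts() if host not in excluded]


# Illustrative pool: one blacklisted address and one reserved range.
free = assignable(
    "192.0.2.0/29",
    blacklisted=["192.0.2.1"],
    reserved=["192.0.2.4-192.0.2.5"],
)
print([str(address) for address in free])
```

In this model, a reserved address differs from a blacklisted one only in that it can still be handed out explicitly as a static IP address.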

25.1.3.6. Support for Static MAC Address

Robin CNP v5.4.1 allows you to assign static MAC addresses for Robin Bundle applications. When a Pod restarts, its IP address and MAC address change. When you assign a static MAC address to a Pod, the Pod retains that MAC address after it restarts.

When creating an application, you can assign the static MAC address to the Pod using the static_macs parameter in the IP-Pool.

Note

Robin does not support the static MAC address for KVM-based applications.

25.1.3.7. Custom CA certificate and key

Robin CNP v5.4.1 allows you to use a custom CA certificate and key when installing a Robin CNP cluster. You can get the custom CA certificate and key from an external trusted CA or a dedicated internal public key infrastructure service. After getting the custom CA certificate and its key, make sure that they are configured as an intermediate CA certificate. The intermediate CA certificate is the signing certificate that signs the other certificates generated by the cluster.

You need to specify the following keys in the config.json file for one of the master nodes:

ca-cert-path
ca-key-path
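Before starting an installation, it can help to confirm that both keys are set and point to real files. The following Python sketch is one way to do that, under stated assumptions: `check_custom_ca_options` is a hypothetical helper, not part of Robin CNP, and it receives the key-value pairs as a plain dictionary because this section does not show how they nest inside config.json.

```python
import os

REQUIRED_KEYS = ("ca-cert-path", "ca-key-path")


def check_custom_ca_options(options):
    """Return a list of problems found in the custom CA entries of `options`."""
    problems = []
    if not any(options.get(key) for key in REQUIRED_KEYS):
        return problems  # no custom CA requested, nothing to check
    for key in REQUIRED_KEYS:
        path = options.get(key)
        if not path:
            problems.append(f"{key} is missing; set both keys together")
        elif not os.path.isfile(path):
            problems.append(f"{key} points to a file that does not exist: {path}")
    return problems
```

An empty list means that both paths are present and readable as files. The sketch does not verify that the certificate is a valid intermediate CA certificate.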

25.1.3.8. Kubernetes audit logs

Robin CNP v5.4.1 supports the Kubernetes audit logs feature. The audit logs are a set of records with a chronological list of all requests made to the Kubernetes API server.

You can find the Kubernetes audit logs at /var/log/Kubernetes/audit/audit.log on any master node of your cluster.

Note

By default, the Kubernetes audit logs feature is enabled.

Robin CNP logs the following operations at the Metadata audit policy level:

Create request
Patch request
Update request
Delete request
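Kubernetes writes audit events as one JSON object per line, so the logged operations can be reviewed with a short script. The following Python sketch assumes the standard Kubernetes audit event fields (`verb`, `stage`, `user.username`, `requestURI`, `requestReceivedTimestamp`). The `write_requests` helper and the sample event are illustrative and are not part of Robin CNP.

```python
import json

WRITE_VERBS = {"create", "patch", "update", "delete"}


def write_requests(lines):
    """Yield (timestamp, user, verb, request URI) for completed write requests."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        # A request can be logged at several stages; keep only the final one
        # so that each request is listed once.
        if event.get("stage") != "ResponseComplete":
            continue
        if event.get("verb") in WRITE_VERBS:
            yield (
                event.get("requestReceivedTimestamp", ""),
                event.get("user", {}).get("username", ""),
                event["verb"],
                event.get("requestURI", ""),
            )


# Illustrative event; real entries carry more fields.
sample = [
    '{"stage": "ResponseComplete", "verb": "delete", '
    '"requestURI": "/api/v1/namespaces/default/pods/web-0", '
    '"user": {"username": "admin"}, '
    '"requestReceivedTimestamp": "2022-10-12T08:00:00.000000Z"}'
]
for record in write_requests(sample):
    print(*record, sep="\t")
```

To process a real log, pass an open file handle for the audit log on a master node to `write_requests` instead of the sample list.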

25.1.3.9. Support for Best-Effort Quality of Service (QoS) in isolated CPU setups

Robin CNP v5.4.1 supports the Best-Effort Quality of Service (QoS) for isolated CPU setups. In isolated CPU setups, the non-application Pods or the control plane Pods use some isolated CPU cores along with the non-isolated CPU cores.

You must enable the Best-Effort QoS feature to stop the non-application or control plane Pods from using the isolated CPU cores.

Once you enable this feature, the CPU request for these Pods is automatically set to zero.

To enable the Best-Effort QoS feature, you need to specify the following key-value pair in the config.json file for one of the master nodes:

"best-effort-qos": "True"

Note

You can enable this feature during the Robin CNP installation only.

25.1.3.10. License Expiry Verification

Starting from Robin CNP v5.4.1, Robin CNP verifies the license expiry date and generates events and alerts. Robin CNP generates a warning alert seven days before the license expiry date, a license expired event when the license expires, and other license expiry-related events.

You can set the interval for verifying the license status using the robin schedule update command.

The license expiry verification feature is enabled (set to True) by default.

Robin CNP provides the following events and alerts:

EVENT_LICENSE_NOT_ACTIVATED - This event is generated when the license is not activated after installing Robin CNP.
EVENT_LICENSE_EXPIRATION_WARNING - This alert is generated when the license is going to expire.
EVENT_LICENSE_EXPIRED - This event is generated after the license has expired.
EVENT_LICENSE_EXPIRING_TODAY - This alert is generated if the license expiry date is the current date.
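The relationship between the expiry date and these events can be summarized in a small Python sketch. This is an illustrative model, not Robin CNP's implementation. The `license_event` function is hypothetical, and it assumes the warning applies on each check made within the seven days before expiry.

```python
from datetime import date

WARNING_WINDOW_DAYS = 7  # the warning alert is raised seven days before expiry


def license_event(today, expiry, activated=True):
    """Return the event name that a check on `today` would map to, or None."""
    if not activated:
        return "EVENT_LICENSE_NOT_ACTIVATED"
    days_left = (expiry - today).days
    if days_left < 0:
        return "EVENT_LICENSE_EXPIRED"
    if days_left == 0:
        return "EVENT_LICENSE_EXPIRING_TODAY"
    if days_left <= WARNING_WINDOW_DAYS:  # assumption: any day inside the window
        return "EVENT_LICENSE_EXPIRATION_WARNING"
    return None


# Example check five days before an illustrative expiry date.
print(license_event(date(2022, 10, 7), date(2022, 10, 12)))
```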

25.1.3.11. Support for Cisco DCNM E1000 Virtual Interface

Robin CNP v5.4.1 supports Cisco Data Center Network Manager (DCNM) E1000 virtual network interface for KVMs. You can deploy the Cisco DCNM application on the Robin CNP cluster.

Note

The Cisco DCNM E1000 Virtual Interface is supported only on KVMs with an OVS IP-Pool. You can configure the interface only by using the input.yaml file.

25.1.3.12. Robin Asynchronous Disaster Recovery (Tech Preview)

Starting from Robin CNP v5.4.1, Robin.io provides the snapshot-based Asynchronous Disaster Recovery (DR) feature.

The feature enables you to replicate your Kubernetes-based stateful applications along with their constructs (PVC, StatefulSet, config maps, secrets, services, etc.) onto a remote secondary peer cluster (site), and to manually fail over to it in the event of a disaster or during maintenance activities. You can enable encryption when transmitting data over the wire to a peer cluster.

The Robin Asynchronous Disaster Recovery feature allows you to bring your applications online faster by failing over to the secondary cluster (site) in the event of a disaster with minimal application downtime, and to fail back later.

25.1.4. Improvements

25.1.4.1. Improved Robin CNP Install and Upgrade

Starting with Robin CNP v5.4.1, only the GoRobin utility tool is available for installing and upgrading Robin CNP v5.4.1. GoRobin uses the new scripts that are provided as part of CNP v5.4.1.

25.1.4.2. Auto-renewing Robin CNP License

Starting from Robin CNP v5.4.1, you can auto-renew the Robin CNP license by activating the license proxy for your Robin CNP clusters. A license proxy can be linked to multiple Robin CNP clusters. You need to set up and activate the license proxy yourself. Once you activate the license proxy, the license is automatically renewed as per the renewal period mentioned in the license proxy.

25.1.4.3. Install Robin CNP on nodes running on different operating systems

Starting from Robin CNP v5.4.1, Robin supports the installation of Robin CNP on nodes that run different operating systems supported by Robin.

For example, if you want to install Robin CNP v5.4.1 on a three-node cluster, you can have all three nodes with different operating systems, such as one node with CentOS 7, the second node with RHEL 8, and the third node with Rocky Linux 8.6.

25.1.4.4. UI Support for adding SSH key in Robin Bundle applications

Starting from Robin CNP v5.4.1, a user can also add or delete SSH keys for the Robin Bundle applications. Robin supports only Rivest Shamir Adleman (RSA) SSH key pairs, not Digital Signature Algorithm (DSA) or Elliptic Curve Digital Signature Algorithm (ECDSA) SSH key pairs.

You can now add an SSH key for an application from the UI when you create the application.

25.1.4.5. Chargeback support for non-Robin Bundle applications

Starting from Robin CNP v5.4.1, Robin CNP supports the chargeback utility for non-Robin Bundle applications as well. You need to start metrics to enable the chargeback feature for non-Robin Bundle applications.

Prometheus and the chargeback_track_k8s_resusage config variable are automatically enabled when you start metrics.

The chargeback utility tracks the usage and cost of the non-Robin Bundle application’s resources such as CPU, GPU, MIG, Memory, Storage space (HDD, SSD), HugePages, SR-IOV, FPGA devices, and PCI devices.

Note

When you stop an application, you will still be billed for the storage bound to it.

25.1.4.6. Provide custom name for Robin default Calico IP-Pool

Starting from Robin CNP v5.4.1, you can provide a custom name for the Robin default Calico IP-Pool, robin-default. To do so, specify the name using the following key in the config.json file for one of the master nodes:

robin-default-ippool-name

If you do not specify the custom name, Robin CNP uses the robin-default name for the Robin default Calico IP-Pool.

Note

You can use this option during the Robin CNP installation only.

25.1.4.7. Support for recreating existing MIG partitions automatically

Starting from Robin CNP v5.4.1, you can automatically recreate the existing MIG partitions when a node is rebooted. You need to enable the create_mig_partitions_on_reboot attribute by setting it to True in the robin config list before rebooting the node.

Note

By default, the create_mig_partitions_on_reboot attribute is disabled.

25.1.4.8. UI support to manage network policies

Starting from Robin CNP v5.4.1, you can manage the network policies for your cluster through the UI.

25.1.4.9. View `rpool` name in the `robin drive list` command

Starting from Robin CNP v5.4.1, an option to show the rpool name along with the host is added to the output of the robin drive list command.

25.1.4.10. Support to `add-routes` and `remove-routes` to an existing IP-Pool

Starting from Robin CNP v5.4.1, support for add-routes and remove-routes to an existing IP-Pool is added.

25.1.4.11. View PersistentVolumeClaim (PVC) information for a volume

Starting from Robin CNP v5.4.1, the following two options are added to the robin volume info command to show the PVC information for a volume:

--pvc-name
--namespace

25.1.4.12. Add SSH key pair in a separate file for passwordless login to KVM-based VMs

Starting from Robin CNP v5.4.1, you can log in to KVM-based VMs without a password. For passwordless login to KVM-based VMs, you need to add the SSH key pair in a separate YAML file and add the location of the SSH key pair in the manifest file of the KVM-based VMs.

Now, create KVM-based VMs using the YAML file where you added the SSH key pair. After creating KVM-based VMs, you can log in to the KVM-based VMs without a password.

25.1.4.13. Support for NVIDIA HGX hardware for Robin CNP

Starting from Robin CNP v5.4.1, you can also deploy Robin CNP on specialized hardware such as NVIDIA HGX and DGX servers.

25.1.4.14. Collection ID in Robin Bundle Info

The robin bundle info command displays the Collection ID. It enables you to know to which File Collection the bundle belongs.

25.1.4.15. Support for jinja variables in Bundle manifest file for PDV section

Starting with Robin CNP v5.4.1, the Robin Bundle manifest file supports jinja variables in the PDV section. This allows the PDV mount paths to be set in the manifest file; these mount paths are then auto-populated in the UI. Before this feature was added, the PDV mount paths were fixed.

The following are supported jinja variables:

namespace
user
tenant
resourcepool

Note

Appropriately named PDVs must exist in the namespace.

Example:

Update the bundle manifest file by adding a pdvs section at the same indentation level as storage and compute like so:

pdvs:
   - name: "{{namespace}}-data"
     mount_path: "/data/{{namespace}}-data"
   - name: "{{namespace}}-data-2"
     mount_path: "/data/{{namespace}}-data-2"

This results in the following auto-filled PDV section in the GUI:

Note the jinja variable substitution, where {{namespace}} has been replaced with the current namespace, namely t001-u000004.

Also, note the auto-filled mount paths as specified by the mount_path variable.
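The substitution described above can be reproduced with a few lines of Python. This sketch uses a regular expression instead of the Jinja library and handles only plain `{{variable}}` tokens. The `render` helper is hypothetical and illustrates the idea; it is not how Robin CNP renders the manifest.

```python
import re

SUPPORTED = ("namespace", "user", "tenant", "resourcepool")
_TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, context):
    """Replace each {{variable}} token in `template` with its value from `context`."""
    def replace(match):
        name = match.group(1)
        if name not in SUPPORTED:
            raise KeyError(f"unsupported variable: {name}")
        return context[name]
    return _TOKEN.sub(replace, template)


# The pdvs section from the example above, expressed as Python data.
pdvs = [
    {"name": "{{namespace}}-data", "mount_path": "/data/{{namespace}}-data"},
    {"name": "{{namespace}}-data-2", "mount_path": "/data/{{namespace}}-data-2"},
]
context = {"namespace": "t001-u000004"}

rendered = [{key: render(value, context) for key, value in pdv.items()} for pdv in pdvs]
for pdv in rendered:
    print(pdv["name"], "->", pdv["mount_path"])
```

With the namespace t001-u000004, the first entry resolves to a PDV named t001-u000004-data mounted at /data/t001-u000004-data, which is why appropriately named PDVs must already exist in the namespace.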

25.1.4.16. Disable Init Containers and Sidecars in Bundle App

Robin CNP v5.4.1 supports disabling the Init Containers and Sidecars in the Robin Bundle apps using the input.yaml file when deploying the Bundle apps.

The following is the sample Robin Bundle file:

name: dpdk-intel
version: v1
icon: icon.png
snapshot: enabled
clone: enabled
roles:
- pktgen
pktgen:
  name: pktgen
  norootfs: true
  image:
    name: robinsys/dpdk-intel
    version: v1
    engine: docker
    imagePullPolicy: IfNotPresent
    entrypoint: entry.sh
  compute:
    memory: 1G
    cpu:
      reserve: true
      cores: 2
  initContainers:
    - name: init1
      image: 'robinsys/dpdk-intel:v1'
      imagePullPolicy: IfNotPresent
      resources:
        limits:
          cpu: 25m
          memory: 128Mi
      command:
        - sleep
        - '5'
  sidecars:
    - name: side1
      image: 'robinsys/dpdk-intel:v1'
      imagePullPolicy: IfNotPresent
      command:
        - /bin/bash
        - '-c'
        - trap 'exit 0' SIGTERM; while true; do sleep 1; done
      resources:
        limits:
          memory: 200Mi
          cpu: '1'
    - name: side2
      image: 'robinsys/dpdk-intel:v1'
      imagePullPolicy: IfNotPresent
      command:
        - /bin/bash
        - '-c'
        - trap 'exit 0' SIGTERM; while true; do sleep 1; done
      resources:
        limits:
          memory: 200Mi
          cpu: '1'

Input YAML file for disabling Init Containers and Sidecars

The sample Robin Bundle file above defines the side1 and side2 sidecars and the init1 Init Container.

The following sample input.yaml file disables the side1 sidecar and the init1 Init Container, and leaves the side2 sidecar enabled:

roles:
- name: pktgen
  containers:
    - name: side2
      disabled: false
    - name: side1
      disabled: true
    - name: init1
      disabled: true

You can use the input.yaml file when creating an app using the Robin Bundle.
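The effect of the sample input.yaml file can be sketched in Python as follows. The `enabled_containers` helper is hypothetical and only models the outcome. It assumes that any container not listed in input.yaml stays enabled, which this section does not state explicitly.

```python
def enabled_containers(bundle_containers, overrides):
    """Return the container names that stay enabled after applying the overrides."""
    unknown = {entry["name"] for entry in overrides} - set(bundle_containers)
    if unknown:
        raise ValueError(f"not defined in the bundle: {sorted(unknown)}")
    disabled = {entry["name"] for entry in overrides if entry.get("disabled", False)}
    return [name for name in bundle_containers if name not in disabled]


# Init Containers and sidecars from the sample bundle file.
bundle_containers = ["init1", "side1", "side2"]

# The containers list for the pktgen role from the sample input.yaml file.
overrides = [
    {"name": "side2", "disabled": False},
    {"name": "side1", "disabled": True},
    {"name": "init1", "disabled": True},
]

print(enabled_containers(bundle_containers, overrides))
```

Only side2 remains, which matches the intent of the sample: init1 and side1 are disabled for the deployed application.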

Syntax

Run the following command when creating an app using the Robin Bundle.

# robin app create from-bundle <appname> <bundleid> <yamlfile> --rpool <rpool> --wait

25.1.4.17. Robin StorageClass with runAsAny parameter

Robin CNP v5.4.1 provides a new parameter runAsAny in the StorageClass object to enable any user other than the root user to read or write to an NFS mount point of an RWX volume.

Use this parameter when a Pod has multiple containers running as different users and you want to allow any of those users to read or write to an NFS mount point of an RWX volume.

In the StorageClass object file, set the runAsAny parameter to True.

The following is an example of the StorageClass with runAsAny parameter:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: run-as-any-imm
  labels:
    app.kubernetes.io/instance: robin
    app.kubernetes.io/managed-by: robin.io
    app.kubernetes.io/name: robin
provisioner: robin
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
  replication: '2'
  media: HDD
  runAsAny: "true"

25.1.4.18. Sherlock Volume Health report with New Details

Starting with Robin CNP v5.4.1, the Sherlock volume health report displays the following details as part of the report output:

Potential IO stalls on mounts
NFS Exports
NFS Server Pods

25.1.4.19. NIC Tags in Robin Bundle Application

Robin CNP supports providing NIC tags for SR-IOV IP-Pools as part of the Robin Bundle application. You can provide NIC tags as part of the Bundle application even if NIC tags are not part of an IP-Pool.

You can use the NIC tags to request a specific physical NIC in a Robin Bundle application’s manifest file. Also, you can provide the required NIC tag details using the input.yaml file.

Provide the NIC tags in the key: value format. You can use only the name as the key in the key: value format. The value must be the physical NIC name on your node.

25.1.4.20. Persistent Storage for vDU and vCUs

Robin CNP v5.4.1 supports the persistent storage for virtualized Distributed Units (vDUs) and virtualized Central Units (vCUs).

With this feature, data written inside the app persists when you restart, stop, or start the VM app.

25.1.4.21. Set Limits on non-Bundle Applications for Tenant and Users

Starting with Robin CNP v5.4.1, you can set limits on non-Bundle applications for tenants and tenant users at the resource pool and application levels.

25.1.4.22. Support for PCI Device Resource

Robin CNP v5.4.1 enables you to create PCI Device resources using the PCI details.

A PCI device resource in Robin CNP comprises PCI FPGA device ID, vendor ID, device type, and device driver.

You can use the provided PCI resource name as a resource in the Robin Bundle manifest file and Helm charts.

When you use a PCI device resource name in the Robin Bundle manifest file or Helm chart, you do not need to add PCI device details (device ID, vendor ID, device type, driver, etc.).

Note

For Helm charts, you must provide the PCI Device resource name in the form of an annotation.

You can also create and manage PCI Device resource names from the Robin CNP UI.

25.1.4.23. Support for ephemeral containers

Robin CNP v5.4.1 supports ephemeral containers. You can add ephemeral containers to a Pod at runtime. Use this feature for debugging distroless containers or any container that does not have utilities needed for debugging.

25.1.4.24. Deprecated events

The following events are deprecated in Robin CNP v5.4.1.

EVENT_MGT_MASTER_FAILOVER
EVENT_MGT_MANAGER_UNREACHABLE
EVENT_COLLECTION_ERROR
EVENT_COLLECTION_OFFLINE
EVENT_COLLECTION_OFFLINE_FAILED
EVENT_COLLECTION_ONLINE
EVENT_COLLECTION_ONLINE_FAILED

25.1.5. Fixed Issues

Reference ID	Description
PP-22098	Robin does not allow exploding IP addresses for IPv6 IP-Pools except for the last octets. This issue is fixed now.
PP-22516	When installing Kubernetes master nodes, node taints may not apply successfully. This issue is fixed now.
PP-24313	The issue of Robin Bundle getting added to a log collection in place of a File collection is fixed.
PP-24725	The `drive evacuate` command fails for replicated volume. This issue is fixed now.
PP-24787	The issue of duplicate entries in the `nfs_service_endpoint` table, which causes a job storm that eventually impacts other app deployments and creates performance issues also, is fixed now.
PP-25070	Vulnerability CVE-2021-41103 is related to containerd runtime. The container root directories and some plugins had insufficiently restricted permissions. It allows unprivileged Linux users to traverse directory contents and execute programs. For more information about this vulnerability, see CVE-2021-41103.
PP-26389	Sherlock commands are not working for Robin CNP v5.3.5-213. This issue is fixed now.
PP-27304	The 503 error message appears due to the timeout of the HAProxy. To fix this issue, you need to increase the timeout values of HAProxy using the `robin config update` command to 60 seconds for the `connect_timeout` attribute.
PP-27464	Nessus scans discovered a vulnerability of enabled debugging functions like HTTP TRACE and TRACK. This issue is fixed.
PP-28267	IP-Pool creation for the OVS driver fails with the following error: “ValidatingWebhookConfiguration” for ippool “ippoolcr-validating-webhook” was not created. This issue is fixed now.
PP-28408	Robin CNP failed to allocate a static IP address for a Pod in a custom controller because a stale static IP address remained in the database after the Pod was deleted. This issue is fixed.
PP-28559	When an application is deployed using a custom interface name, the interface name is not appearing inside the Pod. This issue is fixed.

25.1.6. Known Issues

Reference ID	Description
PP-21469	Symptom A change in isolcpu is not reflected on the host after rediscover. Workaround Update the `reserved-cpus` parameter in `/etc/sysconfig/kubelet` to include the CPU IDs. If the new `reserved-cpus` value is a subset of the existing one, just restart kubelet. If the new `reserved-cpus` value is not a subset of the existing cpuset, drain or reboot the K8s node (the aim is to get rid of all Pods that are using CPUs from the new `reserved-cpus`; you can also delete only those specific Pods, but this is for advanced users). Once all Pods are drained, restart kubelet (this is not needed if you rebooted the node). Uncordon the K8s node.
PP-21910	Symptom Volume APIs get stuck in the RCM server as some mounts get stuck on the worker nodes. Workaround Contact Robin CS team for workaround steps.
PP-21916	Symptom A Pod IP is not pingable from any other node in the cluster, apart from the node where it is running. Workaround Delete the Calico Pod running on the node where the issue is seen.
PP-21935	Symptom Pods are stuck in the `ContainerCreating` state with the following error: `kubernetes.io/csi: mounter.SetUpAt failed to check for STAGE_UNSTAGE_VOLUME capability` Workaround Perform the following steps: Flush connection entries: # conntrack -F Delete the nodeplugin Pod. Note If the nodeplugin Pod has become unusable, future filesystem mounts will fail. This is a symptom of the many retries of NFS mount calls that hang. Bouncing the Pod will clear out the hung processes.
PP-22626	Symptom If NVIDIA GPU drivers are already installed on your setup, the Robin GPU operator deployment fails during the Robin CNP install or upgrade process. Workaround Run the following steps to fix this issue: # yum remove nvidia-driver-latest-dkms # yum remove nvidia-container-toolkit Reboot the node.
PP-22643	Symptom In some scenarios, a Pod might be stuck in the `TopologyAffinityError` resulting in VM creation failure. Workaround Delete the application and redeploy it after a few minutes.
PP-22781	Symptom After removing a taint on a master node, GPUs are not detected automatically. Workaround You need to run the `robin host probe --rediscover --all --wait` command for the GPUs to be detected on the primary master node.
PP-22853	Symptom Robin CNP may not detect GPUs in the following scenarios: After Robin CNP installation After upgrading Robin CNP After adding a new node Workaround Run the `robin host probe <hostname> --rediscover` command.
PP-22881	Symptom When you try to stop a Windows-based VM, it fails to stop with the following error: Failed to terminate process 58434 with SIGKILL: Device or resource busy Workaround Reboot the respective node and delete the app.
PP-24248	Symptom When you create a new resource pool and assign it to nodes and later try to deploy a Pod with storage affinity on the node with a newly assigned resource pool, the Pod deployment fails as the node is not taking the correct resource pool. Workaround Complete the following steps to fix this issue: Run the following command to edit the node: # kubectl edit node <node_name> Remove the `robin.io/robinrpool` resource pool. Add the correct resource pool name.
PP-24736	Symptom A PVC may not come online after removing an app from the secondary Protection Group on the peer cluster. Workaround After you remove the application from the Protection Group and allow the application to start, remove the `block_mount` label from the PVCs of the application.
PP-25246	Symptom When you try to delete a KVM application, the deletion process might be stuck as the Virsh commands on the node may not respond. Workaround Reboot the node.
PP-25360	Symptom If containers in a Pod are using an RWX PVC and if they are stuck in the `ContainerCreating` state for a long time and display a timeout error, apply the following workaround. Workaround Delete Pods if they are part of a Deployment or StatefulSet.
PP-25677	Symptom A Pod gets stuck in the `terminating state` as the node on which this Pod was running is permanently unavailable. The static IP address and static MAC address of the terminating Pod are not released until the Pod is permanently deleted. Workaround Permanently delete a Pod by restoring the node or by running the `kubectl delete pod` command.
PP-26345	Symptom When you deploy a Pod to use an SR-IOV VF from Ethernet Virtual Function 700 Series 154c, sometimes the Pod gets stuck in the `ContainerCreating` state with the device busy error message. Workaround Delete the Pod that shows the device busy error message.
PP-26523	Symptom Robin Bundle application with PDV or AEV is not supported for disaster recovery.
PP-26572	Symptom Due to inaccuracies in tracking the Pod creation, tenants and user limits are not explicitly honored for Helm applications.
PP-26581	Symptom After deleting the PCI resources, the existing Pods that are using the PCI resources are stuck in the `ContainerCreating` state during the instance relocation. Workaround Perform the following steps: Recreate the PCI resources. Delete the respective Pod.
PP-26693	Symptom When running heavy storage workloads on Robin CNP v5.4.1, the Robin DaemonSet Pod on one of the worker nodes is down with the following error: Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1: unknown Workaround To recover, reboot the physical server or contact the Robin CS team to recover without rebooting the server.
PP-26768	Symptom You should not use an IP-Pool associated with dpdk drivers as the default network.
PP-26830	Symptom After deleting the PVCs, Robin CNP cluster is down. Workaround Delete the Calico Pod.
PP-26942	Symptom When upgrading your cluster to Robin CNP v5.4.1, there is a DB migration step. During this step, the database briefly goes to read-only mode. The storage manager service might fail if it tries to write to the database while it is in read-only mode. But the storage manager service will restart when the database resumes the read-write mode. You do not need to take any action.
PP-27076	Symptom In Robin CNP, Kubelet might go down due to the stale `cpu_manager_state` file. Workaround Complete the following steps to fix this issue: Remove the stale `/var/lib/kubelet/cpu_manager_state` file using the following command: # rm -rf /var/lib/kubelet/cpu_manager_state Restart the Kubelet by running the following command: # systemctl restart kubelet Make sure etcd and apiserver Pods on this node are up and running.
PP-27077	Symptom When deleting the RWX applications, RWX Pods are stuck in the `Terminating` state. Workaround Perform the following steps for deleting the RWX Pods: Run the following command to find the NFS server Pod associated with the PVC: # robin nfs export-list Delete the NFS server Pod used for the respective PVC.
PP-27138	Symptom Patroni Replicas cannot recover and synchronize with the Leader due to missing WAL files or WAL receiver not running. Workaround Contact Robin support team for workaround steps.
PP-27193	Symptom When upgrading from Robin CNP v5.3.11-HF2 to Robin CNP v5.4.1, RWX Pods may get stuck in the `ContainerCreating` state as the volume is unmounted and Kubernetes is not aware of it. If you notice this issue, apply the following workaround steps: Workaround Check what PVC/volume Pod is using. Check the volume is not mounted by running the `robin volume info <volume_name>` command. Delete the respective RWX Pod or reboot the respective node.
PP-27253	Symptom One node in a HA cluster is in the `NotReady` state because the worker Pod on the respective node is down with the following error message: Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1: unknown Workaround Contact Robin support team for workaround steps.
PP-27276	Symptom After upgrading to Robin CNP v5.4.1, some Robin Bundle apps might be `OFFLINE` due to `PLAN_FAILED`. Workaround Manually restart the Robin Bundle apps one by one.
PP-27283	Symptom In rare scenarios, when you reboot the active master node, two Patroni Pods might have the same role as Replica. Workaround Bounce the Calico Pod running on the node where the issue is seen.
PP-27296	Symptom When installing Robin CNP on Rocky Linux host systems, if `runc` package is installed, you must uninstall it before installing Robin CNP.
PP-27530	Symptom Post upgrade to Robin CNP v5.4.1, RWX Pods get stuck in the `ContainerCreating` state as jobs fail with the error volume is not accessible, vol state `FAULTED` but the `robin volume list` command shows that the volume is in READY state. Workaround Run the following command: # robin host probe --all --wait
PP-27620	Symptom Sync with secondary peer cluster fails due to multiple snapshots restore failures. Workaround Restart the iomgr-server on the affected node. Log in to the robinds Pod on the affected node (`rbash robin`) Run the command `systemctl restart iomgr-server` Check the state of the connections using the `rdvm conn list` command.
PP-27678	Symptom When the node where the volume for a file collection is mounted is turned off and you try to delete a file collection that has a single replica, the file collection delete job fails, leaving the file server Pod in the `terminating` state. Workaround Run the following command to forcefully delete the file server Pod stuck in the `terminating` state: # kubectl delete pod <pod_name> -n <robin_ns> --force
PP-27775	Symptom When upgrading from Robin CNP v5.3.11-HF2 to Robin CNP v5.4.1, one of the hosts is stuck in the `Notready` state. Workaround You need to delete the worker Pod running on the node that is in the `Notready` status. Perform the following steps to delete the worker Pod: Run the following command to know the status of worker Pods: # kubectl get pod -n robinio -o wide | grep worker Run the following command to delete the stuck worker Pod: # kubectl delete pod -n robinio <pod_name> Reboot the respective node.
PP-27826	Symptom When you reboot all nodes of a cluster together, RWX Pods are stuck in the `CrashLoopBackOff` state. Workaround Delete the respective Pods.
PP-27937	Symptom You might see an error similar to the following in a DR setup when snapshots are being deleted: Snapshot default:pvc-75b3a817-6a16-4b5a-a76a-7490f717e590:t001-u000005-rpol-4-1658759728-1658742924 has too many valid descendants This issue is due to unreplicated snapshots on the primary cluster. After failover, a cluster can have unreplicated snapshots on the original primary cluster while the new primary sends new snapshots to the new secondary cluster. The unreplicated snapshots and the new snapshots from the new primary might have common data points, which results in this error. Workaround Delete the unreplicated snapshots on the original primary.
PP-28077	Symptom When you try to uninstall Robin CNP without deleting the apps and objects from the cluster, it might get stuck at unmounting `/var/lib/kubelet`. Workaround Power cycle the hosts and rerun the uninstall.
PP-28125	Symptom After upgrading from an existing Robin CNP version to Robin CNP v5.4.1, RWX PVC Pods are stuck in the `ContainerCreating` state. Workaround Perform the following steps to generate a new FS UUID: 1. Run the following command to list the Pods that are stuck in the `ContainerCreating` state: # kubectl get pods -A | grep ContainerCreating 2. Run the following commands to get the RWX volume used by these Pods: # kubectl describe pods <pod_name> # kubectl get pvc -A | grep <claim_name> 3. Run the following command to see the respective job output for the RWX volume with `NFSAgentAddExport`: # robin job list | grep <rwx_volume_name> | grep NFSAgentAddExport Example: # robin job list | grep pvc-b0f33e4d-6d1c-4d17-9ddf-0a67b9f1af51 | grep NFSAgentAddExport ->5840 | NFSAgentAddExport | Adding export for vol pvc-b0f33e4d-6d1c-4d17-9ddf-0a67b9f1af51 | COMPLETED | FAILED | 09 Aug 06:39:45 | 06:40:01 | 0:00:16 | [] | 1 | Command '/bin/mount /dev/sdo /var/lib/robin/nfs/robin-nfs-shared-60/ganesha/pvc-b0f33e4d-6d1c-4d17-9ddf-0a67b9f1af51' failed with return code 32: mount: wrong fs type, bad option, bad superblock on /dev/sdo, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. 4. If you see the above error, run the following command to find the device and host on which this volume is mounted: # robin volume list | grep <rwx_volume_name> 5. Run the following command on the host to see the error: # cat /var/log/messages | grep <device_name> Example: # cat /var/log/messages | grep sdo Aug 9 06:40:00 asa-06 kernel: XFS (sdo): Filesystem has duplicate UUID 47762fc2-1e7c-4863-a551-0fe55b29d0c7 - can't mount 6. Run the following command to generate a new FS UUID for the respective device to be mounted: # xfs_admin -U generate <path to device>
PP-28365	Symptom Nodes flap between the `Ready` and `NotReady` states because etcd goes out of sync. Workaround Contact the Robin CS team for workaround steps.
PP-28458	Symptom If you increase the volume size on the primary Protection Group for an application, the change in volume size is not replicated to the secondary Protection Group. Workaround 1. Remove the application from the primary Protection Group. 2. Increase the volume size. 3. (Optional) Delete the application on the secondary Protection Group. 4. Add the application back to the primary Protection Group.
PP-28460	Symptom The disaster recovery (DR) initial sync might fail in certain rare scenarios. Workaround If the initial sync fails, remove the application from the Protection Group and add it back.
PP-28461	Symptom When you increase the snapshot space limit on the primary Protection Group, the change is not replicated to the secondary Protection Group. Workaround If you need to increase the space for snapshots on the secondary Protection Group, run the following command on the secondary cluster to update the snapshot space limit: # robin app snapshot-space-limit
PP-28494	Symptom During a non-HA upgrade, the file server Pod might get stuck in the `ContainerCreating` state because the volume is unmounted and Kubernetes is not aware of it. If you notice this issue, apply the following workaround steps. Workaround 1. Check which PVC/volume the file server Pod is using. 2. Verify that the volume is not mounted by running the `robin volume info <volume name>` command. 3. Run the following command to cordon the node where the file server Pod is running: # kubectl cordon <node_name> 4. Run the following command to delete the file server Pod: # kubectl delete pod -n robinio <file_server_pod_name> 5. Run the following command to uncordon the node that you cordoned in step 3: # kubectl uncordon <node_name>
PP-28501	Symptom After upgrading from an existing Robin CNP version to Robin CNP v5.4.1 with RWX applications, the NFS server related jobs are stuck. Workaround Perform the following steps: 1. Run the following command to log in to the Robin master Pod: # rbash master 2. Run the following command to check the Mount State of the storage nodes: # stormgr node list 3. Run the following command to unlock the `stormgr` CLI: # stormgr devl unlock 4. Run the following command to unblock the blocked storage node: # stormgr node setstatus --block-mount 0 <blocked_storage_node>
PP-28642	Symptom When you add or remove a route from an IP-Pool, the change is not reflected inside the KVM app.
PP-28672	Symptom When removing a node from the Robin CNP cluster using the `k8s-script-el8.sh cleanup` command, the console displays the following incorrect message: Kubernetes cluster has Robin CNP installed. Please clean up Robin CNP to continue or use --force. You can ignore the message; you do not need to remove Robin CNP from the cluster.
PP-28721	Symptom The secondary Protection Group does not show the correct replication state in the following scenario: Replication is paused on the primary Protection Group and the pause request is updated on the secondary Protection Group. The secondary peer cluster then goes down, and replication is later resumed on the primary. After the secondary comes back up, its state is not updated. Workaround To correct the replication state on the secondary, pause the replication again on the primary and then resume it.
PP-28764	Symptom For Robin Bundles, after you delete a route from an IP Pool, the deleted route might still appear inside the Pod after restarting it. Workaround You need to delete the app and add the app again.
PP-28768	Symptom After upgrading to Robin CNP v5.4.1, you might notice that a previously cordoned node is uncordoned. Workaround Put the cordoned nodes in maintenance mode before upgrading, or cordon the nodes again after upgrading to Robin CNP v5.4.1.
PP-28802	Symptom Robin Control Plane fails to auto-recover in the following conditions: the root FS is full; an out-of-memory condition; high CPU situations; an operating system kernel crash. Workaround Apply the following workaround steps to recover from this situation: 1. Clean up the disk to free up space. You need a minimum of 50 GB of disk space. 2. Reboot the node.
PP-28809	Symptom After a worker node failure, RWX Pods are stuck in the `ContainerCreating` state because the `VolumeFailoverNFSExport` job fails with the following error: Unable to unmount volume <pvc-name>: vol never mounted at zone/node default/<hostname> Workaround Contact the Robin CS team for workaround steps.
PP-28867	Symptom The `robin chargeback report` output does not display the correct SSD drive price. The report shows 0.0 as the price.
PP-28912	Symptom Support to install or upgrade Robin CNP v5.4.1 as a non-root user is not available.
PP-28938	Symptom When deleting multiple PDVs using the Robin CNP UI, the checkbox for selecting all PDVs (next to Name field) does not work. Workaround You must select the corresponding checkbox for each of the PDVs that you want to delete and click Remove.
PP-28945	Symptom The parameters to provide a custom CA certificate are not currently supported with the GoRobin utility, even though they are available in the list of parameters. The following parameters are not supported with GoRobin: `--ca-cert-path` and `--ca-key-path`. Workaround You can pass the custom CA certificate parameters as part of the `config.json` file.
PP-28946	Symptom Robin CNP v5.4.1 does not support HashiCorp Vault integration.
PP-28966	Symptom A Pod deployment fails and you notice the following error message in the Pod events: “Error: Vblock with volume_id <> not mounted” Workaround 1. Delete the VolumeAttachment associated with the Pod object. 2. Delete the Pod.
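
The checks in the PP-28125 workaround can be narrowed with a couple of small filters. The following is a minimal sketch, not part of the Robin CLI: the function names `stuck_pods` and `duplicate_uuid_devices` are hypothetical, and the sketch assumes the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, ...) and the kernel message format shown in the PP-28125 example.

```shell
#!/usr/bin/env bash
# Hypothetical helper filters for the PP-28125 workaround (not Robin commands).

# Print "<namespace>/<pod>" for every Pod whose STATUS column is exactly
# ContainerCreating. Expects `kubectl get pods -A` output on stdin
# (assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
stuck_pods() {
  awk 'NR > 1 && $4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Print each device name that the kernel reported with a duplicate XFS UUID.
# Expects syslog lines (for example, from /var/log/messages) on stdin.
duplicate_uuid_devices() {
  sed -n 's/.*XFS (\([^)]*\)): Filesystem has duplicate UUID.*/\1/p' | sort -u
}
```

Typical use would be `kubectl get pods -A | stuck_pods` on a machine with cluster access and `duplicate_uuid_devices < /var/log/messages` on the host where the volume is mounted. Matching the STATUS column exactly avoids false hits from Pod names that happen to contain the search string.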

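The cordon, delete, and uncordon commands in the PP-28494 workaround can be wrapped so that the node is not left cordoned if the delete fails. The following is a minimal sketch under stated assumptions: `run`, `recycle_file_server_pod`, and the `DRY_RUN` variable are hypothetical names introduced here, and the sketch covers only the three `kubectl` commands. Checking the PVC/volume and confirming that the volume is not mounted must still be done first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the kubectl commands in the PP-28494 workaround.

# Run a command, or only print it when DRY_RUN=1 (DRY_RUN is an assumption
# of this sketch, not a kubectl or Robin feature).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: recycle_file_server_pod <node_name> <file_server_pod_name>
recycle_file_server_pod() {
  local node="$1" pod="$2" rc=0
  if [ -z "$node" ] || [ -z "$pod" ]; then
    echo "usage: recycle_file_server_pod <node_name> <file_server_pod_name>" >&2
    return 2
  fi
  # Cordon the node where the file server Pod is running.
  run kubectl cordon "$node" || return 1
  # Delete the file server Pod; remember a failure but keep going.
  run kubectl delete pod -n robinio "$pod" || rc=$?
  # Always uncordon the node that was cordoned above.
  run kubectl uncordon "$node" || return 1
  return "$rc"
}
```

Running `DRY_RUN=1 recycle_file_server_pod <node_name> <file_server_pod_name>` prints the three commands without executing them, which makes it possible to review them before touching the cluster.
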
25.1.7. Technical Support¶

Contact Robin Technical Support for assistance.
