*************
Release Notes
*************

===================================
Robin Cloud Native Platform v5.6.0
===================================

The Robin Cloud Native Platform (CNP) v5.6.0 release notes include pre- and post-upgrade steps, new features, improvements, fixed issues, and known issues.

**Release Date:** June 25, 2025

Infrastructure Versions
=======================

The following software applications are included in this CNP release:

==================== ==================
Software Application Version
==================== ==================
Kubernetes           1.32.4
Docker               25.0.2
Prometheus           2.39.1
Prometheus Adapter   0.10.0
Node Exporter        1.4.0
Calico               3.28.2
HAProxy              2.4.7
PostgreSQL           14.12
Grafana              9.2.3
CRI Tools            1.32.0
==================== ==================

Supported Operating Systems
===========================

The following are the supported operating systems and kernel versions for Robin CNP v5.6.0:

================== =======================================
OS Version         Kernel Version
================== =======================================
RHEL 8.10          4.18.0-553.el8_10.x86_64
Rocky Linux 8.10   4.18.0-553.el8_10.x86_64
================== =======================================

Upgrade Paths
=============

The following are the supported upgrade paths for Robin CNP v5.6.0:

* Robin CNP v5.4.3 HF4 to Robin CNP v5.6.0-128
* Robin CNP v5.4.3 HF4 PP2 to Robin CNP v5.6.0-128
* Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0-128
* Robin CNP v5.4.3 HF5 PP1 to Robin CNP v5.6.0-128

Pre-upgrade considerations
--------------------------

* For a successful upgrade, you must run the ``possible_job_stuck.py`` script before and after the upgrade. Contact the Robin Support team for the upgrade procedure that uses this script.

* If your cluster already has cert-manager installed, you must uninstall it before upgrading to Robin CNP v5.6.0.

Post-upgrade considerations
---------------------------

* After upgrading to Robin CNP v5.6.0, you must run the ``robin schedule update K8sResSync k8s_resource_sync 60000`` command to update the ``K8sResSync`` schedule.

* After upgrading to Robin CNP v5.6.0, you must run the ``robin-server validate-role-bindings`` command. To run this command, log in to the ``robin-master`` Pod. This command verifies the roles assigned to each user in the cluster and corrects them if necessary.

* After upgrading to Robin CNP v5.6.0, the ``k8s_auto_registration`` config parameter is disabled by default. The setting is deactivated to prevent all Kubernetes apps from automatically registering and consuming resources. Be aware of the following points with this change:

  - You can register Kubernetes apps manually using the ``robin app register`` command and use Robin CNP for snapshot, clone, and backup operations of the Kubernetes app.

  - As this config parameter is disabled, when you run the ``robin app nfs-list`` command, the mappings between Kubernetes apps and NFS server Pods are not listed in the command output.

  - If you need the mapping between a Kubernetes app and its NFS server Pod while the ``k8s_auto_registration`` config parameter is disabled or the Kubernetes app is not manually registered, get the PVC name from the Pod YAML (``kubectl get pod <pod-name> -n <namespace> -o yaml``) and run the ``robin nfs export list | grep <pvc-name>`` command, as shown in the sketch below.

  - The ``robin nfs export list`` command output displays the PVC name and namespace.
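For example, the following lookup maps a Pod's PVC to its NFS export; the Pod name, namespace, and PVC name here are hypothetical:

.. code-block:: text

   # kubectl get pod my-app-0 -n my-namespace -o yaml | grep claimName
       claimName: data-my-app-0
   # robin nfs export list | grep data-my-app-0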
Pre-upgrade steps
-----------------

* **Upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0-128**

  Before upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, perform the following steps:

  #. Update the value of the ``suicide_threshold`` config parameter to ``1800``:

     .. code-block:: text

        # robin config update agent suicide_threshold 1800

  #. Disable the ``NFS Server`` Monitor schedule:

     .. code-block:: text

        # robin schedule disable "NFS Server" Monitor

  #. Set the toleration seconds for all NFS server Pods to ``86400`` seconds. After the upgrade, you must change the toleration seconds according to the post-upgrade steps.

     .. code-block:: text

        for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationseconds to 86400"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'; done

* **Upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0-128**

  Before upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0, perform the following steps:

  #. Update the value of the ``suicide_threshold`` config parameter to ``1800``:

     .. code-block:: text

        # robin config update agent suicide_threshold 1800

  #. Set the ``NFS Server`` schedule CronJob so that it does not run for at least six months:

     .. code-block:: text

        # rbash master
        # rsql
        # update schedule set kwargs='{"cron":"1 1 1 1 *"}' where callback='nfs_server_monitor';
        # \q
        # systemctl restart robin-server

  #. Set the toleration seconds for all NFS server Pods to ``86400`` seconds. After the upgrade, you must change the toleration seconds according to the post-upgrade steps.

     .. code-block:: text

        for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationseconds to 86400"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 86400}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 86400}]'; done

Post-upgrade steps
------------------

* **After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0-128**

  After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, perform the following steps:

  #. Update the value of the ``suicide_threshold`` config parameter to ``40``:

     .. code-block:: text

        # robin config update agent suicide_threshold 40

  #. Enable the ``NFS Server`` Monitor schedule:

     .. code-block:: text

        # robin schedule enable "NFS Server" Monitor

  #. Set the ``check_helm_apps`` config parameter to ``False``:

     .. code-block:: text

        # robin config update cluster check_helm_apps False

  #. Set the ``chargeback_track_k8s_resusage`` config parameter to ``False``:

     .. code-block:: text

        # robin config update server chargeback_track_k8s_resusage False

  #. Set the ``robin_k8s_extension`` config parameter to ``True``:

     .. code-block:: text

        # robin config update manager robin_k8s_extension True

  #. Verify whether the following mutating webhooks are present:

     .. code-block:: text

        # kubectl get mutatingwebhookconfigurations -A | grep robin
        k8srobin-deployment-mutating-webhook   1   20d
        k8srobin-ds-mutating-webhook           1   20d
        k8srobin-pod-mutating-webhook          1   20d
        k8srobin-sts-mutating-webhook          1   20d
        robin-deployment-mutating-webhook      1   20d
        robin-ds-mutating-webhook              1   20d
        robin-pod-mutating-webhook             1   20d
        robin-sts-mutating-webhook             1   20d
  #. If the above ``k8srobin-*`` mutating webhooks are not present, bounce the ``robink8s-serverext`` Pods:

     .. code-block:: text

        # kubectl delete pod -n robinio -l app=robink8s-serverext

  #. Verify whether the following validating webhooks are present:

     .. code-block:: text

        # kubectl get validatingwebhookconfigurations
        NAME                             WEBHOOKS   AGE
        cert-manager-webhook             1          45h
        controllers-validating-webhook   1          31h
        ippoolcr-validating-webhook      1          31h
        namespaces-validating-webhook    1          31h
        pods-validating-webhook          1          31h
        pvcs-validating-webhook          1          31h

  #. If the ``robin-*`` mutating webhooks displayed in the step 6 output and the validating webhooks displayed in the step 8 output are not present on your setup, restart the ``robin-server-bg`` service:

     .. code-block:: text

        # rbash master
        # supervisorctl restart robin-server-bg

  #. Set the toleration seconds for all NFS server Pods to ``60`` seconds for when the node is in the ``notready`` state and to ``0`` seconds for when the node is in the ``unreachable`` state.

     .. code-block:: text

        for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationseconds"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'; done 2>/dev/null

* **After upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0-128**

  After upgrading from Robin CNP v5.4.3 HF4+PP to Robin CNP v5.6.0, perform the following steps:

  #. Update the value of the ``suicide_threshold`` config parameter to ``40``:

     .. code-block:: text

        # robin config update agent suicide_threshold 40

  #. Enable the ``NFS Server`` Monitor schedule:

     .. code-block:: text

        # robin schedule enable "NFS Server" Monitor

  #. Set the ``check_helm_apps`` config parameter to ``False``:

     .. code-block:: text

        # robin config update cluster check_helm_apps False

  #. Set the ``chargeback_track_k8s_resusage`` config parameter to ``False``:

     .. code-block:: text

        # robin config update server chargeback_track_k8s_resusage False

  #. Set the ``robin_k8s_extension`` config parameter to ``True``:

     .. code-block:: text

        # robin config update manager robin_k8s_extension True

  #. Delete the ``NFS Server`` schedule CronJob and restart the ``robin-server`` and ``robin-server-bg`` services:

     .. code-block:: text

        # rbash master
        # rsql
        # DELETE from schedule where callback='nfs_server_monitor';
        # \q
        # supervisorctl restart robin-server
        # supervisorctl restart robin-server-bg

  #. Verify whether the following mutating webhooks are present:

     .. code-block:: text

        # kubectl get mutatingwebhookconfigurations -A | grep robin
        k8srobin-deployment-mutating-webhook   1   20d
        k8srobin-ds-mutating-webhook           1   20d
        k8srobin-pod-mutating-webhook          1   20d
        k8srobin-sts-mutating-webhook          1   20d
        robin-deployment-mutating-webhook      1   20d
        robin-ds-mutating-webhook              1   20d
        robin-pod-mutating-webhook             1   20d
        robin-sts-mutating-webhook             1   20d

  #. If the above ``k8srobin-*`` mutating webhooks are not present, bounce the ``robink8s-serverext`` Pods:

     .. code-block:: text

        # kubectl delete pod -n robinio -l app=robink8s-serverext

  #. Verify whether the following validating webhooks are present:

     .. code-block:: text

        # kubectl get validatingwebhookconfigurations
        NAME                             WEBHOOKS   AGE
        cert-manager-webhook             1          45h
        controllers-validating-webhook   1          31h
        ippoolcr-validating-webhook      1          31h
        namespaces-validating-webhook    1          31h
        pods-validating-webhook          1          31h
        pvcs-validating-webhook          1          31h

  #. If the ``robin-*`` mutating webhooks displayed in the step 7 output and the validating webhooks displayed in the step 9 output are not present on your setup, restart the ``robin-server-bg`` service:

     .. code-block:: text

        # rbash master
        # supervisorctl restart robin-server-bg

  #. Set the toleration seconds for all NFS server Pods to ``60`` seconds for when the node is in the ``notready`` state and to ``0`` seconds for when the node is in the ``unreachable`` state.

     .. code-block:: text

        for pod in `kubectl get pod -n robinio -l robin.io/instance=robin-nfs --output=jsonpath={.items..metadata.name}`; do echo "Updating $pod tolerationseconds"; kubectl patch pod $pod -n robinio --type='json' -p='[{"op": "replace", "path": "/spec/tolerations/0/tolerationSeconds", "value": 60}, {"op": "replace", "path": "/spec/tolerations/1/tolerationSeconds", "value": 0}]'; done 2>/dev/null
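To confirm that the toleration patches took effect, you can print each NFS server Pod's toleration values. The following check is a sketch using the same label selector as the steps above:

.. code-block:: text

   # kubectl get pod -n robinio -l robin.io/instance=robin-nfs -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.tolerations[*].tolerationSeconds}{"\n"}{end}'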
New Features
============

Robin Certificate Management
----------------------------

Starting with Robin CNP v5.6.0, you can manage all certificates for your cluster without manual intervention using the Robin certificate management feature. Robin CNP uses the functionality of cert-manager for this feature.

cert-manager is a native Kubernetes certificate management controller. It helps in issuing certificates from various certificate authorities, such as **Let's Encrypt, Entrust, DigiCert, HashiCorp Vault, and Venafi**. It can also issue certificates from a local CA (self-signed). cert-manager adds Certificate and Issuer resources to Kubernetes clusters, which simplifies the process of obtaining, generating, and renewing certificates for the cluster. For more information, see `cert-manager`_.

The Robin certificate management feature manages certificates only for Robin internal services deployed in the ``robinio`` namespace. It also ensures that all certificates are valid and up to date, and it automatically renews certificates before they expire.

The Robin certificate management feature has the following certificate issuers:

- **cluster-issuer** - Responsible for all certificates used internally by the various control plane services.

- **ident-issuer** - Responsible for the Cluster Identity certificate used by all outward-facing services, such as the Kubernetes API server, Robin client, and GUI.

**Points to consider for the Robin Certificate Management feature**

- When you install or upgrade to Robin CNP v5.6.0, cert-manager is deployed by default, and a new service named ``robin-cert-monitor`` is deployed to monitor the state of all certificates required by the various Pods and containers in the Robin CNP cluster, ensuring that all required certificates exist and are valid.

- During installation or upgrade to Robin CNP v5.6.0, only the cert-manager option is supported. If you want to manage the certificates of your cluster using the local control mode, you can use the ``robin cert reset-cluster-certs`` command to enable local control mode.

- You can have only one cert-manager instance in a cluster.

- If your cluster is already installed with a Cluster Identity certificate signed by an external CA, you must reconfigure it using the ``robin cert reset-cluster-identity`` command after upgrading to Robin CNP v5.6.0.

- If you want to utilize a Cluster Identity certificate signed by an external CA after installing Robin CNP v5.6.0, you can use the ``robin cert reset-cluster-identity`` command to configure it.

- If you want to install Robin CNP v5.6.0 with both a Cluster Identity certificate signed by an external CA and cert-manager, you must pass the following options in the ``config.json`` file for one of the master nodes. For more information, see `Installation with Custom Cluster Identity certificate`_.

  - ident-ca-path
  - ident-cert-path
  - ident-key-path

- You cannot install your own cert-manager on a Robin CNP cluster. If you want to utilize the functionality of cert-manager, use the cert-manager deployed as part of the Robin certificate management feature to create Issuers and Certificates in other namespaces, as shown in the sketch below.

For more information, see `Robin Certificate Management`_.
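For example, standard cert-manager resources can be created in a non-``robinio`` namespace using the upstream ``cert-manager.io/v1`` API. The following sketch is illustrative only; the resource names, namespace, and DNS name are not from this release:

.. code-block:: text

   apiVersion: cert-manager.io/v1
   kind: Issuer
   metadata:
     name: demo-selfsigned-issuer    # illustrative name
     namespace: demo                 # any namespace other than robinio
   spec:
     selfSigned: {}
   ---
   apiVersion: cert-manager.io/v1
   kind: Certificate
   metadata:
     name: demo-cert
     namespace: demo
   spec:
     secretName: demo-cert-tls       # Secret where the issued key pair is stored
     duration: 2160h                 # 90 days
     dnsNames:
       - demo.example.com
     issuerRef:
       name: demo-selfsigned-issuer
       kind: Issuer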
Recreate a Faulted Volume for Helm Apps
---------------------------------------

Robin CNP v5.6.0 enables you to recreate a volume that is in the ``Faulted`` status using the same configuration as the faulted one. This feature is supported only for volumes used by Helm applications.

To support this feature, the following new command is available:

.. code-block:: text

   # robin volume recreate --name <volume-name> or --pvc-name <pvc-name> --force

.. Note:: You must use the ``--force`` option along with the command. When you recreate a new volume in place of a faulted volume, the data of the faulted volume is permanently lost.

For more information, see `Recreate a Faulted Volume for Helm Apps`_.

Memory Manager Integration
--------------------------

Robin CNP integrates the Kubernetes Memory Manager plugin starting with Robin CNP v5.6.0. The Memory Manager plugin allocates guaranteed memory and hugepages for guaranteed QoS Pods at the NUMA level.

The Memory Manager plugin works along with the CPU Manager and Topology Manager. It provides hints to the Topology Manager and enables resource allocations. The Memory Manager plugin ensures that the memory requested by a Pod is allocated from a minimum number of Non-Uniform Memory Access (NUMA) nodes.

.. Note:: Robin CNP supports only the ``Static`` policy for the Memory Manager and supports only ``Pod`` as the scope for the Topology Manager (``topology-manager-scope=Pod``).

You can enable this plugin using the ``"memory-manager-policy":"Static"`` parameter in the ``config.json`` file during Robin CNP installation or when upgrading to Robin CNP v5.6.0 from a supported version, as sketched below. For more information, see `Memory Manager`_.
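For illustration, the parameter is passed through ``config.json`` like other install options named in these notes. The fragment below shows only the relevant key; the surrounding structure of the file is assumed rather than taken from the release notes:

.. code-block:: text

   {
       ...
       "memory-manager-policy": "Static"
   }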
Integrating Helm Support
------------------------

Starting with Robin CNP v5.6.0, Robin CNP introduces native support for Helm chart management. This feature allows you to easily deploy, manage, and upgrade applications packaged as Helm charts within the CNP environment. A new CLI (``robin helm``) is available to support this feature.

For more information, see `Helm Operations`_.

Enhanced Password Management in Robin CNP
-----------------------------------------

Starting with Robin CNP v5.6.0, the password management features in Robin CNP have been enhanced, including configurable password rules and improved security measures such as password salting and auto-lockout. These updates provide greater security and flexibility for managing user credentials.

Istio Integration
-----------------

Robin CNP supports integration of Istio 1.23. You can install Istio after installing or upgrading to Robin CNP v5.6.0. Istio is a service mesh that helps in managing the communications between microservices in distributed applications. For more information, see `Istio`_.

After installing the Istio control plane, you must install the Ingress and Egress gateways to manage the incoming and outgoing traffic. For more information, see `Integrate Istio with Robin CNP`_.

Dual Stack (IPv4 & IPv6) Support
--------------------------------

Starting with Robin CNP v5.6.0, Robin CNP supports dual-stack networking on the Calico interface for a cluster, allowing it to accept traffic from both Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6) devices. For more information, see `IPv4/IPv6 dual-stack`_.

Dual-stack Pod networking assigns both IPv4 and IPv6 Calico addresses to Pods. A service can utilize an IPv4 address, an IPv6 address, or both. For more information, see `Services`_. Pod egress routing works through both IPv4 and IPv6 interfaces.

You can enable the dual-stack networking feature during Robin CNP installation only, not during the upgrade of an existing Robin CNP cluster. To enable this feature, you must specify the following option in the Config JSON file for one of the master nodes:

* ``"ip-protocol":"dualstack"``

.. Note:: Hosts must have dual-stack (IPv4 and IPv6) network interfaces.

For more information, see `Dual-stack (IPv4 and IPv6) installation`_.
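Once dual-stack networking is enabled, you can confirm that a Pod received both address families using the standard Kubernetes ``podIPs`` status field; the Pod name and addresses below are illustrative:

.. code-block:: text

   # kubectl get pod my-app-0 -o jsonpath='{.status.podIPs}'
   [{"ip":"192.0.2.24"},{"ip":"2001:db8::24"}]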
Auto Release Static IP of Terminating Pod
-----------------------------------------

Starting with Robin CNP v5.6.0, Robin CNP supports automatically releasing the static IP address of a Pod that is stuck in the terminating state on a NotReady node.

If a Pod with a static IP address is stuck in the terminating state, Kubernetes cannot assign this static IP address to a new Pod because the address remains in use by the terminating Pod. The IP address must be released before it can be reassigned to any Pod.

To address this, Robin CNP deploys a system service named ``robin-kubelet-watcher``. This service monitors the health of the kubelet, CRI, and Docker services and their connectivity with the API server on NotReady nodes every 10 seconds. If any of these services are unhealthy for 60 seconds, ``robin-kubelet-watcher`` terminates all Pods running on that node, releasing their IP addresses.

For more information, see `Auto Release Static IP address of Terminating Pod`_.

Secure communication between Kubelet and Kube apiserver
-------------------------------------------------------

Starting with Robin CNP v5.6.0, Robin CNP supports secure communication between the kubelet and kube-apiserver. In a Kubernetes cluster, the kubelet and kube-apiserver communicate with each other securely using TLS certificates. This communication is secured through mutual TLS: both the kubelet and kube-apiserver present their certificates to verify each other's identity. This ensures that only authorized kubelets connect to the kube-apiserver and that communication between them is secure.

By default, the kubelet's server certificate is self-signed, meaning it is signed by a temporary Certificate Authority (CA) that is created on the fly and then discarded. To enable secure communication between the kubelet and kube-apiserver, you must configure the kubelet to obtain its server certificate by issuing a Certificate Signing Request (CSR), rather than using a server certificate signed by a self-signed CA. After configuring the kubelet, you must also configure the kube-apiserver to process and approve the CSR.

For more information, see `Secure communication between kubelet and kube-apiserver`__.
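For background, the CSR-based flow described above corresponds to the kubelet's ``serverTLSBootstrap`` setting in upstream Kubernetes. The following is a minimal sketch of that standard upstream mechanism, not a required manual step, since Robin CNP handles the configuration as part of this feature:

.. code-block:: text

   # In the KubeletConfiguration file (for example, /var/lib/kubelet/config.yaml):
   serverTLSBootstrap: true

   # The kubelet then submits a CSR for its serving certificate instead of
   # self-signing it. Pending CSRs can be listed and approved with:
   # kubectl get csr
   # kubectl certificate approve <csr-name>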
Large cluster support
---------------------

Starting with Robin CNP v5.6.0, support for large clusters is available. You can now have a Robin CNP cluster with up to 110 nodes.

Improvements
============

Persistent Prometheus Configuration
-----------------------------------

Robin CNP v5.6.0 keeps the Prometheus configuration persistent when you stop and start metrics. With this improvement, when you update any of the following Prometheus-related configuration parameters, the values persist across metrics feature stop and start sessions:

- ``node_exporter_ds_cpu_limit``
- ``node_exporter_ds_memory_limit``
- ``prom_evaluation_interval``
- ``prom_scrape_interval``
- ``prom_scrape_timeout``

New Volume Metrics
------------------

Starting with Robin CNP v5.6.0, the ``robin_vol_psize`` metric is introduced.

* **robin_vol_psize**

  It represents the physical (or raw) storage space (in bytes) used by a single replica of the volume. This metric provides further insight into storage consumption.

**Example:**

.. code-block:: text

   # curl -k https://localhost:29446/metrics
   robin_vol_rawused{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 134217728
   robin_vol_size{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 1073741824
   robin_vol_psize{name="pvc-89382d8e-66c4-4d42-8d8c-62f7a328c713",volid="2"} 67108864

In the above example, the value ``67108864`` for ``robin_vol_psize`` is the physical (or raw) storage space (in bytes) used by a single replica of the volume.

Helm Version Upgrade
--------------------

Starting with Robin CNP v5.6.0, the Helm version is upgraded from v3.6.3 to v3.16.1.

New Node Level Events
---------------------

Robin CNP v5.6.0 provides the following new events to enhance the system's ability to monitor and detect node readiness issues at both the Kubernetes and service/component levels:

- ``EVENT_NODE_K8S_NOTREADY`` - Generated when a node is marked as down due to an issue with a Kubernetes component. It is a warning alert.
- ``EVENT_NODE_K8S_READY`` - Generated when a node is up after being marked as down. It is an info alert.
- ``EVENT_NODE_NOTREADY`` - Generated when a node is marked as not ready due to an unhealthy service or component. It is a warning alert.
- ``EVENT_NODE_READY`` - Generated when a node is ready after being marked as not ready. It is an info alert.

Updated the Default Reclaim Policy for ``robin-patroni`` PVs
------------------------------------------------------------

Starting with Robin CNP v5.6.0, the reclaim policy for ``robin-patroni`` PVs is set to ``Retain`` by default.

HTTPS support for license proxy server
--------------------------------------

Starting from Robin CNP v5.6.0, Robin CNP supports Hypertext Transfer Protocol Secure (HTTPS) for the license proxy server to activate and renew the Robin CNP cluster's licenses.

VDI access support for Windows VMs
----------------------------------

Starting with Robin CNP v5.6.0, you can access Windows-based VMs using the RDP console from the Robin UI.

KVM Console Access for Tenant Users
-----------------------------------

Starting with Robin CNP v5.6.0, tenant admins and tenant users can access the KVM application console from the Robin UI.

Events for certificates add and remove
--------------------------------------

Robin CNP generates an event when you add or remove a certificate. The following new Info events are added as part of this release:

- ``EVENT_CERT_ADDED`` - Generated when a certificate is added.
- ``EVENT_CERT_REMOVED`` - Generated when a certificate is removed.

Archive failed job logs
-----------------------

Starting with Robin CNP v5.6.0, Robin CNP automatically archives failed job logs. A new config parameter, ``failed_job_archive_age``, is added to control the archiving of failed job logs. The default value of this parameter is ``3`` days, which means failed job logs older than 3 days are automatically archived.
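For example, raising the archive age from the default of 3 days to 7 days would follow the ``robin config update`` pattern used elsewhere in these notes; the ``server`` section name here is an assumption, so verify which config section holds ``failed_job_archive_age`` on your cluster before running it:

.. code-block:: text

   # robin config update server failed_job_archive_age 7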
Relaxation in NIC bonding policy
--------------------------------

Starting with Robin CNP v5.6.0, Robin CNP considers the NIC bonding interface operational and up when at least one of the two interfaces used to create the bond interface is up.

Resume upgrade after a failure
------------------------------

The Robin CNP upgrade process is idempotent starting with Robin CNP v5.6.0 and allows you to resume it after a failure.

Support to provide static IP when creating an app from backup
--------------------------------------------------------------

When you are creating an app from a backup, you can provide static IPs from an IP pool starting from Robin CNP v5.6.0. The following new option is added to the existing ``robin app create from-backup`` command:

* ``--static-ips``

.. Note:: You must use the ``--ip-pools`` option along with the ``--static-ips`` option.

The following is the format for this new option:

* ``<ip-pool>@<ip-address-1>/<ip-address-2>/...``

.. Note:: You can provide multiple IPs only from the same IP pool, separating the list of IPs with the "/" symbol.

**Example**

.. code-block:: text

   --static-ips ovs-2@192.0.2.14/192.0.2.15/192.0.2.16

MetalLB new install options
---------------------------

Starting with Robin CNP v5.6.0, the following new install options are added for MetalLB:

* ``metallb-skip-nodes`` - Skip deploying MetalLB speaker Pods on the specified nodes.
* ``metallb-skip-controlplane`` - Skip deploying MetalLB controller Pods on master nodes.
* ``metallb-k8sfrr-mode`` - Deploy MetalLB using the K8s-FRR mode instead of the default FRR mode.

Patroni and Robin manager Services metrics
------------------------------------------

Robin CNP v5.6.0 provides support for Patroni metrics and Robin manager service metrics. For more information, see `Patroni and service metrics`__.

Fixed Issues
============

============= ====================================================================================================
Reference ID  Description
============= ====================================================================================================
RSD-8287      Under specific conditions, volumes are unable to recover from a fault, leading them to enter a
              ``DEGRADED`` state. This issue is fixed.
RSD-3885      The ``robin host remove-vlans`` command returns an error when attempting to remove VLANs by
              specifying "ALL" with the ``--vlans`` option. This issue is fixed.
RSD-4634      When Robin CNP is running on SuperMicro nodes, the IPMI tool incorrectly displays the BMC IPv6
              address as ``ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff`` instead of the actual BMC IPv6 address. This
              issue is fixed.
RSD-4584      If you have added a range of blacklisted IPs in an unexpanded form, Robin CNP does not allow you to
              remove the range of blacklisted IPs from the IP pool. This issue is fixed.
RSD-5771      IPv6 IP pool creation fails when the gateway is the same as the broadcast address for the IP pool
              subnet. This issue is fixed.
RSD-8104      The issue of the ``VolumeCreate`` job taking longer than expected is fixed.
RSD-7814      The issue of the application creation operation failing with the following error is now fixed:
              *Failed to mount volume : Node has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are
              allowed.*
RSD-7499      There is an issue with the storage creation request calculation between Robin CNP and Kubernetes.
              Due to this mismatched calculation, some application Pods fail to deploy as desired. This issue is
              fixed.
RSD-9323      When you try to restore an application from a backup that previously had a static IP address, the
              restore process fails to honor the ``--ip-pool`` value provided during deployment. Instead, the
              restore process attempts to allocate a non-static IP from a historic IP pool, resulting in the
              following type of error: *Non static IP allocations cannot be done from non-range(network) IP pools
              -> 'nc-bss-ov-internal-mgmt-int-v6'*. This issue is fixed.
PP-34457      When the Metrics feature is enabled, the Grafana metrics application is not displayed. This issue
              is fixed.
PP-38087      In certain cases, the snapshot size allocated to a volume could be less than what is requested.
              This occurs when the volume is allocated from multiple disks. This issue is fixed.
PP-38397      Robin CNP upgrade fails due to a Docker installation failure. The failure is caused by missing
              ``fuse-overlayfs`` and ``slirp4netns`` dependencies required by the updated Docker version. This
              issue is fixed.
============= ====================================================================================================
Known Issues
============

============= ====================================================================================================
Reference ID  Description
============= ====================================================================================================
PP-35015      **Symptom**

              After successfully renewing an expired Robin license, Robin CNP incorrectly displays the
              ``License Violation`` error when you try to add a new user to the cluster.

              **Workaround**

              Restart the ``robin-server-bg`` service:

              .. code-block:: text

                 # rbash master
                 # supervisorctl restart robin-server-bg

PP-21916      **Symptom**

              A Pod IP is not pingable from any other node in the cluster, apart from the node where it is
              running.

              **Workaround**

              Bounce the Calico Pod running on the node where the issue is seen.

PP-30247      **Symptom**

              After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the RWX apps might report the
              following error event type:

              *wrong fs type, bad option, bad superblock on /dev/sdj, missing codepage or helper program, or
              other error*

              **Workaround**

              To resolve this issue, contact the Robin Customer Support team.

PP-30398      **Symptom**

              After removing an offline master node from the cluster and power cycling it, the removed master
              node is automatically added back as a worker node.

              **Workaround**

              1. Run the following command to remove the host:

                 .. code-block:: text

                    # robin host remove <hostname>

              2. Run the following command to remove the node:

                 .. code-block:: text

                    # kubectl delete node <node-name>

              3. Run ``k8s-script cleanup`` and ``host-script cleanup`` on the to-be-removed node.

PP-34226      **Symptom**

              When a PersistentVolumeClaim (PVC) is created, the CSI provisioner initiates a ``VolumeCreate``
              job. If this job fails, the CSI provisioner calls a new ``VolumeCreate`` job again for the same
              PVC. However, if the PVC is deleted during this process, the CSI provisioner continues to call the
              ``VolumeCreate`` job because it does not verify the existence of the PVC before calling the job.

              **Workaround**

              Bounce the CSI provisioner Pod:

              .. code-block:: text

                 # kubectl delete pod <csi-provisioner-pod> -n robinio

PP-34414      **Symptom**

              In rare scenarios, the IOMGR service might fail to open devices in exclusive mode when it starts
              because other processes are using these disks. You might observe the following issue:

              - Some app Pods get stuck in the ``ContainerCreating`` state after restarting.
              Steps to identify the issue:

              #. Check for the following type of faulted error in the ``EVENT_DISK_FAULTED`` event type in the
                 ``robin event list`` command output:

                 *disk /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5 on node default:poch06 is faulted*

                 .. code-block:: text

                    # robin event list --type EVENT_DISK_FAULTED

              #. If you see the disk faulted error, check the IOMGR logs for **dev_open()** and **Failed to
                 exclusively open** error messages on the node where the disks are present:

                 .. code-block:: text

                    # cat iomgr.log.0 | grep scsi-SATA_Micron_M500_MTFD_1401096049D5 | grep "dev_open"

              #. If you see the Device or resource busy error message in the log file, use the ``fuser`` command
                 to confirm whether the device is in use:

                 .. code-block:: text

                    # fuser /dev/disk/by-id/scsi-SATA_Micron_M500_MTFD_1401096049D5

              **Workaround**

              If the device is not in use, restart the IOMGR service on the respective node:

              .. code-block:: text

                 # supervisorctl restart iomgr

PP-34451      **Symptom**

              In rare scenarios, the RWX Pod might be stuck in the ``ContainerCannotRun`` state and display the
              following error in the Pod's events:

              *mount.nfs: mount system call failed*

              Perform the following steps to confirm the issue:

              1. Run the ``robin volume info`` command and check the following details:

                 a. Check the status of the volume. It should be in the ``ONLINE`` status.
                 b. Check whether the respective volume mount path exists.
                 c. Check the physical and logical sizes of the volume. If the physical size of the volume is
                    greater than the logical size, the volume is full.

              2. Run the following command to check whether any of the disks for the volume are running out of
                 space:

                 .. code-block:: text

                    # robin disk info <disk-name>

              3. Run the ``lsblk`` and ``blkid`` commands to check whether the device mount path works fine on
                 the nodes where the volume is mounted.

              4. Run the ``ls`` command to check whether accessing the respective filesystem mount path gives
                 any input/output errors.

              If you notice any input/output errors in step 4, apply the following workaround:

              **Workaround**

              1. Find all the Pods that are using the respective PVC:

                 .. code-block:: text

                    # kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc-name>

              2. Bounce all the Pods identified in step 1:

                 .. code-block:: text

                    # kubectl delete pod <pod-name> -n <namespace>

PP-34492      **Symptom**

              When you run the ``robin host list`` command and notice a host in the ``NotReady`` and
              ``PROBE_PENDING`` states, follow these workaround steps to diagnose and recover the host.

              **Workaround**

              1. Run the following command to check which host is in the ``NotReady`` and ``PROBE_PENDING``
                 states:

                 .. code-block:: text

                    # robin host list

              2. Run the following command to check the current (``Curr``) and desired (``Desired``) states of
                 the host in the Agent Process (AP) report:

                 .. code-block:: text

                    # robin ap report | grep <hostname>

              3. Run the following command to probe the host and recover it:

                 .. code-block:: text

                    # robin host probe <hostname> --wait

                 This command forces a probe of the host and updates its state in the cluster.

              4. Run the following command to verify the host's state:

                 .. code-block:: text

                    # robin host list

                 The host should now transition to the ``Ready`` state.
PP-35478      **Symptom**

              In rare scenarios, the kube-scheduler may not function as expected when many Pods are deployed in
              a cluster due to issues with the kube-scheduler lease.

              **Workaround**

              Complete the following steps to resolve issues with the kube-scheduler lease:

              1. Run the following command to identify the node where the kube-scheduler Pod holding the lease
                 is running:

                 .. code-block:: text

                    # kubectl get lease -n kube-system

              2. Log in to the node identified in the previous step.

              3. Check whether the kube-scheduler Pod is running:

                 .. code-block:: text

                    # docker ps | grep kube-scheduler

              4. As the kube-scheduler is a static Pod, move its configuration file to temporarily stop the Pod:

                 .. code-block:: text

                    # mv /etc/kubernetes/manifests/kube-scheduler.yaml /root

              5. Run the following command to confirm that the kube-scheduler Pod is deleted. This may take a
                 few minutes.

                 .. code-block:: text

                    # docker ps | grep kube-scheduler

              6. Verify that the kube-scheduler lease is transferred to a different Pod:

                 .. code-block:: text

                    # kubectl get lease -n kube-system

              7. Copy the static Pod configuration file back to its original location to redeploy the
                 kube-scheduler Pod:

                 .. code-block:: text

                    # mv /root/kube-scheduler.yaml /etc/kubernetes/manifests/

              8. Confirm that the kube-scheduler container is running:

                 .. code-block:: text

                    # docker ps | grep kube-scheduler

PP-36865      **Symptom**

              After rebooting a node, the node might not come back online for a long time, and the host BMC
              console displays the following message for RWX PVCs mounted on that node:

              ``Remounting nfs rwx pvc timed out, issuing SIGKILL``

              **Workaround**

              Power cycle the host system.

PP-37330      **Symptom**

              During or after upgrading to Robin CNP v5.6.0, the ``NFSAgentAddExport`` job might fail with an
              error message similar to the following:

              */bin/mount /dev/sdn /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41
              -o discard failed with return code 32: mount:
              /var/lib/robin/nfs/robin-nfs-shared-35/ganesha/pvc-822e76f0-9bb8-4629-8aae-8318fb2d3b41: wrong fs
              type, bad option, bad superblock on /dev/sdn, missing codepage or helper program, or other error.*

              **Workaround**

              If you notice this issue, contact the Robin Customer Support team for assistance.

PP-37416      **Symptom**

              In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the upgrade might
              fail with the following error during the Kubernetes upgrade process on other master nodes:

              *Failed to execute kubeadm upgrade command for K8S upgrade. Please make sure you have the correct
              version of kubeadm rpm binary installed*

              Steps to identify the issue:
              1. Check the ``/var/log/robin-install.log`` file to know why the kubeadm upgrade failed.

                 **Example**

                 *[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and
                 backed up old manifest to
                 "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-31-01-03-52/kube-scheduler.yaml"
                 [upgrade/staticpods] Waiting for the kubelet to restart the component
                 [upgrade/staticpods] This might take a minute or longer depending on the component/version gap
                 (timeout 5m0s)
                 static Pod hash for component kube-scheduler on Node sm-compute04 did not change after 5m0s:
                 timed out waiting for the condition*

                 .. Note:: You can get the above error log for any of the static manifests of the API server,
                    etcd, scheduler, and controller manager.

              2. If you notice the above error, run the following command to inspect the Docker containers for
                 the failed component. The containers will likely be in the ``Exited`` state.

                 .. code-block:: text

                    # docker ps -a | grep schedule

              **Workaround**

              If you notice the above error, restart the kubelet and rerun the upgrade:

              .. code-block:: text

                 # systemctl restart kubelet

PP-37965      **Symptom**

              In Robin CNP v5.6.0, when you scale up a Robin Bundle app, Robin CNP does not consider the
              existing CPU cores and memory already in use by a vnode. As a result, Robin CNP cannot find a
              suitable host even though additional resources are available.

              **Workaround**

              If you notice this issue, apply the following workaround:

              1. Scale up the resources using the following command:

                 .. code-block:: text

                    # robin app computeqos <app-name> --role <role> --cpus <cpus> --memory <memory> --wait

              2. If the scale-up operation fails, stop the app using the following command:

                 .. code-block:: text

                    # robin app stop <app-name> --wait

              3. Try to scale up the resources again:

                 .. code-block:: text

                    # robin app computeqos <app-name> --role <role> --cpus <cpus> --memory <memory> --wait

PP-38039      **Symptom**

              During node reboot or power reset scenarios, application volumes may be force shut down due to
              I/O errors. As a result, application Pods might get stuck in the ``ContainerCreating`` state with
              the following mount failure error:

              *Context Deadline Exceeded.*

              On the affected node where the volume is mounted or the application Pod is scheduled, the
              following error might be observed in the ``dmesg`` output:

              *Log I/O Error Detected. Shutting down filesystem*

              **Workaround**

              If you notice this issue, contact the Robin Customer Support team for assistance.

PP-38044      **Symptom**

              When attempting to detach a repository from a hydrated Helm application, the operation might fail
              with the following error:

              *Can't detach repo as the application is in IMPORTED state, hydrate it in order to detach the repo
              from it.*

              This issue occurs even if the application has already been hydrated. The system incorrectly marks
              the application as being in the ``IMPORTED`` state, preventing the repository from being detached.

              **Workaround**

              To detach the repository, manually rehydrate the application and then retry the detach operation:

              1. Run the following command to rehydrate the application:

                 .. code-block:: text

                    # robin app hydrate <app-name> --wait

              2. Once the hydration is complete, detach the repository:

                 .. code-block:: text

                    # robin app detach-repo <app-name> <repo-name> --wait -y

PP-38061      **Symptom**

              In rare scenarios, when upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the upgrade may
              get stuck while executing Robin upgrade actions on the primary master node because some of the
              hosts are not in the ``Ready`` state.

              Steps to identify the issue:

              1. Check for the following error in the ``/var/log/robin-install.log`` file:

                 *Robin Host is not in READY state. Re-trying host status check in 10 seconds*
              2. If you get the above error, run the following command to verify the status of the hosts:

                 .. code-block:: text

                    # robin host list

              **Workaround**

              If any host in the cluster is in the ``NotReady`` state, apply the following steps:

              1. Log in to the Robin worker Pod running on the host that is in the ``NotReady`` state:

                 .. code-block:: text

                    # rbash robin

              2. Check for the following errors in the ``/var/log/robin/robin-worker-bootstrap.log`` file:

                 - *MainThread - robin.utils - INFO - Standard err: Error from server (NotFound): configmaps
                   "robin-upgrade-config-5.4.3-564" not found*

                 - *MainThread - robin.rcm.setup.robin_upgrade - INFO - get_host_upgrade_status: Skip Upgrade
                   status check for host hypervvm-61-49.robinsystems.com. Configmap robin-upgrade-config-5.4.3-564
                   not configured*

              3. If you see the above errors, stop the ``robin-bootstrap`` service:

                 .. code-block:: text

                    # supervisorctl stop robin-bootstrap

              4. Create the ``bootstrap_done`` file manually, if it does not exist:

                 .. code-block:: text

                    # touch /etc/robin/bootstrap_done

              5. Start the ``robin-bootstrap`` service again:

                 .. code-block:: text

                    # supervisorctl start robin-bootstrap

PP-38071      **Symptom**

              Application creation might fail with the following error:

              *Failed to mount volume : Node has mount_blocked STORMGR_NODE_BLOCK_MOUNT. No new mounts are
              allowed.*

              This issue occurs when a node enters a mount-blocked state (``STORMGR_NODE_BLOCK_MOUNT``),
              preventing new volume mounts from being processed.

              **Workaround**

              Try to create the application after 15 minutes.

PP-38078      **Symptom**

              After a network partition, the robin-agent and iomgr-server services may not restart
              automatically, and stale devices may not be cleaned up. This issue occurs because the consulwatch
              thread responsible for monitoring Consul and triggering restarts may fail to detect the network
              partition. As a result, stale devices may remain, potentially leading to resource contention and
              other issues.

              **Workaround**

              Manually restart the robin-agent and iomgr-server services using ``supervisorctl``:

              .. code-block:: text

                 # supervisorctl restart robin-agent iomgr-server

PP-38471      **Symptom**

              When StatefulSet Pods restart, the Pods might get stuck in the ``ContainerCreating`` state with
              the error *CSINode does not contain driver robin* due to stale NFS mount points and the
              ``csi-nodeplugin-robin`` Pod failing with the ``CrashLoopBackOff`` state.

              **Workaround**

              If you notice this issue, restart the ``csi-nodeplugin`` Pod:

              .. code-block:: text

                 # kubectl delete pod <csi-nodeplugin-pod> -n robinio

PP-39098      **Symptom**

              When you create a Robin Bundle app with an affinity rule, the bundle app Pod might get stuck in
              the ``ContainerCreating`` and ``Terminating`` states in a continuous loop after a node reboot.

              **Workaround**

              If you notice this issue, restart the ``robin-server-bg`` service:

              .. code-block:: text

                 # rbash master
                 # supervisorctl restart robin-server-bg

PP-38924      **Symptom**

              After you delete multiple Helm applications, one of the Pods might get stuck in the ``Error``
              state, and one or more ReadWriteMany (RWX) volumes might get stuck in the ``Terminating`` state.

              **Workaround**

              On the node where the Pod is stuck in the ``Error`` state, restart Docker and the kubelet.

PP-38524      **Symptom**

              When you upgrade your cluster from any supported Robin CNP version to Robin CNP v5.6.0, the
              upgrade process might get stuck while upgrading Kubernetes and display this error: ``ERROR: Failed
              to execute K8S upgrade actions``. Calico Pods might be stuck in the ``Terminating`` or
              ``ContainerCreating`` state.
              **Workaround**

              Restart the Calico Pods by performing a rolling restart of the ``calico-node`` DaemonSet:

              .. code-block:: text

                 # kubectl rollout restart ds -n kube-system calico-node

PP-39200      **Symptom**

              After upgrading a non-HA (single-node) Robin cluster from a supported version to Robin CNP v5.6.0,
              application deployments and scaling operations might fail with the following error:

              *Failed to download file_object, not accessible at this point.*

PP-38411      **Symptom**

              After upgrading from Robin CNP v5.4.3 HF5 to Robin CNP v5.6.0, the ``robin ip-pool delete``
              command may fail with the following error message:

              *ERROR - ippoolcr-validating-webhook not found. Please wait for Robin Server Start up to complete.*

              This issue occurs because the necessary validating webhooks for Robin's IP Pool Custom Resource
              Definition (CRD) are not properly created during the upgrade process.

              **Workaround**

              To resolve this issue, enable the ``robin_k8s_extension`` configuration variable after the
              upgrade. This triggers the creation of the missing validating webhooks.

              1. Verify the existence of Robin's validating webhooks:

                 .. code-block:: text

                    # kubectl get validatingwebhookconfigurations -A
                    NAME                   WEBHOOKS   AGE
                    cert-manager-webhook   1          11h

                 If the output does not list any webhooks related to Robin, proceed to the next step.

              2. Enable the ``robin_k8s_extension`` variable:

                 .. code-block:: text

                    # robin config update manager robin_k8s_extension True

                 This adds the ``register_webhook`` schedule task, which creates the missing webhooks.

              3. Verify that the ``register_webhook`` task has been scheduled:

                 .. code-block:: text

                    # robin schedule list | grep -i webhook

PP-39087      **Symptom**

              In a scenario where there are multiple placement constraints with Pod-level anti-affinity for each
              role and role affinity (co-locating the roles) with explicit tags limiting the placement of Pods
              and roles, the application deployment fails.

              **Workaround**

              Use tags, maintenance mode, taints, and tolerations to manage the placement of Pods.

PP-39188      **Symptom**

              After a Pod using an RWX volume is bounced (deleted and recreated), the new Pod may become stuck
              in the ``ContainerCreating`` state. The PersistentVolumeClaim (PVC) describe command output shows
              that the ``VolumeFailoverAddNFSExport`` and ``VolumeAddNFSExport`` jobs are stuck in the
              ``WAITING`` state.

              **Workaround**

              1. Identify the Pod in the ``ContainerCreating`` state:

                 .. code-block:: text

                    # kubectl get pod -n <namespace>

              2. Identify the stuck job ID:

                 .. code-block:: text

                    # kubectl describe pod <pod-name> -n <namespace>

                 From the output, identify the ``VolumeFailoverAddNFSExport`` job ID that is holding the lock.

              3. Identify the ``AGENT_WAIT`` sub-job:

                 .. code-block:: text

                    # robin job info <job-id>

                 From the output, identify the sub-job in the ``AGENT_WAIT`` state.

              4. Cancel the stuck sub-job:

                 .. code-block:: text

                    # robin job cancel <sub-job-id>

                 After canceling the job, the Pod should eventually transition to the ``Running`` state.

PP-37652      **Symptom**

              When you deploy a multi-container application using Helm with static IPs assigned from an IP pool,
              only a subset of the Pods appears on the Robin CNP UI.

              **Workaround**

              Run the following CLI command to view all the Pods:

              .. code-block:: text

                 # robin app info <app-name> --status

PP-39260      **Symptom**

              Backup operations for applications with sidecar containers are not supported. Contact the Robin
              Customer Support team for further queries.

PP-39263      **Symptom**

              When you try to create a volume using the ``robin volume create`` command with the ``GiB`` unit,
              the volume creation fails with this error message: ``ERROR - Invalid unit GI``.
              **Workaround**

              Use the unit ``G`` or ``GI`` when creating a volume.

PP-39264      **Symptom**

              In the Robin UI, when you have an empty Helm chart, the **Helm Charts** UI page displays the
              following error:

              *Failed to fetch the helm charts*

              **Workaround**

              You can ignore the error message.

PP-39265      **Symptom**

              When you try to share a Helm app using the Robin UI, the **Share** button in the UI does not
              respond.

              **Workaround**

              Use the following CLI command to share the Helm app:

              .. code-block:: text

                 # robin app share <app-name> --all-tenant-users
============= ====================================================================================================

Technical Support
=================

Contact `Robin Technical support`_ for any assistance.