20. Release Notes

20.1. Robin Cloud Native Storage v5.4.8-280

The Robin Cloud Native Storage (CNS) v5.4.8 Release Notes document provides information about upgrade paths, new features, improvements, fixed issues, and known issues.

Release Date: February 23, 2024

20.1.1. Upgrade paths

The following are the supported upgrade paths for Robin CNS v5.4.8:

  • Robin CNS v5.3.16-682 to Robin CNS v5.4.8-280

  • Robin CNS v5.4.4-182 to Robin CNS v5.4.8-280

  • Robin CNS v5.4.6-102 to Robin CNS v5.4.8-280

Note

  • If you are upgrading from any supported Robin CNS version other than Robin CNS v5.4.6 to Robin CNS v5.4.8, you must stop all snapshot creation and deletion operations before upgrading. In unavoidable situations, you can instead run the robin volume-snapshot upgrade --wait command post-upgrade.

  • After upgrading to Robin CNS v5.4.8, if you are using the robin client outside Robin Pods, you must upgrade to the latest version of the Robin CLI Client.

20.1.2. New Features

20.1.2.1. Backup and Import of volume using Kubernetes Spec

Starting from Robin CNS v5.4.8, you can back up and import any volume declaratively using a Kubernetes specification.

Robin CNS already supports backup and import of volumes using the Robin command line. For more information, see here.
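
Purely as an illustration of the declarative pattern, a backup spec might look like the sketch below; the apiVersion, kind, and every field name are hypothetical placeholders, since the exact schema is not reproduced in these notes:

# Hypothetical spec; all names below are placeholders, not the documented schema.
apiVersion: robin.io/v1
kind: Backup
metadata:
  name: mydb-backup            # hypothetical object name
spec:
  pvcName: mydb-pvc            # hypothetical: PVC whose volume is backed up
  repoName: my-backup-repo     # hypothetical: previously registered backup repo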

20.1.2.2. Auto Disk Rebalance

Robin CNS v5.4.8 supports the Auto Disk Rebalance feature, which automatically manages the storage space of all disks in the cluster when a disk reaches a configured watermark threshold.

By default, the Auto Disk Rebalance feature is enabled. However, you can disable it if required by using the following command, which raises the high watermark to a value (120) that disk usage cannot reach:

# robin config update server disk_used_space_high_watermark 120

When a disk reaches the high watermark, a disk rebalance job automatically starts moving volumes from that disk to another disk. The Auto Disk Rebalance feature always selects the destination disk with the most free space. For more information, see here.
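
For example, to re-enable rebalancing with a threshold below 100 percent, you could set the watermark back to a lower value; the value 80 here is an assumed example, not a documented default:

# robin config update server disk_used_space_high_watermark 80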

20.1.2.3. Support for Thin and Thick Volume Clone

Starting from Robin CNS v5.4.8, the following two types of volume clones are supported:

  • Thin clone: A Thin clone is a writable point-in-time copy of an existing volume or volume snapshot. A Thin clone has a dependency on the parent volume or snapshot.

  • Thick clone: A Thick clone is a writable point-in-time copy of an existing volume or volume snapshot. A Thick clone has no dependency on the parent volume or snapshot.

A Thick clone is essentially a completely new copy of the parent volume. While the data copy (hydration) is in progress, you cannot mount the volume in a Pod or access the volume clone for reads and writes.

Note

If you have clone volumes that were created in an earlier release (any release prior to Robin CNS v5.4.8), they will be available for use after upgrading to Robin CNS v5.4.8. These legacy clones are marked as Volume type CLONE in the volume list. However, you cannot convert them into a Thick clone. Also, you cannot convert a Thin clone into a Thick clone.

To create a Thick or Thin clone, provide the clonetype parameter with a value of thick or thin (clonetype: thick or clonetype: thin) in your Robin StorageClass, and use that StorageClass when you create a PVC. If you do not provide the clonetype parameter in the Robin StorageClass, Robin CNS creates a Thin clone by default. Also, in the PVC YAML file, you must set the kind key under the dataSource parameter to PersistentVolumeClaim or VolumeSnapshot, as sketched below. For more information, see here.
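
The following minimal sketch shows both objects; the StorageClass name, provisioner value, and object names are assumptions, while the clonetype parameter and the dataSource kind come from the text above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: robin-thick-clone       # assumed name
provisioner: robin              # assumed provisioner name for Robin CNS
parameters:
  clonetype: thick              # or "thin" (the default when omitted)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc              # assumed name
spec:
  storageClassName: robin-thick-clone
  dataSource:
    kind: PersistentVolumeClaim # or VolumeSnapshot for a snapshot source
    name: source-pvc            # assumed source object name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi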

20.1.3. Improvements

20.1.3.1. Delete Kubernetes VolumeSnapshot and PVC Objects with Dependent Volume Clones

Starting from Robin CNS v5.4.8, you can delete VolumeSnapshot objects even if clones created from them are present.

Also, deletion of PVC objects is now allowed even if VolumeSnapshot objects or clones created from them are present.
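
For example, using standard kubectl commands (object names hypothetical), both of the following deletions now succeed even when dependent clones exist:

# kubectl delete volumesnapshot my-snap -n my-namespace
# kubectl delete pvc source-pvc -n my-namespace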

20.1.3.2. New Metrics

Robin CNS v5.4.8 provides new metrics in the following categories. For more information, see here. A sample alerting rule using one of these metrics appears after the list below.

  • Manager services

    • robin_manager_services_robin_server

    • robin_manager_services_consul_server

    • robin_manager_services_robin_event_server

    • robin_manager_services_stormgr_server

    • robin_manager_services_pgsql

    • robin_manager_services_robin_master

  • Agent Services

    • robin_agent_services_robin_agent

    • robin_agent_services_iomgr_service

    • robin_agent_services_monitor_server

    • robin_agent_services_consul_client

  • Node Metrics

    • robin_node_state

    • robin_node_maintenance_mode

  • Disk Metrics

    • robin_disk_state

    • robin_disk_maintenance_mode

  • Volume Metrics

    • robin_vol_storstatus

    • robin_vol_status

    • robin_vol_mount_node_id

    • robin_vol_snapshot_space_used

    • robin_vol_snapshot_space_limit

    • robin_vol_total_snapshot_count
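
As an illustration only, the new gauges could feed Prometheus alerting; the value semantics assumed below (1 meaning a node is in maintenance mode) are an assumption, not documented above:

# Hypothetical Prometheus alerting rule; metric value semantics are assumed.
groups:
  - name: robin-cns
    rules:
      - alert: RobinNodeInMaintenance
        expr: robin_node_maintenance_mode == 1
        for: 15m
        annotations:
          summary: "Robin node in maintenance mode for over 15 minutes"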

20.1.3.3. Collect Robin Logs to Google Cloud Storage

When collecting logs from a Robin CNS cluster, you now have the option to send a copy of the log files to a Google Cloud Storage (GCS) bucket. Before you can copy logs, you must register the GCS repo:

# robin repo register <repo_name> gcs://<bucket_name>/<path> <JSON auth file> readwrite --wait

After registering the GCS repo, you can use it when collecting logs from the Robin CNS cluster.

# robin log collect repo --repo_name <GCS repo name>
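
For instance, a concrete invocation might look like the following; the repo name, bucket, path, and key file are placeholder values:

# robin repo register gcslogs gcs://my-log-bucket/robin-logs /root/gcs-key.json readwrite --wait
# robin log collect repo --repo_name gcslogs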

For more information, see here.

20.1.3.4. Partial Volume Evacuation

Robin CNS v5.4.8 supports manually moving part of a volume from one disk to another to free up space on a disk and balance disk space utilization. To move a partial volume, the following new option is added to the robin disk evacuate command:

  • --freespace

Use this option to specify the amount of space you need to free up on the source disk. You can provide the source disk’s WWN and the destination disk’s WWN. If you do not provide the destination disk’s WWN, Robin CNS automatically moves the data to an available disk.

Example

# robin drive evacuate --volume pvc-8c48ae12-bd29-4a8c-82af-85ff7bb912db 0x6002248015cebadcd96e520879a9582f --freespace 5G

For more information, see here.

20.1.3.5. Support for ReadWriteOncePod (RWOP) Access Mode for PVC creation

Starting from Robin CNS v5.4.8, Robin CNS supports ReadWriteOncePod (RWOP) access mode for PVC creation.
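
A minimal PVC sketch using this access mode follows; the PVC name and StorageClass name are assumptions:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwop-pvc                # assumed name
spec:
  storageClassName: robin       # assumed Robin StorageClass name
  accessModes:
    - ReadWriteOncePod          # access mode newly supported in v5.4.8
  resources:
    requests:
      storage: 10Gi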

20.1.3.6. Support for Configuring Robin Pods with a QoS Priority Class

Starting from Robin CNS v5.4.8, Robin Pods are configured with a higher QoS priority class, so under resource pressure they are evicted only after non-critical Pods.

20.1.3.7. Support for Quorum-based Replication

Starting from Robin CNS v5.4.8, Robin supports quorum-based replication. A new value, quorum-replication, is added for the protection parameter of the StorageClass. You must use a replication factor of 3 for volumes with the quorum-replication protection type.

With quorum-based replication, Robin ensures that a majority of replicas are always up to acknowledge a write IO, meaning a write IO is acknowledged only once it is durable on a majority of replicas. When the number of active replicas is less than the quorum value, the volume operates in READ_ONLY (RO) mode: only read IOs are served, and write IOs are not served in the region of the volume that is out of quorum because of faults in the cluster. A minimal StorageClass sketch follows the note below.

Note

Import of a quorum-based replication volume is supported only if it is imported with the hydration option.
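
As a minimal sketch, a StorageClass using this protection type might look like the following; the provisioner value and the replication parameter name are assumptions, while protection: quorum-replication and the factor of 3 come from the text above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: robin-quorum             # assumed name
provisioner: robin               # assumed provisioner name for Robin CNS
parameters:
  protection: quorum-replication # new protection type in v5.4.8
  replication: "3"               # factor 3 is required (parameter name assumed)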

20.1.4. Fixed Issues

Reference ID

Description

PP-32287

When restoring a volume backup, the disk write unit on the source and destination clusters had to be the same; if there was a mismatch, you could not use the restored volume. This issue is fixed.

PP-32324

The issue of importing an RWX volume from a volume backup failing with a FailedMount error is fixed.

PP-32328

The issue of hostnames with more than 50 characters not being supported is fixed.

PP-32536

The fast failover option is now disabled for Pods that have a block RWX PVC.

PP-32379

The issue of the UBB agent image registry path not being considered during Robin CNS installation is fixed.

PP-32724

The issue of a KubeVirt VM getting stuck in the init state and failing to start with a FailedMount error when you run KubeVirt VMs at scale (more than 250) is fixed.

PP-33046

The issue of Robin event logs not being in JSON format is fixed. With this release, the event logs appear in proper JSON format.

PP-33229

The issue of robin backup list and robin backup info commands not displaying the timestamp is fixed.

PP-33445

The issue of Robin Pods getting stuck in the ContainerCreating state due to a dependency on the /home/robinds/var/lib/pgsql/patroni directory, which is created only during Robin CNS installation, is fixed. Post-install, if the directory went missing for some reason, Robin Pods could hit this issue.

With this release, the Robin Patroni Pod’s Init container also checks for the /home/robinds/var/lib/pgsql/patroni directory and recreates it if the directory is absent.

PP-33630

The issue of volume creation failure due to a disk path mismatch is fixed.

PP-33701

When the Robin Master Pod restarted, duplicate database sessions to the Robin databases could prevent the Robin server from starting. This issue is fixed.

PP-34000

When a Robin Master Pod and a VM are co-scheduled on the same node and the node goes down due to a network outage, both fail over to another node. After the original node comes back up, if they fail back to it, the VM could get stuck in the Scheduling state due to a failed mount caused by stale mount entries from before the failover. This issue is fixed.

20.1.5. Known Issues

Reference ID

Description

PP-32364

Symptom

When importing a volume created with storage tolerations, you must use the --create-pvc option to adhere to the storage taints.

20.1.6. Technical Support

Contact Robin Technical support for any assistance.

20.2. Robin Cloud Native Storage v5.4.8-313

The Robin Cloud Native Storage (CNS) v5.4.8-313 Release Notes document provides information about upgrade paths, fixed issues, and known issues.

Release Date: June 21, 2024

20.2.1. Upgrade paths

The following are the supported upgrade paths for Robin CNS v5.4.8-313:

  • Robin CNS v5.3.16-682 to Robin CNS v5.4.8-313

  • Robin CNS v5.4.4-182 to Robin CNS v5.4.8-313

  • Robin CNS v5.4.8-279 to Robin CNS v5.4.8-313

  • Robin CNS v5.4.8-280 to Robin CNS v5.4.8-313

For manual upgrade instructions, see here.

Note

  • If you are upgrading from any supported Robin CNS version to Robin CNS v5.4.8-313, you must stop all snapshot creation and deletion operations before upgrading. In unavoidable situations, you can instead run the robin volume-snapshot upgrade --wait command post-upgrade.

  • After upgrading to Robin CNS v5.4.8-313, if you are using a robin client outside Robin Pods, you must upgrade to the latest version of the Robin CLI Client.

20.2.2. Fixed Issues

Reference ID

Description

RSD-7101

During a network partition scenario, sometimes VM disks fail to mount. This issue is fixed.

RSD-7357

After the robin-master Pod failed over to another node, the Robin worker could fail to communicate with the robin-master service because of a Cilium issue. In this release, a workaround is added that removes the sessionAffinity parameter from the robin-master services, which appeared to be triggering the Cilium issue.

RSD-7358

The patroni pre-start script failed to execute due to a permission issue on Google Distributed Cloud Edge. This issue is fixed.

RSD-7359

The collect-event container in the robin-master Pod could crash due to the dual-stack configuration. This issue is fixed.

RSD-7402

In some cases, following network partition recovery, the Robin CNS control plane might remain down due to a read-only database connection.

This happens when the Patroni leader switches during a network partition or recovery, leaving stale conntrack entries for the robin-patroni service pointing to the prior Patroni leader.

The stale conntrack entries persist for the robin-patroni service because, in some cases, the CNI fails to flush them when the service endpoint changes.

To recover from this situation, Robin CNS implements a configurable timeout. When the services detect a read-only database connection and the configured timeout is reached, the Robin services restart automatically to create a new database connection.

To address this issue when installing Robin CNS v5.4.8-313, add the following config variables to the options section of the robin.yaml file (an illustrative fragment follows the list):

  • session_read_only_timeout: "60"

  • exit_on_read_only_exception: "1"
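
For illustration, these variables might sit in robin.yaml as follows; only the two keys and their values come from this release note, the surrounding structure is assumed:

options:
  session_read_only_timeout: "60"    # restart services 60s after a read-only DB connection is detected
  exit_on_read_only_exception: "1"   # exit the process when the timeout is reached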

Note

  • The default value for session_read_only_timeout is 0, which means the services never restart in this scenario. The recommended value is “60” seconds; do not set it to less than 30 seconds, as that might bring services down even when the database goes read-only briefly due to a small network glitch.

  • The exit_on_read_only_exception variable controls whether the process exits when the timeout is reached; its default value is 1 (true). If you need to correct this issue anytime post-upgrade, you can do so. For more information, see Correcting the read-only database connection issue.

20.2.3. Known Issues

Reference ID

Description

PP-32497

Symptom

When a cluster reboots or experiences a network partition, you might observe one or more Pods stuck in the ContainerCreating status because the volume mount fails.

You might also notice the following types of error messages in the output of the kubectl describe pod command:

  • MapVolume.WaitForAttach failed for volume “pvc-277d4762-f138-43d9-b97c-69c435fc22df” : volume 1718805466:1 has GET error for volume attachment csi-6a50b191c904dbaae23fa40a9f438bc8d325bd1052b9b7ed05c7be2c70f25082: volumeattachments.storage.k8s.io “csi-6a50b191c904dbaae23fa40a9f438bc8d325bd1052b9b7ed05c7be2c70f25082” is forbidden: User “system:node:qct-09.robinsystems.com” cannot get resource “volumeattachments” in API group “storage.k8s.io” at the cluster scope: no relationship found between node ‘qct-09.robinsystems.com’ and this object

  • FailedMapVolume 17s (x47 over 93m) kubelet MapVolume.WaitForAttach failed for volume “pvc-8d8dcc76-1d8f-4bec-9f8c-dbc81320272f” : volume 1718322641:2 has GET error for volume attachment csi-a1c006c313c56004049b14f8fddf4902f9d73d8006675c3b8117bf3052c48935: volumeattachments.storage.k8s.io “csi-a1c006c313c56004049b14f8fddf4902f9d73d8006675c3b8117bf3052c48935” not found

Workaround

Cordon the node and bounce the required Pods in this state.

  1. To cordon the node where the failed Pod is scheduled, run the following command.

    # kubectl cordon <node_name>
    
  2. Bounce the Pod by running the following command so that the Pod moves to a different node.

    # kubectl delete pod <pod_name> -n <namespace>
    
  3. To uncordon the node that was cordoned earlier, run the following command.

    # kubectl uncordon <node_name>
    

PP-34388

Symptom

A Pod might be stuck in the ContainerCreating status due to the following error.

Vblock with volume_id 42 not mounted

If you observe this issue, apply the following workaround.

Workaround

You need to find and delete the VolumeAttachment objects of the volume that the Pod mounts.

  1. Run the following command to find the VolumeAttachment objects of the volume that the Pod mounts.

    # kubectl get volumeattachment | grep <pv_name> | grep <node_name>
    
  2. To delete the VolumeAttachment, run the following command.

    # kubectl delete volumeattachment <volumeattachment>
    
  3. Wait for a few minutes or bounce the Pod.

  4. Run the following command to check the status of the Pod that was in the ContainerCreating status.

    # kubectl get pod <pod_name>
    

20.2.4. Technical Support

Contact Robin Technical support for any assistance.