6. Managing Storage

Robin discovers disks attached to the nodes and uses them for providing storage to applications. This applies to local disks, cloud volumes, and SAN stoage that are available to nodes which have the storage role assigned to them. During the process Robin collects all the required metadata about the disks and identifies them as HDD, SSD or NVMes. All the elligible disks (this excludes disks with parititions and/or without WWNs) are marked as Robin storage.

Cloud volumes like EBS volumes on AWS and Persistent disks on GCP are not physically tied to a cloud compute instance. They can be detached from one cloud compute instance and attached to another instance at any time. Logical Units from a Storage Array Network can similarly be visible from multiple physical servers in the same storage fabric or network. Robin assigns a primary host for these devices and ensures that all IO access to a device only happens through one server at a given time. As with local disks, these devices are automatically discovered by Robin and registered. However they are considered to be re-attachable given their properties. Robin makes sure these re-attachable disks are always accessible even in the event of node failure. When a node goes down, Robin detaches the disk from failed node and attaches it to a healthy node so that application data remains accessible.

Note

Whilst installing if it is known that certain disks will not have a WWN, the --set-uuid option can be provided via the installer alongside a list of disks on which there is no WWN. This in turn will result in Robin stamping a UUID on the disk and thus enabling it to be used as Robin storage.

Robin allows multiple operations, detailed below, to be performed on registered disks.

robin disk create

Provision and attach a cloud disk

robin disk attach

Attach an existing disk

robin disk detach

Detach a registered disk

robin disk list

List disks

robin disk info

Display detailed information about a disk

robin disk evacuate

Evacuate allocations off a disk

robin disk remove

Remove a disk

robin disk update

Update the attributes of a disk

robin disk unfault

Unfault a disk

robin volume snapshot-space-limit

Change snapshot space limit

6.1. Provisioning a disk

Robin provides a utility through which it can provision disks of any size, attach them to hosts and discover them automatically on multiple cloud platforms. This enables users to expand the storage available for their cluster with convenience and ease. Detailed below are the general options for the robin drive create command, followed by specific examples for each supported cloud environment.

# robin disk create <hostname>
                    --type <type>
                    --number <number>
                    --size <size>
                    --iops <iops>

hostname

Name of host to attach disk to after creation

--type <type>

Type of disks. Choices for GCP include: pd-ssd, pd-standard. Choices for AWS include: gp2, io1, st1. Choices for Anthos include: independent-persistent. Choices for IBM include general-purpose (3iops-tier), 5iops-tier, 10iops-tier, custom.

--number <number>

Number of disks to be created. The default value is 1

--size <size>

Size of disk to be created in GB. The default value is 500 GB

--iops <iops>

IOPs for AWS ‘io1’ disk type, for IBM ‘custom’ disk type. Can be between 100-160000

Provisions disks of any size, attaches and discovers them automatically on cloud based nodes.

End Point: /api/v3/robin_server/disks/

Method: POST

URL Parameters: None

Data Parameters:

  • hostname: <hostname> - This mandatory field within the payload specifies the host to which the provisioned disk should be attached.

  • type: <disk_type> - This mandatory field within the payload specifies the type of disk to be created. Supported types include: pd-ssd, pd-standard (for GCP); gp2, io1, st1 (for AWS); independent-persistent (for Anthos); general-purpose, 5iops-tier, 10iops-tier, custom (for IBM).

  • number: <num_of_disks> - This mandatory field within the payload specifies the number of disks to be created. It should be an integer value.

  • size: <size_of_disk> - This mandatory field within the payload specifies the size of the disks to be created. It should be a string. An example value is ‘200GB’.

  • iops: <iops_of_disk> - Utilizing this parameter results in disks that can handle a maximum number of IOPs equal to <iops_of_disks> being created. Note this parameter is valid for the ‘io1’ disk type for AWS hosts, ‘custom’ disk type for IBM hosts.

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 196
}

On Google Cloud Platform, you can attach disks to your instance via the UI or Google APIs and have them available for use by Robin by running the below command:

$ robin host probe <hostname> --rediscover

On the other hand you can utilize Robin to provision disks in GCP to use for application deployment. To create 100 GB disk in GCP, run following command:

$ robin disk create <hostname> --type <pd-standard | pd-ssd> --size 100

These disks will be attached automatically and auto discovered by Robin so they will be ready to use straightaway.

Note

Due to Robin’s advanced feature to make sure disks are always accessible, it needs the manage disks permission to be selected while deploying cluster on GCP.

On the Google Anthos platform, you can add disks to cluster VMs from vSphere and have them available for use by Robin by running the below command:

$ robin host probe <hostname> --rediscover

On the other hand you can utilize Robin to provision virtual disks to use for application deployment. To create 100 GB disk for Anthos, run following command:

$ robin disk create <hostname> --type independent-persistent --size 100

These disks will be attached automatically and auto discovered by Robin so they will be ready to use straightaway.

Note

Due to Robin’s advanced feature to make sure disks are always accessible, it needs credentials, provided via Kubernetes secret, to have all cluster and disk level API privileges.

On AWS, you can attach disks to your EC2 instance via the UI or AWS CLI/APIs and have them available for use by Robin by running the below command:

$ robin host probe <hostname> --rediscover

On the other hand you can utilize Robin to provision disks in AWS to use for application deployment. To create 100 GB disk in AWS, run following command:

$ robin disk create <hostname> --type <gp2 | io1 | st1> --size 100

These disks will be attached automatically to the EC2 instances and auto discovered by Robin so they will be ready to use straightaway.

Note

Due to Robin’s advanced feature to make sure disks are always accessible, IAM Profiles associated with the host (or permissions granted to a user) must contain all Volume write and list actions.

On IBM Cloud Platform, you can create and attach disks to your instance via the UI or IBM Cloud APIs and have them available for use by Robin by running the below command:

$ robin host probe <hostname> --rediscover

Alternatively you can utilize Robin utility to provision and attach disks in IBM Cloud to use for application deployment. To create 100 GB disk in IBM Cloud, run following command:

$ robin disk create <hostname> --type <general-purpose | 5iops-tier | 10iops-tier | custom > --size 100

These disks will be attached automatically and auto discovered by Robin so they will be ready to use straightaway.

6.2. Attaching a disk

If a disk was previously detached from a host due to the host undergoing decommissioning or to rebalance the storage available on the cluster one will need to attach the disk back to another host in the cluster. Issue the following command to do so:

Note

This command is only for supported for re-attachable disks. Moreover if a host became unreachable Robin will automatically detach the disk and choose a new server to reattach it to based on the accessibility of the disk and the load on the other servers.

# robin disk attach <wwn>
                    --hostname <hostname>
                    --force

wwn

WWN of disk to attach

--hostname <hostname>

Name of host to attach disk to. Note this is a mandatory parameter.

--force

Forcibly re-attach a disk that is already ONLINE

Example:

# robin disk detach 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait
Job:  187 Name: DiskAttach     State: VALIDATED       Error: 0
Job:  187 Name: DiskAttach     State: WAITING         Error: 0
Job:  187 Name: DiskAttach     State: FINALIZED       Error: 0
Job:  187 Name: DiskAttach     State: COMPLETED       Error: 0

Attaches a disk, which might have previously been detached due to the host undergoing decommissioning or a rebalancing of storage disks, to a particular host.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: PUT

URL Parameters: None

Data Parameters:

  • action: attach - This mandatory field within the payload specifies that the attach operation is to be performed.

  • hostname: <hostname> - Utilizing this parameter results in the disk being attached to the specified host.

  • force: true - Utilizing this parameter enables one to force re-attach an ONLINE disk.

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 128
}

6.3. Detaching a disk

If a physical host is temporarily or permanently taken down and one can detach its storage disks to ensure future access. Issue the following command to do so:

Note

This command is only for supported for re-attachable disks.

# robin disk detach <wwn>
                    --hostname <hostname>
                    --force

wwn

WWN of disk to detach

--hostname <hostname>

Name of host to which disk is attached to. Note this is an optional parameter

--force

Forcibly detach a disk that is currently ONLINE

Example:

# robin disk detach 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait
Job:  187 Name: DiskDetach     State: VALIDATED       Error: 0
Job:  187 Name: DiskDetach     State: WAITING         Error: 0
Job:  187 Name: DiskDetach     State: FINALIZED       Error: 0
Job:  187 Name: DiskDetach     State: COMPLETED       Error: 0

Detaches a disk from a host.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: PUT

URL Parameters: None

Data Parameters:

  • action: detach - This mandatory field within the payload specifies that the detach operation is to be performed.

  • hostname: <hostname> - Utilizing this parameter results in the disk being detached from the specified host. This is optional as the host to which the disk is attached can be discovered implicitly.

  • force: true - Utilizing this parameter enables one to force detach an ONLINE disk.

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 130
}

6.4. Listing all disks

In order to view all disks currently present on the cluster and some additional details such as the number of volumes allocated from each, their size, media type etc, issue the following command:

# robin disk list --host <hostname>
                  --role <role>
                  --media <media>
                  --reattachable
                  --eligible
                  --tags
                  --json

--host <hostname>

Filter list by hostname

--role <role>

Filter list by role. Valid choices include: all, storage, rootdisk and reserved

--media <media>

Filter list by media type. Valid choices include: HDD and SSD

--reattachable

Filter list to only display reattachable disks

--eligible

Filter list to only display disks that have free capacity and have not reached the maximum volume count

--tags

Display tags for each disk

--json

Output in JSON

Example:

# robin disk list
ID | WWN                                       | Host    | Path /dev/disk/by-id                          | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role    | Status | LastOpr
---+-------------------------------------------+---------+-----------------------------------------------+----------+---------+------+--------------+------+---------+--------+---------
1  | 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b | vnode36 | scsi-0QEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b | 50       | N       | HDD  | 38/38 (100%) | 0/10 | Storage | ONLINE | READY
2  | 0xQEMU_QEMU_HARDDISK_f12b1f33-8b71-4a8c-a | vnode36 | scsi-0QEMU_QEMU_HARDDISK_f12b1f33-8b71-4a8c-a | 50       | N       | HDD  | 28/38 (74%)  | 1/10 | Storage | ONLINE | READY
3  | 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | vnode88 | scsi-0QEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | 100      | N       | HDD  | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
4  | 0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | vnode88 | scsi-0QEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | 100      | N       | HDD  | 63/77 (82%)  | 2/10 | Storage | ONLINE | READY
5  | 0xQEMU_QEMU_HARDDISK_19f9ac67-5e7e-4f00-8 | vnode89 | scsi-0QEMU_QEMU_HARDDISK_19f9ac67-5e7e-4f00-8 | 100      | N       | HDD  | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
6  | 0xQEMU_QEMU_HARDDISK_d523b7f2-eba7-4edc-b | vnode89 | scsi-0QEMU_QEMU_HARDDISK_d523b7f2-eba7-4edc-b | 100      | N       | HDD  | 77/77 (100%) | 0/10 | Storage | ONLINE | READY

Returns all disks currently present on the cluster and some additional details such as the number of volumes allocated from each, their size, and media type.

End Point: /api/v5/robin_server/disks

Method: GET

URL Parameters:

  • details=tags : Utilizing this parameter results in tag information for each disk being present in the response payload.

  • host=<hostname> : Utilizing this parameter results in only disks attached to the specified host being returned.

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)

Example Response:

Output
{
   "items":[
      {
         "node_hostname":"cscale-82-140.robinsystems.com",
         "spf":0.8,
         "state":"READY",
         "type":"HDD",
         "nvols":3,
         "dev":"\/dev\/sdb",
         "aslices":7,
         "stormgrid":1,
         "pused":234881024,
         "nodeid":1,
         "maintenance_mode":"DISABLED",
         "availability_zone":null,
         "max_alloc_slices":77,
         "role":"Storage",
         "wwn":"0x600224804c48fd7e16c608dea0919064",
         "status":"ONLINE",
         "free_alloc_slices":68,
         "lused_size":0,
         "devpath":"\/dev\/disk\/by-id\/scsi-3600224804c48fd7e16c608dea0919064",
         "alloc_slices":9,
         "reattachable":0,
         "max_volumes_per_disk":10,
         "protected":0,
         "slices":6390,
         "capacity":107374182400,
         "pfree":104287174656
      },
      {
         "node_hostname":"cscale-82-140.robinsystems.com",
         "spf":0.8,
         "state":"READY",
         "type":"HDD",
         "nvols":1,
         "dev":"\/dev\/sdc",
         "aslices":20,
         "stormgrid":2,
         "pused":939524096,
         "nodeid":1,
         "maintenance_mode":"DISABLED",
         "availability_zone":null,
         "max_alloc_slices":77,
         "role":"Storage",
         "wwn":"0x600224803bcdafde95b1f5cd27ceb5fb",
         "status":"ONLINE",
         "free_alloc_slices":53,
         "lused_size":0,
         "devpath":"\/dev\/disk\/by-id\/scsi-3600224803bcdafde95b1f5cd27ceb5fb",
         "alloc_slices":24,
         "reattachable":0,
         "max_volumes_per_disk":10,
         "protected":0,
         "slices":6390,
         "capacity":107374182400,
         "pfree":103582531584
      },
      {
         "node_hostname":"cscale-82-140.robinsystems.com",
         "spf":0.8,
         "state":"INIT",
         "type":"HDD",
         "nvols":0,
         "dev":"\/dev\/dm-1",
         "aslices":0,
         "stormgrid":0,
         "pused":0,
         "nodeid":1,
         "maintenance_mode":"DISABLED",
         "availability_zone":null,
         "max_alloc_slices":5,
         "role":"RootDisk",
         "wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-swap",
         "status":"UNKNOWN",
         "free_alloc_slices":5,
         "lused_size":0,
         "devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphaFy4aq3EUo1yluonS8FG0LF16ycBrdEw",
         "alloc_slices":0,
         "reattachable":0,
         "max_volumes_per_disk":10,
         "protected":0,
         "slices":0,
         "capacity":8254390272,
         "pfree":0
      },
      {
         "node_hostname":"cscale-82-140.robinsystems.com",
         "spf":0.8,
         "state":"INIT",
         "type":"HDD",
         "nvols":0,
         "dev":"\/dev\/dm-0",
         "aslices":0,
         "stormgrid":0,
         "pused":0,
         "nodeid":1,
         "maintenance_mode":"DISABLED",
         "availability_zone":null,
         "max_alloc_slices":38,
         "role":"RootDisk",
         "wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-root",
         "status":"UNKNOWN",
         "free_alloc_slices":38,
         "lused_size":0,
         "devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphgpZcvqGdfOKaXbEbOZzNthc6btsoSXDj",
         "alloc_slices":0,
         "reattachable":0,
         "max_volumes_per_disk":10,
         "protected":0,
         "slices":0,
         "capacity":53687091200,
         "pfree":0
      },
      {
         "node_hostname":"cscale-82-140.robinsystems.com",
         "spf":0.8,
         "state":"INIT",
         "type":"HDD",
         "nvols":0,
         "dev":"\/dev\/dm-2",
         "aslices":0,
         "stormgrid":0,
         "pused":0,
         "nodeid":1,
         "maintenance_mode":"DISABLED",
         "availability_zone":null,
         "max_alloc_slices":32,
         "role":"RootDisk",
         "wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-home",
         "status":"UNKNOWN",
         "free_alloc_slices":32,
         "lused_size":0,
         "devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphQObDlS6eMUSpSxH5zsvyg9I5a0Gpuj5W",
         "alloc_slices":0,
         "reattachable":0,
         "max_volumes_per_disk":10,
         "protected":0,
         "slices":0,
         "capacity":44350570496,
         "pfree":0
      },
      {
         "node_hostname":"cscale-82-140.robinsystems.com",
         "spf":0.8,
         "state":"INIT",
         "type":"HDD",
         "nvols":0,
         "dev":"\/dev\/sda",
         "aslices":0,
         "stormgrid":0,
         "pused":0,
         "nodeid":1,
         "maintenance_mode":"DISABLED",
         "availability_zone":null,
         "max_alloc_slices":77,
         "role":"RootDisk",
         "wwn":"0x600224801d3ac9b6650afd3280aa5898",
         "status":"UNKNOWN",
         "free_alloc_slices":77,
         "lused_size":0,
         "devpath":"\/dev\/disk\/by-id\/scsi-3600224801d3ac9b6650afd3280aa5898",
         "alloc_slices":0,
         "reattachable":0,
         "max_volumes_per_disk":10,
         "protected":0,
         "slices":0,
         "capacity":107374182400,
         "pfree":0
      }
   ]
}

6.5. Show information about a specific disk

In order to view detailed information about a disk such as the breakdown of its available capacity, current allocations including the associated applications, write unit etc, issue the following command:

# robin disk info <wwn>
                  --json

wwn

WWN of disk to detach

--json

Output in JSON

Example:

# robin disk info 0xQEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8
Drive: 0xQEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8
  Id: 2
  Role: Storage
  Type: HDD
  Make: None
  Model: None
  Write Unit: 512
  Availability Zone: None
  Zone Id: 1613675775
  Path: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8
  Node: vnode-95-28.robinsystems.com
  Protected: No
  Reattachable: No
  State: READY
  Status: MAINTENANCE
  Maintenance: ON
  Allocations:
    Current: 15G
    Maximum: 38G
    Free: 23G
    Capacity: 50G
    Factor: 30%
    AllocScore: 83
    Physical Usage: 0.16G
    Physical Free: 48G
  Volumes: 3
    Max volumes: 10
    Max latency volumes: 2
    Max throughput volumes: 1
  Applications: 3
    file-collection-1613696655681
      Instance: file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2

          Id      | Volume Name                                                        | Workload | Size (GB)      | Allocated (GB)
          --------+--------------------------------------------------------------------+----------+----------------+----------------
          1       | file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2 | ordinary | 5              | 5


    pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d
      Instance: pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d

          Id       | Volume Name                              | Workload | Size (GB)      | Allocated (GB)
          ---------+------------------------------------------+----------+----------------+----------------
          21       | pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d | ordinary | 5              | 5

Returns detailed information about a disk such as the breakdown of its available capacity, current allocations including the associated applications and write unit.

End Point: /api/v3/robin_server/disks/<wwn>

Method: GET

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error)

Example Response:

Output
{
   "items":[
      {
         "wwn":"0xQEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8",
         "dev":"\/dev\/sdb",
         "make":null,
         "model":null,
         "devpath":"\/dev\/disk\/by-id\/scsi-0QEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8",
         "capacity":53687091200,
         "type":"HDD",
         "role":"Storage",
         "state":"READY",
         "status":"MAINTENANCE",
         "maintenance_mode":"ON",
         "alloc_score":83,
         "slices":3190,
         "aslices":15,
         "pused":167772160,
         "pfree":51774488576,
         "nodeid":1,
         "stormgrid":2,
         "reattachable":0,
         "protected":0,
         "availability_zone":null,
         "node_hostname":"vnode-95-28.robinsystems.com",
         "reattachable_nodes":[
            [
               "vnode-95-28.robinsystems.com",
               "ONLINE"
            ]
         ],
         "reattachpolicy":{
            "id":2,
            "burst_start_time":0,
            "burst_count":0,
            "burst_interval":600,
            "restart_limit":5,
            "restarts_done":0
         },
         "zoneid":1613675775,
         "allocations":[
            {
               "name":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2.0.e4623953-263e-4b91-b66b-6be69ab60018",
               "slices":5,
               "volume_group":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2.72.1.d9d9726c-8052-48b1-b7ee-b953dbe254ff",
               "vols":[
                  {
                     "id":"1",
                     "name":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2",
                     "size":5368709120,
                     "state":"ONLINE",
                     "media":"HDD",
                     "workload_str":"ordinary"
                  }
               ],
               "volume":{
                  "id":"1",
                  "name":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2",
                  "size":5368709120,
                  "state":"ONLINE",
                  "media":"HDD",
                  "workload_str":"ordinary"
               }
            },
            {
               "name":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d.0",
               "slices":5,
               "volume_group":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d.72.1",
               "vols":[
                  {
                     "id":"21",
                     "name":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d",
                     "size":5368709120,
                     "state":"ONLINE",
                     "media":"HDD",
                     "workload_str":"ordinary"
                  }
               ],
               "volume":{
                  "id":"21",
                  "name":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d",
                  "size":5368709120,
                  "state":"ONLINE",
                  "media":"HDD",
                  "workload_str":"ordinary"
               }
            },
            {
               "name":"pvc-71561296-4699-4b79-bf92-3dd5470929cb.0",
               "slices":5,
               "volume_group":"pvc-71561296-4699-4b79-bf92-3dd5470929cb.72.1",
               "vols":[
                  {
                     "id":"22",
                     "name":"pvc-71561296-4699-4b79-bf92-3dd5470929cb",
                     "size":5368709120,
                     "state":"ONLINE",
                     "media":"HDD",
                     "workload_str":"throughput"
                  }
               ],
               "volume":{
                  "id":"22",
                  "name":"pvc-71561296-4699-4b79-bf92-3dd5470929cb",
                  "size":5368709120,
                  "state":"ONLINE",
                  "media":"HDD",
                  "workload_str":"throughput"
               }
            }
         ],
         "tags":{

         },
         "max_volumes_per_disk":10,
         "max_latency_sensitive_vols_per_disk":2,
         "max_throughput_intensive_vols_per_disk":1,
         "nvols":3,
         "lused_size":0,
         "preserved":0,
         "write_unit":512
      }
   ]
}

6.6. Evacuating volumes from a disk

This command allows users to move Robin storage volumes from one disk to another. As a result, it can be utilized to free up space on a disk when it is getting full, or to move data out of a disk before it is decommissioned. To evacuate a volume, issue the following command:

Note

Robin’s placement algorithm will determine the best disks to evacuate volumes to if target disks are not specified.

# robin disk evacuate <wwn>
                      --volume <volume>
                      --to-disks <target_disks>
                      --exclude-disks <excluded_disks>

wwn

WWN of disk to evacuate

--volume <volume>

Name of volume to be evacuated. Note if not provided all the volumes will be evacuated

--to-disks <target_disks>

List of WWNs representing disks to evacuate to

--exclude-disks <excluded_disks>

List of WWNs representing disks to avoid evacuating to

--yes

Do not prompt the user for confirmation of evacuation

Example:

# robin disk evacuate 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job:   65 Name: DiskEvacuate         State: VALIDATED       Error: 0
Job:   65 Name: DiskEvacuate         State: WAITING         Error: 0
Job:   65 Name: DiskEvacuate         State: COMPLETED       Error: 0

Evacuates volumes residing on one disk to another.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: PUT

URL Parameters: None

Data Parameters:

  • action: evacuate - This mandatory field within the payload specifies that the evacuate operation is to be performed.

  • volume: <volume_name> - Utilizing this parameter results in only the specified volume being evacuated from the disk. If this is not specified all volumes on the disk are evacuated.

  • target_drives: <list_of_target_drives> - Utilizing this parameter, by specifying a list of WWNs representing respective target drives, ensures the evacuated volumes will be transferred to one of the specified disks.

  • exclude_drives: <list_of_excluded_drives> - Utilizing this parameter, by specifying a list of WWNs representing respective excluded drives, ensures the evacuated volumes will not transferred to one of the specified disks.

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 145
}

6.7. Unregistering a disk

In order to unregister disk from the Robin cluster due to the fact it has physically malfunctioned or it is faulted, issue the following command:

Note

A disk can only be unregistered if there are no storage volumes allocated from it and it is still attached to the node from the perspective of the platform.

# robin disk unregister <wwn>
                        --yes

wwn

WWN of disk to unregister

--yes

Do not prompt the user for confirmation of disk unregistration

Example:

# robin disk unregister 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job:   101 Name: DiskUnregister       State: VALIDATED       Error: 0
Job:   101 Name: DiskUnregister       State: WAITING         Error: 0
Job:   101 Name: DiskUnregister       State: COMPLETED       Error: 0

Unregisters a disk from the Robin cluster.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: DELETE

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)

Example Response:

Output
{
   "jobid": 130
}

6.8. Removing a disk

In order to remove disk from the Robin cluster due to the fact it has physically malfunctioned or it is faulted, issue the following command:

Note

A disk can only be removed if there are no storage volumes allocated from it.

# robin disk remove <wwn>
                    --yes

wwn

WWN of disk to remove

--yes

Do not prompt the user for confirmation of disk removal

Example:

# robin disk remove 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job:   101 Name: DiskRemove         State: VALIDATED       Error: 0
Job:   101 Name: DiskRemove         State: WAITING         Error: 0
Job:   101 Name: DiskRemove         State: COMPLETED       Error: 0

Removes a disk from the Robin cluster.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: DELETE

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 130
}

6.9. Updating disk properties

In certain circumstances, due to the hardware or environmental reasons, Robin may not correctly discover all the attributes of a device correctly. As a result, in order to modify a disk properties such that they are correct, issue the following command:

# robin disk update <wwn>
                    --maxvolumes <max_volumes>
                    --maxlatencysensitivevolumes <max_latency_volumes>
                    --maxthroughputintensivevolumes <max_throughput_volumes>
                    --role <role>
                    --type <type>
                    --all
                    --set-reattachable
                    --unset-reattachable
                    --set-maintenance
                    --unset-maintenance

wwn

WWN of disk to update

--maxvolumes <max_volumes>

Max number of volumes allowed on the disk

--maxthroughputintensivevolumes <max_latency_volumes>

Max number of latency sensitive volumes allowed on the disk

--maxthroughputinstensivevolumes <max_throughput_volumes>

Max number of throughput intensive volumes allowed on the disk

--role <role>

Role of the disk to update to

--type <type>

Type of the disk to update to

--all

Run the update for all disks. Should be specified if no WWN is given

--set-reattachable

Mark the disk as re-attachable

--unset-reattachable

Mark the disk as not re-attachable

--set-maintenance

Put the disk into maintenance mode

--unset-maintenance

Remove the disk from maintenance mode

Example:

# robin disk update 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --maxvolumes 100 --type SSD --wait --yes
Job:   101 Name: DiskModify         State: VALIDATED       Error: 0
Job:   101 Name: DiskModify         State: WAITING         Error: 0
Job:   101 Name: DiskModify         State: COMPLETED       Error: 0

Modifies a disk’s discovered and/or Robin specific properties.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: PUT

URL Parameters: None

Data Parameters:

  • action: update - This mandatory field within the payload specifies that the update operation is to be performed.

  • role: <role> - Utilizing this parameter results in the role of the disk being set to the value specified. Options include: Storage, Swap, RootDisk, and Reserved.

  • type: <type> - Utilizing this parameter results in the type of the disk being set to the value specified. Options include: HDD and SSD.

  • maxvolumesperdisk: <max_vols_on_disk> - Utilizing this parameter results in the maximum number of volumes allowed on the disk being set to the value specified.

  • maxlatencysensitivevolumesperdisk: <max_lat_sens_vols_on_disk> - Utilizing this parameter results in the maximum number of latency sensitive volumes allowed on the disk being set to the value specified.

  • maxthroughputintensivevolumesperdisk: <max_through_ints_vols_on_disk> - Utilizing this parameter results in the maximum number of throughput intensive volumes allowed on the disk being set to the value specified.

  • reattachable: [0,1] - Utilizing this parameter results in the reattachable attribute of the disk being set. By specifying a value of 1, the disk is said to be reattachable and vice versa for a value of 0.

  • maintenance: [0,1] - Utilizing this parameter results in the maintenance mode attribute of the disk being set. By specifying a value of 1, the disk is said to be in maintenance mode and vice versa for a value of 0.

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 54
}

6.10. Unfaulting a disk

When the Robin storage stack encounters an IO error on a disk, it will mark the disk as FAULTED. Sometimes the error is NOT due to the disk, but due to environmental or other hardware errors such as controller errors. In this case one can determine that the disk is actually healthy. In turn this command allows a user to inform Robin that the disk which is presumed to be FAULTED is actually healthy and the external issues that caused the IO failure have been resolved. Robin will proceed to mark the disk as ONLINE and resume normal access to the disk. In order to an unfault a disk, issue the following command:

Note

This functionality should be used with extreme care. If the disk is actually bad, and a user utilizes this command to reverse the error reported by the Robin storage stack, it could result in data loss.

# robin disk unfault <wwn>
                     --yes

wwn

WWN of disk to unfault

--yes

Do not prompt the user for confirmation of unfaulting the drive

Example:

# robin disk unfault 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job:   101 Name: DiskUnfault         State: VALIDATED       Error: 0
Job:   101 Name: DiskUnfault         State: WAITING         Error: 0
Job:   101 Name: DiskUnfault         State: COMPLETED       Error: 0

Unfaults a disk that has been marked as FAULTED due to an IO error. Note this functionality should be used with extreme care and only in situations where one has determined that external issues such as hardware/controller errors, that have since been resolved, caused the IO failure. If this condition is not met, unfaulting the disk could result in data loss.

End Point: /api/v3/robin_server/disks/<disk_wwn>

Method: PUT

URL Parameters: None

Data Parameters:

  • action: unfault - This mandatory field within the payload specifies that the unfault operation is to be performed.

  • wwn: <disk_wwn> - This mandatory field within the payload specifies the WWN of the disk on which the unfault operation should be performed on.

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 202

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)

Example Response:

Output
{
   "jobid": 187
}

6.11. Configure the snapshot space limit of a volume

Robin stores snapshot(s) of a volume within a dedicated space, referred to as the snapshot space of the volume. The snapshot space is considered to be a percentage of the allocated space for a volume (by default this percentage is 40%). However after the volume is provisioned the size of the snapshot might need to be modified. In order to achieve this, issue the following command:

# robin volume snapshot-space-limit <volume_name>  <snapshot_space_limit>

Note

Only the snapshot space limit of individual volumes can be configured at this time. To configure the snapshot space limit for an application, update the snapshot space limit of each of the volumes it consists of.

volume_name

Name of the volume to configure the snapshot space limit for

snapshot_space_limit

Size of the snapshot space limit in bytes

Example:

# robin volume snapshot-space-limit pvc-bd01bcd9-9e4d-4a34-ac13-5a98395935b3 4G
Successfully updated volume pvc-bd01bcd9-9e4d-4a34-ac13-5a98395935b3 snapshot space limit to 4G.

6.12. Handling Disruptions

With Robin, highly available applications can be deployed on Kubernetes as Robin can handle failures of drives, rack or hosts automatically. On a Baremetal setup, volumes can be setup with a replication factor of 2 or 3 to ensure that storage is available even if a drive fails. Users can also choose the fault domain to be ‘host’ to protect against node reboots or lost.

However, in a public cloud environment the cloud disks can be detached from one cloud node and reattached to another one. For example, in AWS an EBS volume can be detached on one EC2 host and reattached to a different EC2 host. Same with GCP where a PD can be moved across GCE nodes. If a cloud node (EC2, GCE, Azure VM) is terminated or rebooted, one would want any cloud drive attached to them (EBS, PD, Block) to be moved to the one or more of the remaining healthy nodes automatically. This is not limited to just cloud disks, but also SAN LUNS that are offered to Robin as disks. The SAN LUNS can also be multi-mounted onto multiple nodes or moved around from node to node. User can still choose to replicate volume on public cloud as it takes sometime to detach and attach drives on cloud platforms.

Just having the storage available during a disruption will not help if Kubernetes can not access it from the Pod. For example a Kubernetes StatefulSet serializes the mounting and unmounting of a volume to protect against possible corruptions. Robin utilizes smart detection techniques to ensure that even if a volume is mounted on multiple nodes, it can differentiate the IOs issued from the previous stale mount and the new mount. With this consistency guarantees, Robin enables the Kubernetes StatefulSet to unmount a volume from a dead node and remount it on a healthy node where the Pod is scheduled to run. Robin actively monitors these events to allow for the fast failover of the Pods without user intervention and consequently enables users to reliably deploy highly available stateful applications on Kubernetes.

6.13. Hardware RAID Controllers

6.13.1. Robin Recommendations

Robin recommends not using Redundant Array of Independent Disks (RAID) configurations for the physical storage devices that you plan to format with Robin storage. To understand the reasons for this recommendation, review the explanations presented here. However if RAID usage cannot be avoided, Robin recommends the following:

  • Use the hardware RAID controllers only to set up RAID-1 protection for boot and OS devices. This ensures that the failure of one device does not take the entire node down.

  • To protect Robin volumes against node and rack failures, you must configure replication across the fault domains. All RAID configurations (RAID-0, RAID-1, RAID-5, and RAID-6) protect data only against device failure on the node. They reduce the storage capacity of the cluster, thus increasing the data storage costs.

  • Robín’s replication provides maximum protection against all hardware failures including storage devices, storage nodes, and rack failures.

  • If a RAID controller has a cache, it must have a battery backup. If the cache on the RAID controller does not have a battery backup, Robin recommends disabling the cache as the data on the cache could be lost during a power outage.

6.13.2. Advantages and Disadvantages

The following are the advantages and disadvantages of RAID usage with Robin:

  • RAID-0 - In RAID-0, data is striped across all devices and provides the best performance. However, if one of the physical devices fails, the data on the RAID volume is lost.

  • RAID-1 - In RAID-1, data is mirrored across two devices. It gives the most redundancy. However, the performance of the volume is determined by the performance of the slowest device in the mirror and the data capacity is reduced by half.

  • RAID-5 - In RAID-5, data is striped across “n-1” devices. The controller calculates and writes parity data to the nth device. The RAID-5 configuration can tolerate one device failure and provides good performance. However, if one of the devices fails, the admin has to replace the device immediately and start a RAID rebuild to maintain the redundancy. All devices should be of the same size or the size of the smallest device decides the total usable capacity.

  • RAID-6 - In RAID-6, data is striped across “n-2” devices. The controller calculates and writes parity to two devices. The RAID-6 configuration can tolerate two device failures and provides good performance. However, the disadvantages are the same as RAID-5.

  • The striped RAID configurations (RAID-0, RAID-5, and RAID-6) provide better IO performance. However, RAID-0 does not give data protection, and RAID-5 and RAID-6 configurations reduce the storage capacity.

  • All RAID configurations are managed outside of Robin CNS. Therefore, the administrator must manually manage the device failures and it leads to an increase in operations cost.

  • If you enable hardware RAID, the RAID controller becomes a single point of failure for all the storage devices on the node. If the RAID controller fails, all data on all the storage devices in the node is lost.

Note

For more details on Robin recommendations for RAID controllers, review the points made here.