6. Managing Storage
Robin discovers the disks attached to nodes and uses them to provide storage to applications. This applies to local disks, cloud volumes, and SAN storage available to nodes that have the storage role assigned to them. During discovery, Robin collects the required metadata about each disk and identifies it as an HDD, SSD, or NVMe device. All eligible disks (this excludes disks that have partitions or lack a WWN) are marked as Robin storage.
Cloud volumes such as EBS volumes on AWS and persistent disks on GCP are not physically tied to a cloud compute instance. They can be detached from one instance and attached to another at any time. Logical units from a Storage Area Network can similarly be visible from multiple physical servers in the same storage fabric or network. Robin assigns a primary host for these devices and ensures that all IO access to a device happens through only one server at a given time. As with local disks, these devices are automatically discovered and registered by Robin. However, given their properties, they are considered to be re-attachable. Robin makes sure these re-attachable disks remain accessible even in the event of a node failure: when a node goes down, Robin detaches the disk from the failed node and attaches it to a healthy node so that application data remains accessible.
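The failover behaviour described above can be pictured with a small sketch. This is not Robin's actual placement logic: the `Host` record, its fields, and `pick_new_primary` are hypothetical names, and using the number of attached disks as a stand-in for load is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    online: bool          # is the node healthy?
    can_see_disk: bool    # is the disk reachable from this node?
    attached_disks: int   # stand-in for "load" (an assumption)


def pick_new_primary(hosts: list[Host], failed: str) -> Optional[str]:
    """Return the least-loaded healthy host that can reach the disk, or None."""
    candidates = [
        h for h in hosts
        if h.name != failed and h.online and h.can_see_disk
    ]
    if not candidates:
        return None
    # Ties are broken by name so the result is deterministic.
    return min(candidates, key=lambda h: (h.attached_disks, h.name)).name


hosts = [
    Host("node-a", online=False, can_see_disk=True, attached_disks=2),
    Host("node-b", online=True, can_see_disk=True, attached_disks=4),
    Host("node-c", online=True, can_see_disk=True, attached_disks=1),
    Host("node-d", online=True, can_see_disk=False, attached_disks=0),
]
print(pick_new_primary(hosts, failed="node-a"))
```

Here `node-d` is skipped because it cannot reach the disk, so the least-loaded reachable host, `node-c`, is chosen.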
Note
If it is known at installation time that certain disks will not have a WWN, pass the --set-uuid option to the installer along with a list of those disks. Robin then stamps a UUID on each listed disk, which enables it to be used as Robin storage.
Robin allows multiple operations, detailed below, to be performed on registered disks.
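robin disk create                 | Provision and attach a cloud disk
robin disk attach                 | Attach an existing disk
robin disk detach                 | Detach a registered disk
robin disk list                   | List disks
robin disk info                   | Display detailed information about a disk
robin disk evacuate               | Evacuate allocations off a disk
robin disk remove                 | Remove a disk
robin disk update                 | Update the attributes of a disk
robin disk unfault                | Unfault a disk
robin volume snapshot-space-limit | Change snapshot space limit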
6.1. Provisioning a disk
Robin provides a utility that provisions disks of any size, attaches them to hosts, and discovers them automatically on multiple cloud platforms. This enables users to expand the storage available to their cluster with ease.
Detailed below are the general options for the robin disk create command, followed by specific examples for each supported cloud environment.
# robin disk create <hostname>
                    --type <type>
                    --number <number>
                    --size <size>
                    --iops <iops>
<hostname>        | Name of host to attach disk to after creation
--type <type>     | Type of disks. Choices for GCP include: pd-ssd, pd-standard. Choices for AWS include: gp2, io1, st1. Choices for Anthos include: independent-persistent. Choices for IBM include: general-purpose (3iops-tier), 5iops-tier, 10iops-tier, custom.
--number <number> | Number of disks to be created. The default value is 1
--size <size>     | Size of disk to be created in GB. The default value is 500 GB
--iops <iops>     | IOPS for the AWS ‘io1’ disk type and the IBM ‘custom’ disk type. Can be between 100 and 160000
Provisions disks of any size, attaches and discovers them automatically on cloud based nodes.
End Point: /api/v3/robin_server/disks/
Method: POST
URL Parameters: None
Data Parameters:
hostname: <hostname>
- This mandatory field within the payload specifies the host to which the provisioned disk should be attached.
type: <disk_type>
- This mandatory field within the payload specifies the type of disk to be created. Supported types include: pd-ssd, pd-standard (for GCP); gp2, io1, st1 (for AWS); independent-persistent (for Anthos); general-purpose, 5iops-tier, 10iops-tier, custom (for IBM).
number: <num_of_disks>
- This mandatory field within the payload specifies the number of disks to be created. It should be an integer value.
size: <size_of_disk>
- This mandatory field within the payload specifies the size of the disks to be created. It should be a string. An example value is ‘200GB’.
iops: <iops_of_disk>
- Utilizing this parameter results in the creation of disks that can handle a maximum number of IOPS equal to <iops_of_disk>. Note this parameter is valid for the ‘io1’ disk type on AWS hosts and the ‘custom’ disk type on IBM hosts.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 196
}
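As a rough illustration of the request described above, the following Python sketch builds (but does not send) the POST request. The function name `build_disk_create_request`, the `https` scheme, the `Content-Type` header, and the example server name are assumptions and not part of the documented API; the endpoint, port, field names, disk types, and IOPS range are taken from the description above.

```python
import json
import urllib.request

# Disk types per platform, as listed in the parameter description above.
DISK_TYPES = {
    "gcp": {"pd-ssd", "pd-standard"},
    "aws": {"gp2", "io1", "st1"},
    "anthos": {"independent-persistent"},
    "ibm": {"general-purpose", "5iops-tier", "10iops-tier", "custom"},
}
IOPS_TYPES = {"io1", "custom"}  # only these types accept an iops value


def build_disk_create_request(server, token, hostname, disk_type,
                              number, size, iops=None, port=29442):
    """Build (but do not send) the POST request that provisions disks."""
    if not any(disk_type in types for types in DISK_TYPES.values()):
        raise ValueError(f"unsupported disk type: {disk_type}")
    payload = {"hostname": hostname, "type": disk_type,
               "number": int(number), "size": size}
    if iops is not None:
        if disk_type not in IOPS_TYPES:
            raise ValueError("iops is only valid for 'io1' and 'custom' disks")
        if not 100 <= iops <= 160000:
            raise ValueError("iops must be between 100 and 160000")
        payload["iops"] = iops
    # Assumption: the server is reached over https; adjust for your cluster.
    url = f"https://{server}:{port}/api/v3/robin_server/disks/"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the body is sent as JSON with this content type.
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_disk_create_request("robin.example.com", "<auth_token>",
                                "node-1", "io1", number=1, size="200GB",
                                iops=3000)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Validating the disk type and IOPS range on the client side is a design choice of this sketch; the server may apply its own checks and report failures with the error codes listed above.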
On Google Cloud Platform, you can attach disks to your instance via the UI or Google APIs and have them available for use by Robin by running the below command:
$ robin host probe <hostname> --rediscover
Alternatively, you can use Robin to provision disks in GCP for application deployment. To create a 100 GB disk in GCP, run the following command:
$ robin disk create <hostname> --type <pd-standard | pd-ssd> --size 100
These disks are attached and discovered by Robin automatically, so they are ready to use straightaway.
Note
Because Robin ensures that disks remain accessible at all times, the manage disks permission must be selected when deploying a cluster on GCP.
On the Google Anthos platform, you can add disks to cluster VMs from vSphere and have them available for use by Robin by running the below command:
$ robin host probe <hostname> --rediscover
Alternatively, you can use Robin to provision virtual disks for application deployment. To create a 100 GB disk for Anthos, run the following command:
$ robin disk create <hostname> --type independent-persistent --size 100
These disks are attached and discovered by Robin automatically, so they are ready to use straightaway.
Note
Because Robin ensures that disks remain accessible at all times, the credentials provided via the Kubernetes secret must have all cluster-level and disk-level API privileges.
On AWS, you can attach disks to your EC2 instance via the UI or AWS CLI/APIs and have them available for use by Robin by running the below command:
$ robin host probe <hostname> --rediscover
Alternatively, you can use Robin to provision disks in AWS for application deployment. To create a 100 GB disk in AWS, run the following command:
$ robin disk create <hostname> --type <gp2 | io1 | st1> --size 100
These disks are attached to the EC2 instances and discovered by Robin automatically, so they are ready to use straightaway.
Note
Because Robin ensures that disks remain accessible at all times, the IAM profiles associated with the host (or the permissions granted to a user) must include all Volume write and list actions.
On IBM Cloud Platform, you can create and attach disks to your instance via the UI or IBM Cloud APIs and have them available for use by Robin by running the below command:
$ robin host probe <hostname> --rediscover
Alternatively, you can use Robin to provision and attach disks in IBM Cloud for application deployment. To create a 100 GB disk in IBM Cloud, run the following command:
$ robin disk create <hostname> --type <general-purpose | 5iops-tier | 10iops-tier | custom> --size 100
These disks are attached and discovered by Robin automatically, so they are ready to use straightaway.
6.2. Attaching a disk
If a disk was previously detached from a host, either because the host was being decommissioned or to rebalance the storage available on the cluster, it needs to be attached to another host in the cluster. Issue the following command to do so:
Note
This command is only supported for re-attachable disks. Moreover, if a host becomes unreachable, Robin automatically detaches the disk and chooses a new server to reattach it to, based on the accessibility of the disk and the load on the other servers.
# robin disk attach <wwn>
                    --hostname <hostname>
                    --force
<wwn>                 | WWN of disk to attach
--hostname <hostname> | Name of host to attach disk to. Note this is a mandatory parameter.
--force               | Forcibly re-attach a disk that is already ONLINE
Example:
# robin disk attach 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait
Job: 187 Name: DiskAttach State: VALIDATED Error: 0
Job: 187 Name: DiskAttach State: WAITING Error: 0
Job: 187 Name: DiskAttach State: FINALIZED Error: 0
Job: 187 Name: DiskAttach State: COMPLETED Error: 0
Attaches a disk, which might have previously been detached due to the host undergoing decommissioning or a rebalancing of storage disks, to a particular host.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: PUT
URL Parameters: None
Data Parameters:
action: attach
- This mandatory field within the payload specifies that the attach operation is to be performed.
hostname: <hostname>
- Utilizing this parameter results in the disk being attached to the specified host.
force: true
- Utilizing this parameter enables one to force re-attach an ONLINE disk.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 128
}
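The attach request above can be assembled as in the following sketch. The helper name `build_attach_request` is hypothetical, and percent-encoding the WWN in the URL path is a defensive assumption rather than a documented requirement; the method, path, and field names come from the description above.

```python
import json
from urllib.parse import quote


def build_attach_request(wwn, hostname, force=False):
    """Return (method, path, body) for attaching a disk to a host."""
    body = {"action": "attach", "hostname": hostname}
    if force:
        # Only needed to re-attach a disk that is already ONLINE.
        body["force"] = True
    # Assumption: encode the WWN so it is safe to embed in a URL path.
    path = f"/api/v3/robin_server/disks/{quote(wwn, safe='')}"
    return "PUT", path, json.dumps(body)


method, path, body = build_attach_request(
    "0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b", "vnode36")
print(method, path)
print(body)
```

The request still needs the Authorization header and the RCM port described above when it is actually sent.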
6.3. Detaching a disk
If a physical host is taken down temporarily or permanently, its storage disks can be detached to ensure that they remain accessible in the future. Issue the following command to do so:
Note
This command is only supported for re-attachable disks.
# robin disk detach <wwn>
                    --hostname <hostname>
                    --force
<wwn>                 | WWN of disk to detach
--hostname <hostname> | Name of host to which the disk is attached. Note this is an optional parameter
--force               | Forcibly detach a disk that is currently ONLINE
Example:
# robin disk detach 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait
Job: 187 Name: DiskDetach State: VALIDATED Error: 0
Job: 187 Name: DiskDetach State: WAITING Error: 0
Job: 187 Name: DiskDetach State: FINALIZED Error: 0
Job: 187 Name: DiskDetach State: COMPLETED Error: 0
Detaches a disk from a host.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: PUT
URL Parameters: None
Data Parameters:
action: detach
- This mandatory field within the payload specifies that the detach operation is to be performed.
hostname: <hostname>
- Utilizing this parameter results in the disk being detached from the specified host. This is optional as the host to which the disk is attached can be discovered implicitly.
force: true
- Utilizing this parameter enables one to force detach an ONLINE disk.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 130
}
6.4. Listing all disks
To view all disks currently present on the cluster, along with additional details such as the number of volumes allocated from each, their size, and media type, issue the following command:
# robin disk list --host <hostname>
                  --role <role>
                  --media <media>
                  --reattachable
                  --eligible
                  --tags
                  --json
--host <hostname> | Filter list by hostname
--role <role>     | Filter list by role. Valid choices include: all, storage, rootdisk and reserved
--media <media>   | Filter list by media type. Valid choices include: HDD and SSD
--reattachable    | Filter list to only display reattachable disks
--eligible        | Filter list to only display disks that have free capacity and have not reached the maximum volume count
--tags            | Display tags for each disk
--json            | Output in JSON
Example:
# robin disk list
ID | WWN                                       | Host    | Path /dev/disk/by-id                          | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role    | Status | LastOpr
---+-------------------------------------------+---------+-----------------------------------------------+----------+---------+------+--------------+------+---------+--------+---------
1  | 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b | vnode36 | scsi-0QEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b | 50       | N       | HDD  | 38/38 (100%) | 0/10 | Storage | ONLINE | READY
2  | 0xQEMU_QEMU_HARDDISK_f12b1f33-8b71-4a8c-a | vnode36 | scsi-0QEMU_QEMU_HARDDISK_f12b1f33-8b71-4a8c-a | 50       | N       | HDD  | 28/38 (74%)  | 1/10 | Storage | ONLINE | READY
3  | 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | vnode88 | scsi-0QEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | 100      | N       | HDD  | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
4  | 0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | vnode88 | scsi-0QEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | 100      | N       | HDD  | 63/77 (82%)  | 2/10 | Storage | ONLINE | READY
5  | 0xQEMU_QEMU_HARDDISK_19f9ac67-5e7e-4f00-8 | vnode89 | scsi-0QEMU_QEMU_HARDDISK_19f9ac67-5e7e-4f00-8 | 100      | N       | HDD  | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
6  | 0xQEMU_QEMU_HARDDISK_d523b7f2-eba7-4edc-b | vnode89 | scsi-0QEMU_QEMU_HARDDISK_d523b7f2-eba7-4edc-b | 100      | N       | HDD  | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
Returns all disks currently present on the cluster and some additional details such as the number of volumes allocated from each, their size, and media type.
End Point: /api/v5/robin_server/disks
Method: GET
URL Parameters:
details=tags
: Utilizing this parameter results in tag information for each disk being present in the response payload.
host=<hostname>
: Utilizing this parameter results in only disks attached to the specified host being returned.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
Example Response:
Output
{
"items":[
{
"node_hostname":"cscale-82-140.robinsystems.com",
"spf":0.8,
"state":"READY",
"type":"HDD",
"nvols":3,
"dev":"\/dev\/sdb",
"aslices":7,
"stormgrid":1,
"pused":234881024,
"nodeid":1,
"maintenance_mode":"DISABLED",
"availability_zone":null,
"max_alloc_slices":77,
"role":"Storage",
"wwn":"0x600224804c48fd7e16c608dea0919064",
"status":"ONLINE",
"free_alloc_slices":68,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224804c48fd7e16c608dea0919064",
"alloc_slices":9,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"slices":6390,
"capacity":107374182400,
"pfree":104287174656
},
{
"node_hostname":"cscale-82-140.robinsystems.com",
"spf":0.8,
"state":"READY",
"type":"HDD",
"nvols":1,
"dev":"\/dev\/sdc",
"aslices":20,
"stormgrid":2,
"pused":939524096,
"nodeid":1,
"maintenance_mode":"DISABLED",
"availability_zone":null,
"max_alloc_slices":77,
"role":"Storage",
"wwn":"0x600224803bcdafde95b1f5cd27ceb5fb",
"status":"ONLINE",
"free_alloc_slices":53,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224803bcdafde95b1f5cd27ceb5fb",
"alloc_slices":24,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"slices":6390,
"capacity":107374182400,
"pfree":103582531584
},
{
"node_hostname":"cscale-82-140.robinsystems.com",
"spf":0.8,
"state":"INIT",
"type":"HDD",
"nvols":0,
"dev":"\/dev\/dm-1",
"aslices":0,
"stormgrid":0,
"pused":0,
"nodeid":1,
"maintenance_mode":"DISABLED",
"availability_zone":null,
"max_alloc_slices":5,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-swap",
"status":"UNKNOWN",
"free_alloc_slices":5,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphaFy4aq3EUo1yluonS8FG0LF16ycBrdEw",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"slices":0,
"capacity":8254390272,
"pfree":0
},
{
"node_hostname":"cscale-82-140.robinsystems.com",
"spf":0.8,
"state":"INIT",
"type":"HDD",
"nvols":0,
"dev":"\/dev\/dm-0",
"aslices":0,
"stormgrid":0,
"pused":0,
"nodeid":1,
"maintenance_mode":"DISABLED",
"availability_zone":null,
"max_alloc_slices":38,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-root",
"status":"UNKNOWN",
"free_alloc_slices":38,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphgpZcvqGdfOKaXbEbOZzNthc6btsoSXDj",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"slices":0,
"capacity":53687091200,
"pfree":0
},
{
"node_hostname":"cscale-82-140.robinsystems.com",
"spf":0.8,
"state":"INIT",
"type":"HDD",
"nvols":0,
"dev":"\/dev\/dm-2",
"aslices":0,
"stormgrid":0,
"pused":0,
"nodeid":1,
"maintenance_mode":"DISABLED",
"availability_zone":null,
"max_alloc_slices":32,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-home",
"status":"UNKNOWN",
"free_alloc_slices":32,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphQObDlS6eMUSpSxH5zsvyg9I5a0Gpuj5W",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"slices":0,
"capacity":44350570496,
"pfree":0
},
{
"node_hostname":"cscale-82-140.robinsystems.com",
"spf":0.8,
"state":"INIT",
"type":"HDD",
"nvols":0,
"dev":"\/dev\/sda",
"aslices":0,
"stormgrid":0,
"pused":0,
"nodeid":1,
"maintenance_mode":"DISABLED",
"availability_zone":null,
"max_alloc_slices":77,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898",
"status":"UNKNOWN",
"free_alloc_slices":77,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224801d3ac9b6650afd3280aa5898",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"slices":0,
"capacity":107374182400,
"pfree":0
}
]
}
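The following sketch shows one way to build the query string for this endpoint and to summarize the response. The helper names `disk_list_path` and `storage_summary` are hypothetical. Treating `free_alloc_slices` and `max_alloc_slices` as the counterparts of the Free/Max column in the CLI output is an assumption inferred from the example values, not something the response format documents.

```python
import json
from urllib.parse import urlencode


def disk_list_path(host=None, with_tags=False):
    """Build the GET path, adding only the query parameters requested."""
    params = {}
    if with_tags:
        params["details"] = "tags"
    if host:
        params["host"] = host
    query = urlencode(params)
    return "/api/v5/robin_server/disks" + (f"?{query}" if query else "")


def storage_summary(response):
    """Yield (wwn, free_slices, max_slices, percent_free) for Storage disks."""
    for disk in response["items"]:
        if disk["role"] != "Storage":
            continue
        free, limit = disk["free_alloc_slices"], disk["max_alloc_slices"]
        percent = round(100 * free / limit) if limit else 0
        yield disk["wwn"], free, limit, percent


# Two entries trimmed from the example response above.
sample = json.loads("""
{"items": [
  {"wwn": "0x600224804c48fd7e16c608dea0919064", "role": "Storage",
   "free_alloc_slices": 68, "max_alloc_slices": 77},
  {"wwn": "0x600224801d3ac9b6650afd3280aa5898", "role": "RootDisk",
   "free_alloc_slices": 77, "max_alloc_slices": 77}
]}
""")
print(disk_list_path(host="vnode36", with_tags=True))
for row in storage_summary(sample):
    print(row)
```

Disks with the RootDisk role are skipped because they are not used for Robin storage allocations.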
6.5. Show information about a specific disk
To view detailed information about a disk, such as the breakdown of its available capacity, its current allocations including the associated applications, and its write unit, issue the following command:
# robin disk info <wwn>
                  --json
<wwn>  | WWN of disk to display information about
--json | Output in JSON
Example:
# robin disk info 0xQEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8
Drive: 0xQEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8
Id: 2
Role: Storage
Type: HDD
Make: None
Model: None
Write Unit: 512
Availability Zone: None
Zone Id: 1613675775
Path: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8
Node: vnode-95-28.robinsystems.com
Protected: No
Reattachable: No
State: READY
Status: MAINTENANCE
Maintenance: ON
Allocations:
Current: 15G
Maximum: 38G
Free: 23G
Capacity: 50G
Factor: 30%
AllocScore: 83
Physical Usage: 0.16G
Physical Free: 48G
Volumes: 3
Max volumes: 10
Max latency volumes: 2
Max throughput volumes: 1
Applications: 3
file-collection-1613696655681
Instance: file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2
Id | Volume Name | Workload | Size (GB) | Allocated (GB)
--------+--------------------------------------------------------------------+----------+----------------+----------------
1 | file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2 | ordinary | 5 | 5
pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d
Instance: pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d
Id | Volume Name | Workload | Size (GB) | Allocated (GB)
---------+------------------------------------------+----------+----------------+----------------
21 | pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d | ordinary | 5 | 5
Returns detailed information about a disk such as the breakdown of its available capacity, current allocations including the associated applications and write unit.
End Point: /api/v3/robin_server/disks/<wwn>
Method: GET
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error)
Example Response:
Output
{
"items":[
{
"wwn":"0xQEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8",
"dev":"\/dev\/sdb",
"make":null,
"model":null,
"devpath":"\/dev\/disk\/by-id\/scsi-0QEMU_QEMU_HARDDISK_80cb8b56-aef2-48a8-8",
"capacity":53687091200,
"type":"HDD",
"role":"Storage",
"state":"READY",
"status":"MAINTENANCE",
"maintenance_mode":"ON",
"alloc_score":83,
"slices":3190,
"aslices":15,
"pused":167772160,
"pfree":51774488576,
"nodeid":1,
"stormgrid":2,
"reattachable":0,
"protected":0,
"availability_zone":null,
"node_hostname":"vnode-95-28.robinsystems.com",
"reattachable_nodes":[
[
"vnode-95-28.robinsystems.com",
"ONLINE"
]
],
"reattachpolicy":{
"id":2,
"burst_start_time":0,
"burst_count":0,
"burst_interval":600,
"restart_limit":5,
"restarts_done":0
},
"zoneid":1613675775,
"allocations":[
{
"name":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2.0.e4623953-263e-4b91-b66b-6be69ab60018",
"slices":5,
"volume_group":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2.72.1.d9d9726c-8052-48b1-b7ee-b953dbe254ff",
"vols":[
{
"id":"1",
"name":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2",
"size":5368709120,
"state":"ONLINE",
"media":"HDD",
"workload_str":"ordinary"
}
],
"volume":{
"id":"1",
"name":"file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2",
"size":5368709120,
"state":"ONLINE",
"media":"HDD",
"workload_str":"ordinary"
}
},
{
"name":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d.0",
"slices":5,
"volume_group":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d.72.1",
"vols":[
{
"id":"21",
"name":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d",
"size":5368709120,
"state":"ONLINE",
"media":"HDD",
"workload_str":"ordinary"
}
],
"volume":{
"id":"21",
"name":"pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d",
"size":5368709120,
"state":"ONLINE",
"media":"HDD",
"workload_str":"ordinary"
}
},
{
"name":"pvc-71561296-4699-4b79-bf92-3dd5470929cb.0",
"slices":5,
"volume_group":"pvc-71561296-4699-4b79-bf92-3dd5470929cb.72.1",
"vols":[
{
"id":"22",
"name":"pvc-71561296-4699-4b79-bf92-3dd5470929cb",
"size":5368709120,
"state":"ONLINE",
"media":"HDD",
"workload_str":"throughput"
}
],
"volume":{
"id":"22",
"name":"pvc-71561296-4699-4b79-bf92-3dd5470929cb",
"size":5368709120,
"state":"ONLINE",
"media":"HDD",
"workload_str":"throughput"
}
}
],
"tags":{
},
"max_volumes_per_disk":10,
"max_latency_sensitive_vols_per_disk":2,
"max_throughput_intensive_vols_per_disk":1,
"nvols":3,
"lused_size":0,
"preserved":0,
"write_unit":512
}
]
}
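As a small worked example, the allocations in a response like the one above can be totalled per volume. The helper name `allocated_by_volume` is hypothetical, and the input below is trimmed to the fields the sketch reads; interpreting `size` as bytes is an assumption based on the example values.

```python
GIB = 1024 ** 3


def allocated_by_volume(disk):
    """Map volume name -> size in GiB for every allocation on the disk."""
    sizes = {}
    for alloc in disk["allocations"]:
        for vol in alloc["vols"]:
            sizes[vol["name"]] = vol["size"] / GIB
    return sizes


# Trimmed from the example response above (one disk, three allocations).
disk = {
    "allocations": [
        {"vols": [{"name": "file-collection-1613696655681.b9307d9d-f1fe-4731-a815-99800f3811a2",
                   "size": 5368709120}]},
        {"vols": [{"name": "pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d",
                   "size": 5368709120}]},
        {"vols": [{"name": "pvc-71561296-4699-4b79-bf92-3dd5470929cb",
                   "size": 5368709120}]},
    ]
}
sizes = allocated_by_volume(disk)
for name, gib in sizes.items():
    print(f"{name}: {gib:g} GiB")
print("total:", sum(sizes.values()), "GiB")
```

The three 5 GiB volumes add up to 15 GiB, which is in line with the "Current: 15G" figure in the CLI example for the same disk.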
6.6. Evacuating volumes from a disk
This command allows users to move Robin storage volumes from one disk to another. As a result, it can be utilized to free up space on a disk when it is getting full, or to move data out of a disk before it is decommissioned. To evacuate a volume, issue the following command:
Note
Robin’s placement algorithm will determine the best disks to evacuate volumes to if target disks are not specified.
# robin disk evacuate <wwn>
                      --volume <volume>
                      --to-disks <target_disks>
                      --exclude-disks <excluded_disks>
                      --yes
<wwn>                            | WWN of disk to evacuate
--volume <volume>                | Name of volume to be evacuated. Note if not provided all the volumes will be evacuated
--to-disks <target_disks>        | List of WWNs representing disks to evacuate to
--exclude-disks <excluded_disks> | List of WWNs representing disks to avoid evacuating to
--yes                            | Do not prompt the user for confirmation of evacuation
Example:
# robin disk evacuate 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job: 65 Name: DiskEvacuate State: VALIDATED Error: 0
Job: 65 Name: DiskEvacuate State: WAITING Error: 0
Job: 65 Name: DiskEvacuate State: COMPLETED Error: 0
Evacuates volumes residing on one disk to another.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: PUT
URL Parameters: None
Data Parameters:
action: evacuate
- This mandatory field within the payload specifies that the evacuate operation is to be performed.
volume: <volume_name>
- Utilizing this parameter results in only the specified volume being evacuated from the disk. If this is not specified all volumes on the disk are evacuated.
target_drives: <list_of_target_drives>
- Utilizing this parameter, by specifying a list of WWNs representing respective target drives, ensures the evacuated volumes will be transferred to one of the specified disks.
exclude_drives: <list_of_excluded_drives>
- Utilizing this parameter, by specifying a list of WWNs representing respective excluded drives, ensures the evacuated volumes will not be transferred to any of the specified disks.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 145
}
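A minimal sketch of assembling the evacuate payload follows. The function name `build_evacuate_payload` is hypothetical, and rejecting a WWN that appears in both lists is a client-side sanity check added for illustration; the source does not say how the server treats that case.

```python
import json


def build_evacuate_payload(volume=None, target_drives=(), exclude_drives=()):
    """Return the JSON body for an evacuate request on a disk."""
    overlap = set(target_drives) & set(exclude_drives)
    if overlap:
        # Assumption: a drive cannot sensibly be both a target and excluded.
        raise ValueError(f"WWNs both targeted and excluded: {sorted(overlap)}")
    payload = {"action": "evacuate"}
    if volume:
        payload["volume"] = volume  # omit to evacuate every volume
    if target_drives:
        payload["target_drives"] = list(target_drives)
    if exclude_drives:
        payload["exclude_drives"] = list(exclude_drives)
    return json.dumps(payload)


print(build_evacuate_payload(
    volume="pvc-9369013d-cba4-41a9-b9b8-15228e5ea09d",
    exclude_drives=["0xQEMU_QEMU_HARDDISK_f12b1f33-8b71-4a8c-a"],
))
```

When neither list is supplied, only the action (and optionally the volume) is sent, leaving the choice of destination to Robin's placement algorithm as noted above.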
6.7. Unregistering a disk
To unregister a disk from the Robin cluster because it has physically malfunctioned or is faulted, issue the following command:
Note
A disk can only be unregistered if there are no storage volumes allocated from it and it is still attached to the node from the perspective of the platform.
# robin disk unregister <wwn>
                        --yes
<wwn> | WWN of disk to unregister
--yes | Do not prompt the user for confirmation of disk unregistration
Example:
# robin disk unregister 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job: 101 Name: DiskUnregister State: VALIDATED Error: 0
Job: 101 Name: DiskUnregister State: WAITING Error: 0
Job: 101 Name: DiskUnregister State: COMPLETED Error: 0
Unregisters a disk from the Robin cluster.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: DELETE
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
Output
{
"jobid": 130
}
6.8. Removing a disk
To remove a disk from the Robin cluster because it has physically malfunctioned or is faulted, issue the following command:
Note
A disk can only be removed if there are no storage volumes allocated from it.
# robin disk remove <wwn>
                    --yes
<wwn> | WWN of disk to remove
--yes | Do not prompt the user for confirmation of disk removal
Example:
# robin disk remove 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job: 101 Name: DiskRemove State: VALIDATED Error: 0
Job: 101 Name: DiskRemove State: WAITING Error: 0
Job: 101 Name: DiskRemove State: COMPLETED Error: 0
Removes a disk from the Robin cluster.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: DELETE
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 130
}
6.9. Updating disk properties
In certain circumstances, for hardware or environmental reasons, Robin may not discover all the attributes of a device correctly. To correct a disk’s properties, issue the following command:
# robin disk update <wwn>
                    --maxvolumes <max_volumes>
                    --maxlatencysensitivevolumes <max_latency_volumes>
                    --maxthroughputintensivevolumes <max_throughput_volumes>
                    --role <role>
                    --type <type>
                    --all
                    --set-reattachable
                    --unset-reattachable
                    --set-maintenance
                    --unset-maintenance
<wwn>                                                    | WWN of disk to update
--maxvolumes <max_volumes>                               | Max number of volumes allowed on the disk
--maxlatencysensitivevolumes <max_latency_volumes>       | Max number of latency sensitive volumes allowed on the disk
--maxthroughputintensivevolumes <max_throughput_volumes> | Max number of throughput intensive volumes allowed on the disk
--role <role>                                            | Role of the disk to update to
--type <type>                                            | Type of the disk to update to
--all                                                    | Run the update for all disks. Should be specified if no WWN is given
--set-reattachable                                       | Mark the disk as re-attachable
--unset-reattachable                                     | Mark the disk as not re-attachable
--set-maintenance                                        | Put the disk into maintenance mode
--unset-maintenance                                      | Remove the disk from maintenance mode
Example:
# robin disk update 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --maxvolumes 100 --type SSD --wait --yes
Job: 101 Name: DiskModify State: VALIDATED Error: 0
Job: 101 Name: DiskModify State: WAITING Error: 0
Job: 101 Name: DiskModify State: COMPLETED Error: 0
Modifies a disk’s discovered and/or Robin specific properties.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: PUT
URL Parameters: None
Data Parameters:
action: update
- This mandatory field within the payload specifies that the update operation is to be performed.
role: <role>
- Utilizing this parameter results in the role of the disk being set to the value specified. Options include: Storage, Swap, RootDisk, and Reserved.
type: <type>
- Utilizing this parameter results in the type of the disk being set to the value specified. Options include: HDD and SSD.
maxvolumesperdisk: <max_vols_on_disk>
- Utilizing this parameter results in the maximum number of volumes allowed on the disk being set to the value specified.
maxlatencysensitivevolumesperdisk: <max_lat_sens_vols_on_disk>
- Utilizing this parameter results in the maximum number of latency sensitive volumes allowed on the disk being set to the value specified.
maxthroughputintensivevolumesperdisk: <max_through_ints_vols_on_disk>
- Utilizing this parameter results in the maximum number of throughput intensive volumes allowed on the disk being set to the value specified.
reattachable: [0,1]
- Utilizing this parameter results in the reattachable attribute of the disk being set. By specifying a value of 1, the disk is said to be reattachable and vice versa for a value of 0.
maintenance: [0,1]
- Utilizing this parameter results in the maintenance mode attribute of the disk being set. By specifying a value of 1, the disk is said to be in maintenance mode and vice versa for a value of 0.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 54
}
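The update payload can be sketched as below. The function name `build_update_payload` and its keyword arguments are hypothetical; the field names and the allowed role and type values are taken from the parameter list above, and checking them on the client side is a choice made for this sketch.

```python
import json

ROLES = {"Storage", "Swap", "RootDisk", "Reserved"}
MEDIA_TYPES = {"HDD", "SSD"}


def build_update_payload(role=None, media_type=None, max_volumes=None,
                         reattachable=None, maintenance=None):
    """Return the JSON body for a disk update; only set fields are sent."""
    payload = {"action": "update"}
    if role is not None:
        if role not in ROLES:
            raise ValueError(f"role must be one of {sorted(ROLES)}")
        payload["role"] = role
    if media_type is not None:
        if media_type not in MEDIA_TYPES:
            raise ValueError(f"type must be one of {sorted(MEDIA_TYPES)}")
        payload["type"] = media_type
    if max_volumes is not None:
        payload["maxvolumesperdisk"] = int(max_volumes)
    # The API takes 0 or 1 for these two attributes, so booleans are mapped.
    for key, flag in (("reattachable", reattachable),
                      ("maintenance", maintenance)):
        if flag is not None:
            payload[key] = 1 if flag else 0
    return json.dumps(payload)


print(build_update_payload(media_type="SSD", max_volumes=100))
print(build_update_payload(maintenance=True))
```

The first call mirrors the CLI example above (`--maxvolumes 100 --type SSD`); the second corresponds to putting a disk into maintenance mode.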
6.10. Unfaulting a disk
When the Robin storage stack encounters an IO error on a disk, it marks the disk as FAULTED. Sometimes the error is not caused by the disk itself but by environmental or other hardware errors, such as controller errors, and the disk can be determined to be healthy. This command allows a user to inform Robin that a disk presumed to be FAULTED is actually healthy and that the external issues that caused the IO failure have been resolved. Robin then marks the disk as ONLINE and resumes normal access to it. To unfault a disk, issue the following command:
Note
This functionality should be used with extreme care. If the disk is actually bad, and a user utilizes this command to reverse the error reported by the Robin storage stack, it could result in data loss.
# robin disk unfault <wwn>
                     --yes
<wwn> | WWN of disk to unfault
--yes | Do not prompt the user for confirmation of unfaulting the drive
Example:
# robin disk unfault 0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b --wait --yes
Job: 101 Name: DiskUnfault State: VALIDATED Error: 0
Job: 101 Name: DiskUnfault State: WAITING Error: 0
Job: 101 Name: DiskUnfault State: COMPLETED Error: 0
Unfaults a disk that has been marked as FAULTED due to an IO error. Note this functionality should be used with extreme care and only in situations where one has determined that external issues such as hardware/controller errors, that have since been resolved, caused the IO failure. If this condition is not met, unfaulting the disk could result in data loss.
End Point: /api/v3/robin_server/disks/<disk_wwn>
Method: PUT
URL Parameters: None
Data Parameters:
action: unfault
- This mandatory field within the payload specifies that the unfault operation is to be performed.
wwn: <disk_wwn>
- This mandatory field within the payload specifies the WWN of the disk on which the unfault operation should be performed.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid": 187
}
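The request described above can be assembled with the Python standard library. This is a sketch under stated assumptions rather than a reference client: the `https` scheme, the JSON encoding of the payload, the `Content-Type` header, and the host name `robin-master.example.com` are all assumptions not confirmed by this page, and `build_unfault_request` is a hypothetical helper name. The example only builds the request; it does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_unfault_request(host, disk_wwn, auth_token, port=29442, scheme="https"):
    """Build (but do not send) the PUT request for the unfault operation."""
    # Endpoint and default RCM port are taken from the API description above.
    path = "/api/v3/robin_server/disks/" + urllib.parse.quote(disk_wwn, safe="")
    url = f"{scheme}://{host}:{port}{path}"
    # Both payload fields are documented as mandatory.
    payload = {"action": "unfault", "wwn": disk_wwn}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        headers={
            "Authorization": auth_token,
            "Content-Type": "application/json",  # assumption
        },
        method="PUT",
    )


request = build_unfault_request(
    "robin-master.example.com",  # hypothetical host
    "0xQEMU_QEMU_HARDDISK_3c71c872-fe13-4fd5-b",
    "<auth_token>",
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` should, per the description above, return status 202 and a body containing the `jobid` on success.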
6.11. Configure the snapshot space limit of a volume¶
Robin stores the snapshots of a volume within a dedicated space, referred to as the snapshot space of the volume. The snapshot space is defined as a percentage of the space allocated for the volume (40% by default). After the volume is provisioned, however, the snapshot space limit might need to be changed. To do so, issue the following command:
# robin volume snapshot-space-limit <volume_name> <snapshot_space_limit>
Note
Only the snapshot space limit of individual volumes can be configured at this time. To configure the snapshot space limit for an application, update the snapshot space limit of each of the volumes it consists of.
| <volume_name> | Name of the volume to configure the snapshot space limit for |
| <snapshot_space_limit> | Size of the snapshot space limit in bytes |
Example:
# robin volume snapshot-space-limit pvc-bd01bcd9-9e4d-4a34-ac13-5a98395935b3 4G
Successfully updated volume pvc-bd01bcd9-9e4d-4a34-ac13-5a98395935b3 snapshot space limit to 4G.
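To judge whether a new limit is larger or smaller than the default, it can help to work out what the default 40% amounts to for a given volume. The sketch below is plain arithmetic based on the percentage stated above; the helper name and the use of binary units (1 GiB = 1024^3 bytes) are assumptions made for illustration.

```python
DEFAULT_SNAPSHOT_PCT = 40  # default percentage stated in the text above
GIB = 1024 ** 3  # assumption: binary units, used only for this illustration


def default_snapshot_space(volume_size_bytes, pct=DEFAULT_SNAPSHOT_PCT):
    """Return the snapshot space in bytes for a given percentage of the volume."""
    return volume_size_bytes * pct // 100


# A 10 GiB volume gets 4 GiB of snapshot space at the default percentage.
print(default_snapshot_space(10 * GIB) // GIB)  # 4
```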
6.12. Handling Disruptions¶
With Robin, highly available applications can be deployed on Kubernetes because Robin automatically handles failures of drives, racks, or hosts. On a bare-metal setup, volumes can be set up with a replication factor of 2 or 3 to ensure that storage remains available even if a drive fails. Users can also set the fault domain to ‘host’ to protect against node reboots or node loss.
In a public cloud environment, however, cloud disks can be detached from one cloud node and reattached to another. For example, in AWS an EBS volume can be detached from one EC2 host and reattached to a different EC2 host. The same applies to GCP, where a PD can be moved across GCE nodes. If a cloud node (EC2, GCE, Azure VM) is terminated or rebooted, any cloud drives attached to it (EBS, PD, Block) should be moved automatically to one or more of the remaining healthy nodes. This is not limited to cloud disks; it also applies to SAN LUNs that are offered to Robin as disks. SAN LUNs can also be mounted on multiple nodes or moved from node to node. Users can still choose to replicate volumes on a public cloud, because detaching and attaching drives on cloud platforms takes some time.
Having the storage available during a disruption does not help if Kubernetes cannot access it from the Pod. For example, a Kubernetes StatefulSet serializes the mounting and unmounting of a volume to protect against possible corruption. Robin utilizes smart detection techniques to ensure that, even if a volume is mounted on multiple nodes, it can distinguish the IOs issued from the previous stale mount from those issued from the new mount. With these consistency guarantees, Robin enables the Kubernetes StatefulSet to unmount a volume from a dead node and remount it on a healthy node where the Pod is scheduled to run. Robin actively monitors these events to allow for the fast failover of Pods without user intervention, and consequently enables users to reliably deploy highly available stateful applications on Kubernetes.
6.13. Hardware RAID Controllers¶
6.13.1. Robin Recommendations¶
Robin recommends not using Redundant Array of Independent Disks (RAID) configurations for the physical storage devices that you plan to format with Robin storage. To understand the reasons for this recommendation, review the explanations presented here. However if RAID usage cannot be avoided, Robin recommends the following:
Use the hardware RAID controllers only to set up RAID-1 protection for boot and OS devices. This ensures that the failure of one device does not take the entire node down.
To protect Robin volumes against node and rack failures, you must configure replication across the fault domains. All RAID configurations (RAID-0, RAID-1, RAID-5, and RAID-6) protect data only against device failure on the node. They reduce the storage capacity of the cluster, thus increasing the data storage costs.
Robin’s replication provides maximum protection against all hardware failures including storage devices, storage nodes, and rack failures.
If a RAID controller has a cache, it must have a battery backup. If the cache on the RAID controller does not have a battery backup, Robin recommends disabling the cache as the data on the cache could be lost during a power outage.
6.13.2. Advantages and Disadvantages¶
The following are the advantages and disadvantages of RAID usage with Robin:
RAID-0 - In RAID-0, data is striped across all devices and provides the best performance. However, if one of the physical devices fails, the data on the RAID volume is lost.
RAID-1 - In RAID-1, data is mirrored across two devices. It gives the most redundancy. However, the performance of the volume is determined by the performance of the slowest device in the mirror and the data capacity is reduced by half.
RAID-5 - In RAID-5, data and parity are striped across all “n” devices, with the capacity of one device consumed by parity, which leaves the capacity of “n-1” devices for data. The RAID-5 configuration can tolerate one device failure and provides good performance. However, if one of the devices fails, the admin has to replace the device immediately and start a RAID rebuild to restore the redundancy. All devices should be of the same size; otherwise, the size of the smallest device determines the total usable capacity.
RAID-6 - In RAID-6, data and parity are striped across all “n” devices, with the capacity of two devices consumed by parity, which leaves the capacity of “n-2” devices for data. The RAID-6 configuration can tolerate two device failures and provides good performance. However, the disadvantages are the same as for RAID-5.
The striped RAID configurations (RAID-0, RAID-5, and RAID-6) provide better IO performance. However, RAID-0 does not give data protection, and RAID-5 and RAID-6 configurations reduce the storage capacity.
All RAID configurations are managed outside of Robin CNS. Therefore, the administrator must handle device failures manually, which increases operational cost.
If you enable hardware RAID, the RAID controller becomes a single point of failure for all the storage devices on the node. If the RAID controller fails, all data on all the storage devices in the node is lost.
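The capacity trade-offs listed above can be made concrete with a small calculation. The following sketch assumes the usual rule that every member of an array contributes only as much space as the smallest device, and it treats RAID-1 as a two-device mirror as described above; the function name `usable_capacity` and the minimum device counts are illustrative assumptions, not Robin behavior.

```python
def usable_capacity(level, device_sizes):
    """Return the usable capacity of a RAID array, in the units of device_sizes.

    Assumption: each member contributes only as much as the smallest device.
    """
    n = len(device_sizes)
    smallest = min(device_sizes)
    if level == 0 and n >= 2:
        return n * smallest  # striping only, no redundancy
    if level == 1 and n == 2:
        return smallest  # two-device mirror, capacity is halved
    if level == 5 and n >= 3:
        return (n - 1) * smallest  # one device's worth of parity
    if level == 6 and n >= 4:
        return (n - 2) * smallest  # two devices' worth of parity
    raise ValueError(f"unsupported RAID level {level} with {n} devices")


# Four 4 TB devices (two for the RAID-1 mirror).
print(usable_capacity(0, [4, 4, 4, 4]))  # 16
print(usable_capacity(1, [4, 4]))        # 4
print(usable_capacity(5, [4, 4, 4, 4]))  # 12
print(usable_capacity(6, [4, 4, 4, 4]))  # 8
```

Replacing one 4 TB device with a 2 TB device in the RAID-5 example drops the result to 6, which shows how the smallest device bounds the whole array.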
Note
For more details on Robin recommendations for RAID controllers, review the points made here.