5. Managing Nodes¶
5.1. Resource Discovery¶
As part of the Robin storage installation process, resource discovery runs on each node to collect details about its physical configuration, hardware limits, and resource availability. The purpose is twofold: it gives Robin a better understanding of the storage resources the machine can provide for application deployment, and it allows Robin to better optimize the node's usage within the cluster.
The following properties of the node are discovered:
Disks
Details on what is captured for each of the above aspects, and how it is captured, are described below.
5.1.1. Disk Discovery¶
Robin uses several sources to discover the disks that are available to a node. The commands and files used to obtain the details below include lsblk, partprobe, pvs, blkid, and /proc/mounts. The following details are captured for each disk (if present):
Devpath
Capacity
Physical Sector size
WWN (along with make and model)
Media type
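As a rough illustration of how such details can be read from one of the sources named above, the following Python sketch parses JSON shaped like the output of `lsblk --json --bytes -o NAME,SIZE,PHY-SEC,WWN,ROTA,MODEL`. It is not Robin's discovery code: the sample values are made up, the result keys are hypothetical names, and deriving the media type from the rotational flag is only a heuristic assumed here.

```python
import json

# Sample shaped like `lsblk --json --bytes -o NAME,SIZE,PHY-SEC,WWN,ROTA,MODEL`.
# The values are illustrative only, not taken from a real Robin node.
SAMPLE_LSBLK_JSON = """
{"blockdevices": [
  {"name": "sdb", "size": 53687091200, "phy-sec": 512,
   "wwn": null, "rota": true, "model": "QEMU HARDDISK"}
]}
"""


def summarize_disks(lsblk_json):
    """Map lsblk fields onto the kind of disk details listed above."""
    disks = []
    for dev in json.loads(lsblk_json)["blockdevices"]:
        # Older lsblk versions report "1"/"0" strings, newer ones booleans.
        rotational = str(dev.get("rota")).lower() in ("1", "true")
        disks.append({
            "dev": "/dev/" + dev["name"],
            "capacity_bytes": int(dev["size"]),
            "physical_sector_size": int(dev["phy-sec"]),
            "wwn": dev.get("wwn"),
            "model": dev.get("model"),
            # Assumption: rotational means HDD, anything else is treated as SSD.
            "media_type": "HDD" if rotational else "SSD",
        })
    return disks


for disk in summarize_disks(SAMPLE_LSBLK_JSON):
    print(disk)
```

Virtual disks often report themselves as rotational, so a real tool would likely need more than this single flag to classify media.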
5.1.2. Disk Partitions(LVM) Discovery¶
Some environments, such as edge servers, may not have dedicated data disks for Robin to consume. In these environments, a disk can be partitioned and Robin can use the partitions as data disks. Currently, Robin discovers partitioned disks and marks them as Reserved to avoid overwriting any user data. Once you decide to use these partitions as data disks, follow the steps below to add them to Robin.
Here is an example of a disk with two partitions:
sdb 8:16 0 50G 0 disk
├─sdb1 8:17 0 10G 0 part
└─sdb2 8:18 0 40G 0 part
└─vg-robinds 253:0 0 39G 0 lvm
When Robin discovers the above disk, it shows up in robin drive list as:
# robin drive list --role=all
ID | WWN | Host |Path /dev/disk/by-id | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role | Status | LastOpr
---+------------------------------------------------------+--------------+------------------------------------------------------------------------------+----------+---------+------+--------------+------+----------+---------+---------
- | 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8 | vnode-89-142 | scsi-0QEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8 | 50 | N | HDD | 38/38 (100%) | 0/10 | RootDisk | UNKNOWN | INIT
- | 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8-vg-robinds | vnode-89-142 | dm-uuid-LVM-DOy7w9WSdGi2PcSERuceOHkfM7dotzr7nm5EuoAWsyAHkvbYGT02MaeWDro05F3R | 39 | N | HDD | 30/30 (100%) | 0/10 | Reserved | UNKNOWN | INIT
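To pick out Reserved drives from a listing like the one above without reading it by eye, the pipe-delimited table can be parsed generically. The sketch below is one possible approach, not a Robin utility; the sample is the listing above with several columns trimmed for brevity, and the function names are hypothetical.

```python
# Trimmed copy of the `robin drive list --role=all` output shown above.
SAMPLE_DRIVE_LIST = """
ID | WWN | Host | Size(GB) | Role | Status
---+-----+------+----------+------+-------
- | 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8 | vnode-89-142 | 50 | RootDisk | UNKNOWN
- | 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8-vg-robinds | vnode-89-142 | 39 | Reserved | UNKNOWN
"""


def parse_pipe_table(text):
    """Parse a pipe-delimited CLI table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.strip().splitlines():
        # Separator lines such as "---+---" contain no pipe and are skipped.
        if "|" not in line:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def reserved_wwns(text):
    """Return the WWN of every drive whose Role column reads Reserved."""
    return [row["WWN"] for row in parse_pipe_table(text) if row["Role"] == "Reserved"]


print(reserved_wwns(SAMPLE_DRIVE_LIST))
```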
Note the Role of the LVM volume that is to be added to the storage pool. Its role must be changed from Reserved to Storage with the following command:
# robin drive update <WWN> --role storage --wait
# robin drive update 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8-vg-robinds --role storage --wait
Job: 5922 Name: DiskModify State: PROCESSED Error: 0
Job: 5922 Name: DiskModify State: COMPLETED Error: 0
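When these commands are scripted, the job progress lines printed with --wait can be checked programmatically. The following Python sketch assumes the line format shown above and treats a final state of COMPLETED with Error 0 as success; that interpretation is an assumption drawn from the examples in this section, not a documented contract.

```python
import re

# Matches lines such as "Job: 5922 Name: DiskModify State: COMPLETED Error: 0".
JOB_LINE = re.compile(
    r"Job:\s*(\d+)\s+Name:\s*(\S+)\s+State:\s*(\S+)\s+Error:\s*(-?\d+)"
)


def final_job_state(output):
    """Return (job_id, name, state, error) from the last job line, or None."""
    last = None
    for line in output.splitlines():
        match = JOB_LINE.search(line)
        if match:
            last = (int(match.group(1)), match.group(2),
                    match.group(3), int(match.group(4)))
    return last


def job_succeeded(output):
    """Assumed success rule: the last reported state is COMPLETED with Error 0."""
    last = final_job_state(output)
    return last is not None and last[2] == "COMPLETED" and last[3] == 0


SAMPLE_OUTPUT = """\
Job: 5922 Name: DiskModify State: PROCESSED Error: 0
Job: 5922 Name: DiskModify State: COMPLETED Error: 0
"""
print(final_job_state(SAMPLE_OUTPUT), job_succeeded(SAMPLE_OUTPUT))
```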
Once the role is updated to Storage, the uninitialized LVM volume/drive shows up in the output of robin drive list:
# robin drive list
ID | WWN | Host | Path /dev/disk/by-id | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role | Status | LastOpr
---+------------------------------------------------------+--------------+------------------------------------------------------------------------------+----------+---------+------+--------------+------+----------+---------+---------
..
- | 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8-vg-robinds | vnode-89-142 | dm-uuid-LVM-DOy7w9WSdGi2PcSERuceOHkfM7dotzr7nm5EuoAWsyAHkvbYGT02MaeWDro05F3R | 39 | N | HDD | 30/30 (100%) | 0/10 | Storage | UNKNOWN | INIT
Next, run the following command for the host where the LVM volume/drive is present:
# robin host add-role <hostname> storage
# robin host add-role vnode-89-142 storage --wait
Job: 5923 Name: HostAddRoles State: PROCESSED Error: 0
Job: 5923 Name: HostAddRoles State: WAITING Error: 0
Job: 5923 Name: HostAddRoles State: COMPLETED Error: 0
The LVM volume/drive is now part of the storage pool and ready for any PVC/volume requests:
# robin drive list
ID | WWN | Host | Path /dev/disk/by-id | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role | Status | LastOpr
---+------------------------------------------------------+--------------+------------------------------------------------------------------------------+----------+---------+------+--------------+------+----------+---------+---------
..
4 | 0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8-vg-robinds | vnode-89-142 | dm-uuid-LVM-DOy7w9WSdGi2PcSERuceOHkfM7dotzr7nm5EuoAWsyAHkvbYGT02MaeWDro05F3R | 39 | N | HDD | 30/30 (100%) | 0/10 | Storage | ONLINE | READY
In summary, the following commands are required to add an LVM volume/disk as a storage disk:
# robin drive update <WWN> --role storage --wait
# robin host add-role <hostname> storage --wait
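The two steps can be wrapped in a small script. The Python sketch below only assembles the two command lines from the summary above and, by default, prints them without executing anything. The helper names and the dry-run behaviour are choices made for this illustration.

```python
import shlex
import subprocess


def add_partition_commands(wwn, hostname):
    """Return the two CLI invocations from the summary above as argv lists."""
    return [
        ["robin", "drive", "update", wwn, "--role", "storage", "--wait"],
        ["robin", "host", "add-role", hostname, "storage", "--wait"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


# Dry run using the WWN and hostname from the example in this section.
run_commands(add_partition_commands(
    "0xQEMU_QEMU_HARDDISK_561637eb-07d0-4a0d-8-vg-robinds", "vnode-89-142"))
```

Keeping the order fixed matters here: the drive role is changed first, and the Storage role is added to the host afterwards, as in the walkthrough above.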
5.2. Robin Node Roles¶
There are two Robin roles that are assigned to hosts: Manager and Storage.
The Manager role is assigned to a node that is intended to be part of the Robin control plane. The first node added as a Manager is considered the master and runs the essential Robin services, including the RCM server, file server, and event server. As a result, it controls and manages (hence the name of the role) the agent nodes. This entails handling all external communication with Robin via APIs, maintaining the most current replica of the PostgreSQL database, and so on. If a failover occurs, these services are moved to and run on the newly elected master. Every Robin cluster has a maximum of 3 Manager nodes (this is necessary for HA installations).
The Storage role is designated to a node which is intended to provide storage, as indicated by its name, for applications deployed on Robin. As a result, any volumes needed for deployed applications will be created and mounted on devices on nodes with this role set.
The following commands are described in this section:
robin host enable-role | Move one or more role(s) out of maintenance mode for a host
robin host disable-role | Move one or more role(s) into maintenance mode for a host
5.2.1. Enabling role(s)¶
To move a role, which is already added to a host, out of maintenance mode and thus enable it for use again, issue the following command:
# robin host enable-role [<host>] [<roles>]
<host> | Fully qualified hostname
<roles> | Valid values include: ‘storage’
Example:
# robin host enable-role centos-60-212.robinsystems.com Storage
Role(s) 'Storage' enabled on host centos-60-212.robinsystems.com
Enables a role that was previously disabled on a host.
End Point: /api/v3/robin_server/hosts/<hostname>
Method: PUT
URL Parameters: None
Data Parameters:
action: enable-role
- This mandatory field within the payload specifies that the enable role operation is to be performed.
roles: <list_of_roles>
- This mandatory field within the payload is a list of roles that should be enabled on the specified host.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response: On success the response is empty.
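A request matching this description might be assembled as in the Python sketch below. It only builds the request object and does not send it. The `https` scheme, the JSON encoding of the payload, the Content-Type header, and the manager address are assumptions made for this example; the end point, method, port, and field names come from the description above.

```python
import json
import urllib.request

RCM_PORT = 29442  # default RCM port given above


def build_enable_role_request(manager, hostname, roles, token, scheme="https"):
    """Build (but do not send) the PUT request that enables roles on a host."""
    url = f"{scheme}://{manager}:{RCM_PORT}/api/v3/robin_server/hosts/{hostname}"
    body = json.dumps({"action": "enable-role", "roles": list(roles)})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": token,
            # Assumption: the payload is sent as JSON.
            "Content-Type": "application/json",
        },
    )


# "manager.example.com" and "TOKEN" are placeholders, not real values.
request = build_enable_role_request(
    "manager.example.com", "centos-60-212.robinsystems.com", ["storage"], "TOKEN")
print(request.get_method(), request.full_url)
```

The request could then be sent with `urllib.request.urlopen(request)`; a 200 status with an empty body would indicate success according to the description above.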
5.2.2. Disabling role(s)¶
In Robin, disabling a role puts the role into maintenance mode. For all intents and purposes, the host then does not have access to this role. This is useful for debugging and for temporarily reserving the host's resources. To move a role into maintenance mode and thus disable it, issue the following command:
# robin host disable-role [<host>] [<roles>]
<host> | Fully qualified hostname
<roles> | Valid values include: ‘storage’
Example:
# robin host disable-role centos-60-212.robinsystems.com Storage
Role(s) 'Storage' disabled on host centos-60-212.robinsystems.com
Disables a role for a host such that the host temporarily does not have access to it.
End Point: /api/v3/robin_server/hosts/<hostname>
Method: PUT
URL Parameters: None
Data Parameters:
action: disable-role
- This mandatory field within the payload specifies that the disable role operation is to be performed.
roles: <list_of_roles>
- This mandatory field within the payload is a list of roles that should be disabled on the specified host. Valid values include: ‘storage’.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response: On success the response is empty.
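Because a disabled role has to be re-enabled explicitly, a maintenance window is a natural fit for a context manager that always restores the role. The Python sketch below shows the idea under stated assumptions: `send` is a hypothetical callable that performs the PUT request for a host and payload, and only the payload fields come from the API descriptions above.

```python
from contextlib import contextmanager


@contextmanager
def role_maintenance(send, hostname, roles):
    """Disable roles on entry and re-enable them on exit, even on errors.

    `send(hostname, payload)` is a hypothetical callable that issues the PUT
    request to /api/v3/robin_server/hosts/<hostname> with the given payload.
    """
    send(hostname, {"action": "disable-role", "roles": list(roles)})
    try:
        yield
    finally:
        send(hostname, {"action": "enable-role", "roles": list(roles)})


# Usage with a stand-in sender that only records what would be sent.
sent = []
with role_maintenance(lambda host, payload: sent.append((host, payload["action"])),
                      "centos-60-212.robinsystems.com", ["storage"]):
    sent.append(("centos-60-212.robinsystems.com", "maintenance work"))
print(sent)
```

The `finally` clause is the design point: the enable request is issued whether the maintenance work succeeds or raises.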
5.3. Managing File Collections¶
File collections are repositories in a Robin cluster where application bundles and images required for the provisioning of applications are stored. Robin File Server, one of the core Robin management services, is responsible for managing file collections and the files they contain. Storage for each file collection is allocated from Robin managed storage. The volumes allocated for each file collection can be configured with or without replication; however, it is highly recommended that replicated storage be used. This ensures that no data is lost should one of the hosts providing the storage go down or one of the storage disks fail.
The following commands are described in this section:
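robin collection create | Create a file collection
robin collection delete | Delete a file collection
robin collection list | List all file collections
robin collection offline | Disable a file collection
robin collection online | Enable a file collection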
5.3.1. Creating a file collection¶
To create a file collection for storing logs, images, bundles, and so on, issue the following command:
# robin collection create <media_type> <rpool>
--collection_type <collection_type>
--storage_type <storage_type>
--replicas <replicas>
--size <size>
--force
<media_type> | The media type of storage drives to allocate from. Options include ‘HDD’ or ‘SSD’
<rpool> | Name of the resource pool to allocate from.
--collection_type <collection_type> | Type of collection to create. Options include ‘LOG_COLLECTION’ or ‘FILE_COLLECTION’
--storage_type <storage_type> | Type of storage to use. Options include ‘STORMGR_VOLUME’, ‘LOCAL’, or ‘STORAGE_ARRAY’
--replicas <replicas> | Replication factor of the collection volume
--size <size> | Size of allocated storage volume
--force | Override the requirement that the number of replicas must be set to 3
Example:
# robin collection create HDD default --replicas 1 --size 5G --force --wait
Job: 216 Name: CollectionAdd State: PROCESSED Error: 0
Job: 216 Name: CollectionAdd State: WAITING Error: 0
Job: 216 Name: CollectionAdd State: COMPLETED Error: 0
Creates a file collection in order to store logs, images, bundles etc.
End Point: /api/v3/robin_server/collections
Method: POST
URL Parameters: None
Data Parameters:
media: <media_type>
- This mandatory field within the payload specifies the media type of storage drives to allocate from. Options include ‘HDD’ or ‘SSD’.
rpool: <resource_pool>
- This mandatory field within the payload specifies the name of the resource pool to allocate from.
collection_type: <collection_type>
- This mandatory field within the payload specifies the type of collection to create. Options include ‘LOG_COLLECTION’ or ‘FILE_COLLECTION’.
storage_type: <storage_type>
- This mandatory field within the payload specifies the type of storage to use. Options include ‘STORMGR_VOLUME’, ‘LOCAL’, or ‘STORAGE_ARRAY’.
replicas: <replicas>
- This mandatory field within the payload specifies the replication factor of the collection volume. Valid values include: 1, 2, and 3.
size: <size>
- Specifying this parameter, with the size of the collection volume in bytes, results in a volume of that size being created. The default size is 50GB.
force: true
- Specifying this parameter overrides the requirement that the number of replicas for a collection must be 3.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"plan":{
"collection_type":"FILE_COLLECTION",
"storage":[
{
"size":5368709120,
"rpool":"default",
"faultdomain":"host",
"replication":1,
"media":"HDD"
}
],
"storage_type":"STORMGR_VOLUME",
"authorization_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0ZW5hbnRfaWQiOjEsInVzZXJfaWQiOjMsImV4cCI6MTYwMDIyMjY5NH0.qsv3pBCXAfGnS1JrhvncQiubEqQcyPhV2fzGse8aA-A",
"hostname":"vnode-95-42.robinsystems.com",
"collection_id":1600138723311,
"force":true,
"collection_name":"file-collection-1600138723311"
},
"jobid":48
}
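The payload rules described above (allowed values, size in bytes, and the force override for fewer than 3 replicas) can be checked on the client before the request is sent. The Python sketch below is one way to do that. The default collection and storage types mirror the example response and are assumptions, as is the use of binary units, which is at least consistent with 5G corresponding to 5368709120 bytes in the examples.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def size_to_bytes(size):
    """Convert a size such as '5G' to bytes (binary units assumed)."""
    size = size.strip().upper()
    if size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(size)


def collection_create_payload(media, rpool, collection_type="FILE_COLLECTION",
                              storage_type="STORMGR_VOLUME", replicas=3,
                              size=None, force=False):
    """Validate arguments and build the payload for the collections POST."""
    if media not in ("HDD", "SSD"):
        raise ValueError("media must be 'HDD' or 'SSD'")
    if collection_type not in ("LOG_COLLECTION", "FILE_COLLECTION"):
        raise ValueError("unknown collection_type")
    if storage_type not in ("STORMGR_VOLUME", "LOCAL", "STORAGE_ARRAY"):
        raise ValueError("unknown storage_type")
    if replicas not in (1, 2, 3):
        raise ValueError("replicas must be 1, 2, or 3")
    if replicas != 3 and not force:
        raise ValueError("fewer than 3 replicas requires force=True")
    payload = {"media": media, "rpool": rpool,
               "collection_type": collection_type,
               "storage_type": storage_type, "replicas": replicas}
    if size is not None:
        payload["size"] = size_to_bytes(size)  # the API expects bytes
    if force:
        payload["force"] = True
    return payload


print(collection_create_payload("HDD", "default", replicas=1, size="5G", force=True))
```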
5.3.2. Deleting a file collection¶
To delete a file collection, issue the following command:
# robin collection delete <collection_id>
--force
--yes
<collection_id> | ID of the file collection to delete
--force | Delete a collection forcibly if the backing storage is down
--yes | Do not prompt the user for confirmation of deletion
Example:
# robin collection delete 1583820760654 --yes --wait
Job: 221 Name: CollectionDelete State: VALIDATED Error: 0
Job: 221 Name: CollectionDelete State: COMPLETED Error: 0
Deletes a file collection.
End Point: /api/v3/robin_server/collections
Method: DELETE
URL Parameters: None
Data Parameters:
collection_id: <collection_id>
- This mandatory field within the payload specifies the ID of the collection that should be deleted.
force: true
- Utilizing this parameter results in the collection being forcibly removed if the backing storage is down.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid":81,
"plan":{
"collection_id":1597897988633,
"authorization_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjozLCJ0ZW5hbnRfaWQiOjEsImV4cCI6MTU5Nzk2NzM1M30.Tcv6pAb66lC2k0VJpLj4oRT1elK99Qi7WBv_7Iid_zo"
}
}
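Responses like the one above return a job ID and echo the authorization token inside the plan. A client that logs responses may want to extract the job ID and mask the token first. The Python sketch below illustrates this; the sample body is the example above with the token replaced by a placeholder, and the function name is hypothetical.

```python
import copy
import json


def parse_job_response(body):
    """Return (jobid, response_with_token_masked) for a 202 response body."""
    data = json.loads(body)
    safe = copy.deepcopy(data)
    plan = safe.get("plan", {})
    if "authorization_token" in plan:
        # Avoid writing credentials to logs.
        plan["authorization_token"] = "<redacted>"
    return int(data["jobid"]), safe


# Shape taken from the example response; the token is a placeholder.
SAMPLE_BODY = json.dumps({
    "jobid": 81,
    "plan": {"collection_id": 1597897988633,
             "authorization_token": "PLACEHOLDER_TOKEN"},
})

job_id, loggable = parse_job_response(SAMPLE_BODY)
print(job_id, loggable)
```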
5.3.3. Listing all file collections¶
To view all file collections available on the cluster, alongside details such as the host each is mounted on, its size, and its replication factor, issue the following command:
# robin collection list --name <name>
--full
--json
--name <name> | File collection name to filter by
--full | Display additional information about each file collection
--json | Output in JSON
Example:
# robin collection list
Collection Id | Collection Type | Storage Type | Name | Hostname | Status | LastOpr | Used | Size (GB) | Replication Factor
--------------+-----------------+----------------+-------------------------------+--------------------------+--------+---------+-------+-----------+--------------------
1583356117884 | FILE_COLLECTION | STORMGR_VOLUME | file-collection-1583356117884 | vnode36.robinsystems.com | READY | Online | 1.3M | 10 | 1
1583820760654 | FILE_COLLECTION | STORMGR_VOLUME | file-collection-1583820760654 | vnode36.robinsystems.com | READY | Online | 2.2M | 5 | 1
Returns information on all file collections available on the cluster, alongside details such as the host each is mounted on, its size, and its replication factor.
End Point: /api/v3/robin_server/collections
Method: GET
URL Parameters:
name=<collection_name>
: Utilizing this parameter results in only information for the file collection with the specified name being returned.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
Example Response:
Output
{
"collections":[
{
"notifications":{
},
"size":5368709120,
"id":1,
"used":"1.3M",
"name":"file-collection-1597122699552",
"pathname":"\/usr\/local\/robin\/collections\/file-collection-1597122699552",
"state":"Online",
"status":"READY",
"timestamp":"August 10, 2020 22:11:39",
"collection_type":"FILE_COLLECTION",
"replication":1,
"storage_type":"STORMGR_VOLUME",
"hostname":"cscale-82-140.robinsystems.com",
"collection_id":1597122699552
}
]
}
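A client consuming this response typically needs two things: the URL with the optional name filter, and a compact summary of each collection. The Python sketch below covers both. The base URL is a placeholder, the summary keys are hypothetical, and the sample body is a trimmed copy of the example response above.

```python
import json
from urllib.parse import urlencode


def collections_url(base, name=None):
    """Build the GET URL, adding the optional name filter described above."""
    url = base.rstrip("/") + "/api/v3/robin_server/collections"
    if name:
        url += "?" + urlencode({"name": name})
    return url


def summarize_collections(body):
    """Reduce each collection to a few fields, converting size to GiB."""
    summary = []
    for item in json.loads(body)["collections"]:
        summary.append({
            "name": item["name"],
            "status": item["status"],
            "state": item["state"],
            "size_gib": item["size"] / 1024 ** 3,
            "replication": item["replication"],
        })
    return summary


# Trimmed from the example response above.
SAMPLE_BODY = json.dumps({"collections": [{
    "size": 5368709120, "name": "file-collection-1597122699552",
    "state": "Online", "status": "READY", "replication": 1,
    "collection_id": 1597122699552,
}]})

print(collections_url("https://manager.example.com:29442",
                      "file-collection-1597122699552"))
print(summarize_collections(SAMPLE_BODY))
```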
5.3.4. Disabling a file collection¶
In order to disable a file collection such that it temporarily cannot be used/accessed, issue the following command:
# robin collection offline <collection_id>
<collection_id> | File collection to disable
Example:
# robin collection offline 1583820760654 --wait
Job: 218 Name: CollectionOffline State: PROCESSED Error: 0
Job: 218 Name: CollectionOffline State: COMPLETED Error: 0
# robin collection list
Collection Id | Collection Type | Storage Type | Name | Hostname | Status | LastOpr | Size (GB) | Replication Factor
--------------+-----------------+----------------+-------------------------------+--------------------------+-----------+---------+-----------+--------------------
1583356117884 | FILE_COLLECTION | STORMGR_VOLUME | file-collection-1583356117884 | vnode36.robinsystems.com | READY | Online | 10 | 1
1583820760654 | FILE_COLLECTION | STORMGR_VOLUME | file-collection-1583820760654 | vnode36.robinsystems.com | NOT_READY | Offline | 5 | 1
Disables a file collection such that it temporarily cannot be used/accessed.
End Point: /api/v3/robin_server/collections
Method: PUT
URL Parameters: None
Data Parameters:
action: offline
- This mandatory field within the payload specifies that the offline operation is to be performed.
collection_id: <collection_id>
- This mandatory field within the payload specifies which collection should be disabled.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"plan":{
"authorization_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0ZW5hbnRfaWQiOjEsInVzZXJfaWQiOjMsImV4cCI6MTU5Nzc2NjE1NH0._D_s9sj_6HoOQkqytCGqlgtL6aIsp1oym6BOwntM8iU",
"origin":1,
"collection_id":1597122699552
},
"jobid":2073
}
5.3.5. Enabling a file collection¶
In order to enable a file collection such that it is usable again by remounting it on the master node of the cluster, issue the following command:
# robin collection online <collection_id>
<collection_id> | File collection to enable
Example:
# robin collection online 1583820760654 --wait
Job: 219 Name: CollectionOnline State: PROCESSED Error: 0
Job: 219 Name: CollectionOnline State: COMPLETED Error: 0
# robin collection list
Collection Id | Collection Type | Storage Type | Name | Hostname | Status | LastOpr | Size (GB) | Replication Factor
--------------+-----------------+----------------+-------------------------------+--------------------------+--------+---------+-----------+--------------------
1583356117884 | FILE_COLLECTION | STORMGR_VOLUME | file-collection-1583356117884 | vnode36.robinsystems.com | READY | Online | 10 | 1
1583820760654 | FILE_COLLECTION | STORMGR_VOLUME | file-collection-1583820760654 | vnode36.robinsystems.com | READY | Online | 5 | 1
Enables a file collection such that it can be used again by remounting it on the master node of the cluster.
End Point: /api/v3/robin_server/collections
Method: PUT
URL Parameters: None
Data Parameters:
action: online
- This mandatory field within the payload specifies that the online operation is to be performed.
collection_id: <collection_id>
- This mandatory field within the payload specifies which collection should be enabled.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"jobid":2108,
"plan":{
"origin":1,
"collection_id":1597122699552,
"authorization_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0ZW5hbnRfaWQiOjEsInVzZXJfaWQiOjMsImV4cCI6MTU5Nzc2NjE1NH0._D_s9sj_6HoOQkqytCGqlgtL6aIsp1oym6BOwntM8iU"
}
}
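The offline and online operations share one end point and differ only in the action field, so a single helper can build either payload. The Python sketch below does that and also records the Status and LastOpr values that the example listings in this section show after each operation. The function names are hypothetical.

```python
# (Status, LastOpr) pairs as shown by `robin collection list` in the examples.
EXPECTED_LISTING = {
    "offline": ("NOT_READY", "Offline"),
    "online": ("READY", "Online"),
}


def collection_state_payload(action, collection_id):
    """Build the PUT payload for taking a collection offline or online."""
    if action not in EXPECTED_LISTING:
        raise ValueError("action must be 'offline' or 'online'")
    return {"action": action, "collection_id": int(collection_id)}


def listing_matches(action, status, last_opr):
    """Check a listing row against the values the examples show for an action."""
    return EXPECTED_LISTING[action] == (status, last_opr)


print(collection_state_payload("offline", 1597122699552))
print(listing_matches("offline", "NOT_READY", "Offline"))
```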
5.4. Gathering information on nodes¶
Robin exposes multiple endpoints that provide a way to obtain information about the hosts registered with a Robin cluster. The information returned is a combination of the host's physical attributes (obtained by the resource discovery described in Section 5.1), its resource utilization, and the status of its services. This gives a user insight into the state of the cluster, provides granular details for each individual host, and enables application deployment planning.
The following commands are described in this section:
|
View all hosts in a cluster |
|
Display detailed information about a host |
5.4.1. List all hosts¶
In order to view all hosts within a cluster alongside information on their statuses (from Robin’s perspective), resource consumption, and roles within the cluster, issue the following command:
# robin host list --services
--resources
--tags
--json
--services | Show status information for each host
--resources | Show resource utilization for each host
--tags | Show tag information for each host
--json | Output in JSON
Example 1 (Listing all hosts):
# robin host list
Id | Hostname | Version | Status | RPool | LastOpr | Roles | Isol Cores(SHR/DED/Total) | Non-Isol Cores | GPUs | Mem(Free/Alloc/Total) | HDD(#/Alloc/Total) | SSD(#/Alloc/Total) | Pod Usage | Joined Time
-------------+-------------------------------+-----------+--------+---------+---------+--------+---------------------------+----------------+------+-----------------------+--------------------+--------------------+-----------+----------------------
1596566663:1 | cscale-82-81.robinsystems.com | 5.3.0-172 | Ready | default | ONLINE | M*,S | 0/0/0 | 1/40 | 0/0 | 24G/6G/31G | 1/5G/100G | -/-/- | 11/110 | 04 Aug 2020 04:44:47
1596566663:2 | cscale-82-82.robinsystems.com | 5.3.0-172 | Ready | default | ONLINE | M,S | 0/0/0 | 2/40 | 0/0 | 24G/7G/31G | 2/40G/200G | -/-/- | 13/110 | 04 Aug 2020 04:51:52
1596566663:3 | cscale-82-83.robinsystems.com | 5.3.0-172 | Ready | default | ONLINE | M,S | 0/0/0 | 1/40 | 0/0 | 24G/6G/31G | 2/-/200G | -/-/- | 11/110 | 04 Aug 2020 04:58:25
1596566663:4 | qct-07.robinsystems.com | 5.3.0-172 | Ready | workers | ONLINE | S | 8/68/76 | 1/4 | 0/0 | 280G/95G/376G | 1/554G/893G | -/-/- | 83/110 | 04 Aug 2020 05:05:24
1596566663:5 | qct-08.robinsystems.com | 5.3.0-172 | Ready | workers | ONLINE | S | 0/76/76 | 1/4 | 0/0 | 284G/91G/376G | 1/547G/893G | -/-/- | 88/110 | 04 Aug 2020 05:12:06
1596566663:6 | qct-11.robinsystems.com | 5.3.0-172 | Ready | workers | ONLINE | S | 0/27/76 | 1/4 | 0/0 | 147G/40G/187G | 1/581G/893G | -/-/- | 34/110 | 04 Aug 2020 05:18:47
1596566663:7 | cscale-82-80.robinsystems.com | 5.3.0-172 | Ready | workers | ONLINE | C | 0/21/36 | 1/40 | 0/0 | 0.7G/30G/31G | 1/-/100G | -/-/- | 28/110 | 04 Aug 2020 20:21:40
Example 2 (Retrieving status information):
# robin host list --services
+-------------------------------+-------+-------+------+-------+-------+------+-------+------+------+------+-------+------+------+-------+---------+---------+
| Host | ConCl | ConSr | GCli | Httpd | Iomgr | RMon | Pgsql | RAgt | RAer | REvt | RFile | NMon | RSer | RWdog | Metrics | Stormgr |
+-------------------------------+-------+-------+------+-------+-------+------+-------+------+------+------+-------+------+------+-------+---------+---------+
| cscale-82-81.robinsystems.com | DOWN | UP | UP | UP | UP | UP | UP | UP | UP | UP | UP | UP | UP | UP | DOWN | UP |
| cscale-82-82.robinsystems.com | DOWN | UP | UP | UP | UP | UP | UP | UP | DOWN | DOWN | DOWN | DOWN | DOWN | UP | DOWN | DOWN |
| cscale-82-83.robinsystems.com | DOWN | UP | UP | UP | UP | UP | UP | UP | DOWN | DOWN | DOWN | DOWN | DOWN | UP | DOWN | DOWN |
| qct-07.robinsystems.com | UP | DOWN | UP | UP | UP | UP | DOWN | UP | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN |
| qct-08.robinsystems.com | UP | DOWN | UP | UP | UP | UP | DOWN | UP | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN |
| qct-11.robinsystems.com | UP | DOWN | UP | UP | UP | UP | DOWN | UP | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN |
| cscale-82-80.robinsystems.com | UP | DOWN | UP | UP | UP | UP | DOWN | UP | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN | DOWN |
+-------------------------------+-------+-------+------+-------+-------+------+-------+------+------+------+-------+------+------+-------+---------+---------+
UP: Running
CRIT: Critical and Down
DOWN: Not Running
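A grid like the one above can be reduced to a per-host list of services that are not UP. The Python sketch below is one way to do it; the sample keeps only a few columns and two hosts from the output above. Whether a DOWN entry is a problem depends on which services are expected on a given node, which this sketch does not try to decide.

```python
# A few columns and rows taken from the `robin host list --services` output above.
SAMPLE_SERVICES = """
+-------------------------------+-------+-------+-------+---------+
| Host                          | ConCl | ConSr | Pgsql | Stormgr |
+-------------------------------+-------+-------+-------+---------+
| cscale-82-81.robinsystems.com | DOWN  | UP    | UP    | UP      |
| qct-07.robinsystems.com       | UP    | DOWN  | DOWN  | DOWN    |
+-------------------------------+-------+-------+-------+---------+
"""


def services_not_up(table_text):
    """Map each host to the service columns whose value is not UP."""
    header = None
    result = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and the legend
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        host, states = cells[0], cells[1:]
        result[host] = [name for name, state in zip(header[1:], states)
                        if state != "UP"]
    return result


for host, services in services_not_up(SAMPLE_SERVICES).items():
    print(host, services)
```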
Returns information on all hosts within a cluster including details on their statuses (from Robin’s perspective), resource consumption, and roles within the cluster.
End Point: /api/v5/robin_server/hosts
Method: GET
URL Parameters:
details=tags
: Utilizing this parameter results in tag information for each host being present in the response payload.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
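In the example response for this end point, the services field of each host is itself a JSON-encoded string, and each entry in roles is a three-element list. The Python sketch below shows one way to unpack both. The sample host is a heavily trimmed stand-in for an item of the response, and naming the second and third role elements "state" and "status" is an assumption.

```python
import json


def active_services(host):
    """Decode the nested services string and list services reported active."""
    services = json.loads(host["services"])["services"]
    return sorted(
        name for name, info in services.items()
        # Some entries (for example consul_dns) are not per-service dicts.
        if isinstance(info, dict) and info.get("ActiveState") == "active"
    )


def role_states(host):
    """Turn [[role, state, status], ...] into {role: (state, status)}."""
    return {role: (state, status) for role, state, status in host["roles"]}


# Trimmed stand-in for one element of the "items" list in the response.
SAMPLE_HOST = {
    "hostname": "cscale-82-140.robinsystems.com",
    "roles": [["MANAGER", "ONLINE", "READY"], ["STORAGE", "ONLINE", "READY"]],
    "services": json.dumps({"update_time": 1596761892.37, "services": {
        "consul_dns": True,
        "httpd": {"Id": "httpd", "ActiveState": "active"},
        "consul-client": {"Id": "consul-client", "ActiveState": "inactive"},
    }}),
}

print(active_services(SAMPLE_HOST))
print(role_states(SAMPLE_HOST))
```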
Example Response:
Output
{
"items":[
{
"memory_used":2692743168,
"memory":33555709952,
"isol_shared_map":{
},
"zoneid":1596601846,
"non_isol_cores_used":3,
"pods_used":26,
"rack":"default",
"napps":2,
"non_isol_total":400,
"k8s_node_name":"cscale-82-140",
"mem_for_storage":1073741824,
"id":1,
"lab":"default",
"gpu_cores_allocated":0,
"isol_dedicated_cores_used":0,
"roles":[
[
"MANAGER",
"ONLINE",
"READY"
],
[
"COMPUTE",
"ONLINE",
"READY"
],
[
"STORAGE",
"ONLINE",
"READY"
]
],
"cpu_cores_used":0,
"cpu_prov_factor":10,
"services":"{\"update_time\":1596761892.3700919151,\"services\":{\"consul_dns\":true,\"stormgr-server\":{\"Id\":\"stormgr-server\",\"MainPID\":2299,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:31:26.794916\",\"ActiveState\":\"active\"},\"gui-cli\":{\"Id\":\"gui-cli\",\"MainPID\":2647,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:31:32.291459\",\"ActiveState\":\"active\"},\"consul-client\":{\"Id\":\"consul-client\",\"MainPID\":0,\"Type\":\"simple\",\"ExecMainStartTimestamp\":0,\"ActiveState\":\"inactive\"},\"httpd\":{\"Id\":\"httpd\",\"MainPID\":2613,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:31:32.124472\",\"ActiveState\":\"active\"},\"robin-node-monitor\":{\"Id\":\"robin-node-monitor\",\"MainPID\":1278,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:31:13.459063\",\"ActiveState\":\"active\"},\"iomgr-server\":{\"Id\":\"iomgr-server\",\"MainPID\":7384,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:55:13.851492\",\"ActiveState\":\"active\"},\"robin-event-server\":{\"Id\":\"robin-event-server\",\"MainPID\":1039,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:30:55.322709\",\"ActiveState\":\"active\"},\"consul-server\":{\"Id\":\"consul-server\",\"MainPID\":564,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 
14:30:32.432940\",\"ActiveState\":\"active\"},\"consul_members\":[{\"DelegateMax\":5,\"ProtocolMin\":1,\"Port\":29460,\"Status\":1,\"ProtocolMax\":5,\"DelegateCur\":4,\"ProtocolCur\":2,\"Name\":\"cscale-82-140.robinsystems.com\",\"Tags\":{\"dc\":\"consul\",\"role\":\"consul\",\"vsn\":\"2\",\"wan_join_port\":\"29461\",\"segment\":\"\",\"port\":\"29459\",\"raft_vsn\":\"2\",\"vsn_min\":\"2\",\"vsn_max\":\"3\",\"id\":\"9dbc13cd-bbb4-1bf1-9bcd-f3d7e0f0026f\",\"bootstrap\":\"1\",\"build\":\"0.9.4:40f243a+\"},\"Addr\":\"10.9.82.140\",\"DelegateMin\":2}],\"robin-file-server\":{\"Id\":\"robin-file-server\",\"MainPID\":1071,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:30:55.801664\",\"ActiveState\":\"active\"},\"robin-watchdog\":{\"Id\":\"robin-watchdog\",\"MainPID\":860,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:30:37.982382\",\"ActiveState\":\"active\"},\"sherlock-server\":{\"Id\":\"sherlock-server\",\"MainPID\":0,\"Type\":\"simple\",\"ExecMainStartTimestamp\":0,\"ActiveState\":\"inactive\"},\"robin-agent\":{\"Id\":\"robin-agent\",\"MainPID\":9186,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:59:23.165676\",\"ActiveState\":\"active\"},\"postgresql-9.6\":{\"Id\":\"postgresql-9.6\",\"MainPID\":660,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:30:34.988682\",\"ActiveState\":\"active\"},\"monitor-server\":{\"Id\":\"monitor-server\",\"MainPID\":2687,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:31:32.455445\",\"ActiveState\":\"active\"},\"robin-server\":{\"Id\":\"robin-server\",\"MainPID\":64400,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-06 17:57:55.208502\",\"ActiveState\":\"active\"},\"robin-auth-server\":{\"Id\":\"robin-auth-server\",\"MainPID\":1010,\"Type\":\"simple\",\"ExecMainStartTimestamp\":\"2020-08-04 14:30:50.844131\",\"ActiveState\":\"active\"}}}",
"gpu_cores":0,
"sysmem":[
33555709952,
2689277952,
7223099392,
0,
0,
0,
22075457536,
843776
],
"ssd_faulted":0,
"isol_shared_cores_used":0,
"hugepages_1g":0,
"ninstances":2,
"maintenance_mode":"DISABLED",
"memory_allocated":0,
"hostname":"cscale-82-140.robinsystems.com",
"datacenter":"default",
"tags":{
"kubernetes.io\/os":[
"linux"
],
"robin.io\/robinrpool":[
"default"
],
"kubernetes.io\/arch":[
"amd64"
]
},
"remove_taint":true,
"hugepages_2m_allocated":0,
"rpool":"default",
"pods":110,
"state":"ONLINE",
"name":"cscale-82-140",
"rcm_ha_role":"MANAGER_MASTER",
"isol_total":0,
"hugepages_2m":0,
"hdd_faulted":0,
"ipaddresses":[
{
"mac_address":"00:15:5d:14:06:0e",
"netmask":"255.255.0.0",
"ip_address":"10.9.82.140"
}
],
"cpu_cores_allocated":0,
"memory_reserved":6442450944.0,
"cpu_cores":40,
"hugepages_1g_allocated":0,
"k8s_node_status":"Ready",
"status":"Ready",
"nics":[
{
"allowed_vlans":[
],
"function":null,
"numa_node":null,
"vendor":null,
"mtu":1500,
"mac_address":"00:15:5d:14:06:0e",
"bus":null,
"name":"br0",
"physical_nic":"eth0",
"num_vfs":0,
"linkstate":"",
"all_vlans_allowed":false,
"used_vfs":0,
"native_vfdriver":null,
"native_vlan":null,
"vendor_desc":null,
"domain":null,
"untagged":false,
"slot":null,
"vfdrivers":[
]
}
],
"public_hostname":"cscale-82-140.robinsystems.com",
"cpu_cores_present":40,
"sysinfo":{
"join_time":1596576678,
"current_version":"5.3.0-171",
"iqn":"iqn.1994-05.com.redhat:329b8568de1",
"install_date":"Tue Mar 17 23:49:17 UTC 2020",
"wwpns":[
],
"distribution":"CentOS Linux",
"version":"#1 SMP Tue Mar 17 23:49:17 UTC 2020",
"uuid":"",
"boot_time":1596576222,
"robin_software":[
{
"version":"5.3.0",
"patch":"",
"full_version":"5.3.0-171",
"install_date":"2020-08-03",
"patch_date":"",
"release":"171",
"build_info":"robin-c2edf85eaa83a42ced9512e7de9c7c2f1e4fa962:robin-ui:9ee33fd00273ba19861d4dc3ef8c6169d822d3e0:robingraph:cf0ceefe696ccac2dbd2eeb1d28b859955452843"
}
],
"release":"3.10.0-1062.18.1.el7.x86_64",
"system":"Linux",
"processor":"x86_64"
},
"disks":[
{
"spf":0.8,
"state":"READY",
"type":"HDD",
"zoneid":1596601846,
"dev":"\/dev\/sdb",
"max_alloc_slices":77,
"free_alloc_slices":68,
"model":null,
"allocated":7,
"maintenance_mode":"DISABLED",
"max_throughput_intensive_vols_per_disk":1,
"role":"Storage",
"wwn":"0x600224804c48fd7e16c608dea0919064",
"status":"ONLINE",
"make":null,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224804c48fd7e16c608dea0919064",
"alloc_slices":9,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"capacity":107374182400,
"node_ref":1,
"max_latency_sensitive_vols_per_disk":2,
"pused":234881024,
"pfree":104287174656
},
{
"spf":0.8,
"state":"READY",
"type":"HDD",
"zoneid":1596601846,
"dev":"\/dev\/sdc",
"max_alloc_slices":77,
"free_alloc_slices":53,
"model":null,
"allocated":20,
"maintenance_mode":"DISABLED",
"max_throughput_intensive_vols_per_disk":1,
"role":"Storage",
"wwn":"0x600224803bcdafde95b1f5cd27ceb5fb",
"status":"ONLINE",
"make":null,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224803bcdafde95b1f5cd27ceb5fb",
"alloc_slices":24,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"capacity":107374182400,
"node_ref":1,
"max_latency_sensitive_vols_per_disk":2,
"pused":939524096,
"pfree":103582531584
},
{
"spf":0.8,
"state":"INIT",
"type":"HDD",
"zoneid":1596601846,
"dev":"\/dev\/dm-1",
"max_alloc_slices":5,
"free_alloc_slices":5,
"model":null,
"allocated":0,
"maintenance_mode":"DISABLED",
"max_throughput_intensive_vols_per_disk":1,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-swap",
"status":"UNKNOWN",
"make":null,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphaFy4aq3EUo1yluonS8FG0LF16ycBrdEw",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"capacity":8254390272,
"node_ref":1,
"max_latency_sensitive_vols_per_disk":2,
"pused":0,
"pfree":0
},
{
"spf":0.8,
"state":"INIT",
"type":"HDD",
"zoneid":1596601846,
"dev":"\/dev\/dm-0",
"max_alloc_slices":38,
"free_alloc_slices":38,
"model":null,
"allocated":0,
"maintenance_mode":"DISABLED",
"max_throughput_intensive_vols_per_disk":1,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-root",
"status":"UNKNOWN",
"make":null,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphgpZcvqGdfOKaXbEbOZzNthc6btsoSXDj",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"capacity":53687091200,
"node_ref":1,
"max_latency_sensitive_vols_per_disk":2,
"pused":0,
"pfree":0
},
{
"spf":0.8,
"state":"INIT",
"type":"HDD",
"zoneid":1596601846,
"dev":"\/dev\/dm-2",
"max_alloc_slices":32,
"free_alloc_slices":32,
"model":null,
"allocated":0,
"maintenance_mode":"DISABLED",
"max_throughput_intensive_vols_per_disk":1,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-home",
"status":"UNKNOWN",
"make":null,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphQObDlS6eMUSpSxH5zsvyg9I5a0Gpuj5W",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"capacity":44350570496,
"node_ref":1,
"max_latency_sensitive_vols_per_disk":2,
"pused":0,
"pfree":0
},
{
"spf":0.8,
"state":"INIT",
"type":"HDD",
"zoneid":1596601846,
"dev":"\/dev\/sda",
"max_alloc_slices":77,
"free_alloc_slices":77,
"model":null,
"allocated":0,
"maintenance_mode":"DISABLED",
"max_throughput_intensive_vols_per_disk":1,
"role":"RootDisk",
"wwn":"0x600224801d3ac9b6650afd3280aa5898",
"status":"UNKNOWN",
"make":null,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224801d3ac9b6650afd3280aa5898",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"protected":0,
"capacity":107374182400,
"node_ref":1,
"max_latency_sensitive_vols_per_disk":2,
"pused":0,
"pfree":0
}
]
}
],
"total":1,
"page_num":1,
"nodes_count":1,
"num_items":1,
"page_size":1
}
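Each entry in the `disks` array above carries enough fields to derive a quick utilization summary. The following is a minimal sketch, not part of the Robin tooling: the helper name `summarize_disks` is hypothetical, and it assumes that `capacity` and `pused` are byte counts and that `free_alloc_slices` and `max_alloc_slices` are slice counts, as the sample values suggest.

```python
def summarize_disks(disks):
    """Return one summary dict per disk entry taken from a host response."""
    rows = []
    for disk in disks:
        capacity = disk.get("capacity") or 0          # assumed to be bytes
        pused = disk.get("pused") or 0                # assumed to be bytes
        max_slices = disk.get("max_alloc_slices") or 0
        free_slices = disk.get("free_alloc_slices") or 0
        rows.append({
            "dev": disk.get("dev"),
            "role": disk.get("role"),
            "capacity_gib": round(capacity / 2**30, 1),
            # Guard against zero capacity or zero slices to avoid dividing by zero.
            "used_pct": round(100 * pused / capacity, 2) if capacity else 0.0,
            "free_slice_pct": round(100 * free_slices / max_slices, 1) if max_slices else 0.0,
        })
    return rows


# Two entries trimmed from the example response above.
sample_disks = [
    {"dev": "/dev/sdb", "role": "Storage", "capacity": 107374182400,
     "pused": 234881024, "max_alloc_slices": 77, "free_alloc_slices": 68},
    {"dev": "/dev/dm-1", "role": "RootDisk", "capacity": 8254390272,
     "pused": 0, "max_alloc_slices": 5, "free_alloc_slices": 5},
]

for row in summarize_disks(sample_disks):
    print(row)
```

Filtering the result on `role == "Storage"` would limit the summary to the drives Robin actually allocates volumes from.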
5.4.2. Show information about a specific host¶
To display detailed information for a host, such as the storage allocation breakdown, the discovered physical attributes and their utilization (NUMA configuration, network topology, etc.) and service details, issue the following command:
# robin host info <hostname>
--services
--resources
--config
--consul
--json
<hostname> | FQDN of host
--services | Show status information for the host
--resources | Show resource utilization for the host
--config | Show config info
--consul | Show consul cluster info
--json | Output in JSON
Example:
# robin host info qct-07.robinsystems.com
Output
Host: qct-07.robinsystems.com
Zone Id: 1596566663
Host Id: 4
Type: physical
Version: 5.3.0-172
Kernel Version: 3.10.0-1062.el7.x86_64
Boot Time: 04 Aug 2020 03:18:01
Resource pool: workers
CPU:
Total Cores: 80
Total Isolated Cores: 76
Total Non-Isolated Cores: 4
Non-Isolated CPUs allocated: 1
Shared Isolated CPUs allocated: 8
Dedicated Isolated CPUs allocated: 68
Provisioning Factor: 1
NUMA Topology:
Node 0:
Total Memory: 187G
Total Isolated CPUs: 38
Total Non-Isolated CPUs: 2
Total Reserved CPUs: 0
Non-Isolated Pinned CPUs: 0
Isolated Shared Pinned CPUs: 0
Isolated Dedicated Pinned CPUs: 38
Total HugePages_1G: -
Total HugePages_2M: -
CPU List: 1-19,41-59
NIC List: enp94s0f0,enp59s0f0,enp59s0f1,enp94s0f1
Node 1:
Total Memory: 188G
Total Isolated CPUs: 38
Total Non-Isolated CPUs: 2
Total Reserved CPUs: 0
Non-Isolated Pinned CPUs: 0
Isolated Shared Pinned CPUs: 8
Isolated Dedicated Pinned CPUs: 30
Total HugePages_1G: -
Total HugePages_2M: -
CPU List: 21-39,61-79
NIC List: enp175s0f1,enp175s0f0
GPU:
Total Cores: 0
Memory:
System Total: 376G
Allocatable Total: 376G
Reserved: 6G
Robin Manager services: -
Robin Compute services: 4G
Robin Storage services: 2G
Memory allocated to instances: 88G
Free Total: 292G
HugePages_2M:
Total: -
Allocated for Robin apps: -
HugePages_1G:
Total: -
Allocated for Robin apps: -
POD Utilization: 83/110
Network:
Bridge Interface: br0
Physical Interface: enp94s0f1
MTU: 1500
Product Info: 158B - Ethernet Controller XXV710 for 25GbE SFP28 (Ethernet Network Adapter XXV710)
Vendor Info: 8086 - Intel Corporation
NUMA Node: 0
H/W Info: 0000:5e:00.1
IP Addresses: 10.9.20.15/16
Interface: enp175s0f1
MTU: 1500
Product Info: 158B - Ethernet Controller XXV710 for 25GbE SFP28 (Ethernet Network Adapter XXV710)
Vendor Info: 8086 - Intel Corporation
NUMA Node: 1
H/W Info: 0000:af:00.1
Interface: enp175s0f0
MTU: 1500
Product Info: 158B - Ethernet Controller XXV710 for 25GbE SFP28 (Ethernet Network Adapter XXV710-2)
Vendor Info: 8086 - Intel Corporation
NUMA Node: 1
H/W Info: 0000:af:00.0
Interface: enp94s0f0
MTU: 1500
Product Info: 158B - Ethernet Controller XXV710 for 25GbE SFP28 (Ethernet Network Adapter OCP XXV710-2)
Vendor Info: 8086 - Intel Corporation
NUMA Node: 0
H/W Info: 0000:5e:00.0
Interface: enp59s0f0
MTU: 1500
Product Info: 158B - Ethernet Controller XXV710 for 25GbE SFP28 (Ethernet Network Adapter XXV710-2)
Vendor Info: 8086 - Intel Corporation
NUMA Node: 0
H/W Info: 0000:3b:00.0
Interface: enp59s0f1
MTU: 1500
Product Info: 158B - Ethernet Controller XXV710 for 25GbE SFP28 (Ethernet Network Adapter XXV710)
Vendor Info: 8086 - Intel Corporation
NUMA Node: 0
H/W Info: 0000:3b:00.1
Public IP Address: 10.9.20.15
Public Hostname: qct-07.robinsystems.com
Instances: 69
State: ONLINE
Status: Ready
K8s_Node_Status: Ready
Maintenance Mode: DISABLED
Consul state: UP
Roles:
STORAGE: ONLINE Status:READY
COMPUTE: ONLINE Status:READY
Storage:
Type | Used (GB) | Robin Allocated (GB) | K8s Allocated (GB) | Total (GB)
-----+-----------+----------------------+--------------------+------------
HDD | 38 | 538 | 16 | 893
SSD | - | - | - | -
Services:
Name | State | RoleTags | PID | Started
-------------------+-------+----------+------+----------------------------
consul-client | UP | A | 351 | 2020-08-04 17:51:31.538657
consul-server | DOWN | M | 0 | 0
gui-cli | UP | - | 1132 | 2020-08-04 17:51:50.479143
httpd | UP | * | 1077 | 2020-08-04 17:51:50.116133
iomgr-server | UP | C S | 5391 | 2020-08-04 17:52:16.467809
monitor-server | UP | M C S | 539 | 2020-08-04 17:51:42.748944
postgresql-9.6 | DOWN | M | 0 | 0
robin-agent | UP | M C S | 600 | 2020-08-04 17:51:45.394012
robin-auth-server | DOWN | M* | 0 | 0
robin-event-server | DOWN | M* | 0 | 0
robin-file-server | DOWN | M* | 0 | 0
robin-node-monitor | DOWN | M* | 0 | 0
robin-server | DOWN | M* | 0 | 0
robin-watchdog | DOWN | M | 0 | 0
sherlock-server | DOWN | - | 0 | 0
stormgr-server | DOWN | M* | 0 | 0
Last updated (04 Aug 2020 18:40:14)
UP: Running
CRIT: Critical and Down
DOWN: Not Running
Root Disk storage info:
Partition | Name | Size (GB) | Available (GB)
--------------------------+-----------------+-----------+----------------
/var/log | RobinLog | 299 | 268
/var/lib/pgsql | Pgsql | 299 | 268
/var/crash | Crash | 299 | 268
/var/lib/robin | RobinLib | 299 | 268
/var/lib/[appropriateCRI] | ContainerImages | 15 | -
Unused container images: 6G
Image | Size (GB)
--------------------------------------------+-----------
k8s.gcr.io/kube-controller-manager:v1.18.6 | 0.15
k8s.gcr.io/kube-apiserver:v1.18.6 | 0.16
k8s.gcr.io/kube-scheduler:v1.18.6 | 0.09
robinsys/robinimg:5.2.7-18 | 3
quay.io/k8scsi/csi-provisioner:v1.6.0_robin | 0.04
k8s.gcr.io/kube-proxy:v1.17.5 | 0.11
k8s.gcr.io/kube-apiserver:v1.17.5 | 0.16
k8s.gcr.io/kube-controller-manager:v1.17.5 | 0.15
k8s.gcr.io/kube-scheduler:v1.17.5 | 0.09
quay.io/k8scsi/snapshot-controller:v2.1.0 | 0.04
k8s.gcr.io/pause:3.2 | 0.0
prom/prometheus:v2.16.0 | 0.12
quay.io/k8scsi/csi-attacher:v2.1.0 | 0.04
calico/typha:v3.11.1 | 0.05
calico/pod2daemon-flexvol:v3.11.1 | 0.1
calico/cni:v3.11.1 | 0.18
calico/kube-controllers:v3.11.1 | 0.05
quay.io/k8scsi/csi-provisioner:v1.4.0_robin | 0.05
k8s.gcr.io/kube-proxy:v1.16.3 | 0.08
k8s.gcr.io/kube-apiserver:v1.16.3 | 0.2
k8s.gcr.io/kube-controller-manager:v1.16.3 | 0.15
k8s.gcr.io/kube-scheduler:v1.16.3 | 0.08
k8s.gcr.io/coredns:1.6.5 | 0.04
metallb/controller:v0.8.2 | 0.04
metallb/speaker:v0.8.2 | 0.04
k8s.gcr.io/etcd:3.4.3-0 | 0.27
quay.io/k8scsi/csi-snapshotter:v1.2.2 | 0.04
k8s.gcr.io/etcd:3.3.15-0 | 0.23
quay.io/k8scsi/csi-attacher:v1.2.1 | 0.04
k8s.gcr.io/coredns:1.6.2 | 0.04
robinsys/genie-plugin:v3.0 | 0.02
quay.io/k8scsi/csi-provisioner:v1.0.0_robin | 0.04
quay.io/k8scsi/csi-provisioner:v0.4.1_robin | 0.04
robinsys/coredns:1.2.2 | 0.03
Returns detailed information for a host, such as the storage allocation breakdown, the discovered physical attributes and their utilization (NUMA configuration, network topology, etc.) and service details.
End Point: /api/v3/robin_server/hosts/<hostname>
Method: GET
URL Parameters:
diskinfo=true
- When this parameter is set, details of the disks attached to the specified host are included in the response.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
- Authorization token that identifies the user sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error)
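As an illustration of the request described above, the following Python sketch builds (but does not send) the GET request using only the standard library. The helper name `build_host_info_request`, the `https` scheme and the server name `robin-master.example.com` are assumptions made for this example; the end point, port, `diskinfo` parameter and `Authorization` header are taken from the description above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_host_info_request(server, hostname, auth_token,
                            diskinfo=False, port=29442, scheme="https"):
    """Build the GET request for one host's details without sending it."""
    # The scheme is an assumption; adjust it to match the cluster setup.
    url = (f"{scheme}://{server}:{port}"
           f"/api/v3/robin_server/hosts/{quote(hostname, safe='')}")
    if diskinfo:
        # Ask for details of the disks attached to the host.
        url += "?" + urlencode({"diskinfo": "true"})
    return Request(url, headers={"Authorization": auth_token}, method="GET")


req = build_host_info_request(
    "robin-master.example.com",          # hypothetical server address
    "cscale-82-140.robinsystems.com",
    "<auth_token>",
    diskinfo=True,
)
print(req.get_method(), req.full_url)
```

The returned object can be passed to `urllib.request.urlopen` and the body decoded with `json.load` to obtain a response shaped like the example shown in this section.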
Example Response:
Output
{
"items":[
{
"memory_used":2692743168,
"hdd_lalloc":35433480192.0,
"memory":33555709952,
"lab":"default",
"saas_mode":false,
"zoneid":1596601846,
"non_isol_total_with_prov":400,
"zone_name":"default",
"rack":"default",
"ipaddresses":[
{
"ip_address":"10.9.82.140",
"mac_address":"00:15:5d:14:06:0e",
"netmask":"255.255.0.0"
}
],
"public_ip":"10.9.82.140",
"ssd_nonrobin_usage":0,
"k8s_node_name":"cscale-82-140",
"mem_for_storage":1073741824,
"id":1,
"ssd_for_storage":0,
"rcm_ha_role":"MANAGER_MASTER",
"ssd_robin_usage":0,
"gpu_cores_allocated":0,
"numa_map":{
"0":{
"memory_used":0,
"hugepages_1g_used":0,
"isol_total":0,
"isol_shared_map":{
},
"cpu_reserved":0,
"numa_id":0,
"non_isol_cores_used":2,
"cpu_ids":"",
"cpu_used":0,
"mem_used":1493172224,
"non_isol_total":20,
"hugepages_2m_used":0,
"gpu_used":0,
"isol_shared_cores_used":0,
"hugepages_1g_total":0,
"cpu_total":20,
"hugepages_2m_total":0,
"memory_total":16777626965,
"isol_dedicated_cores_used":0
},
"1":{
"memory_used":0,
"hugepages_1g_used":0,
"isol_total":0,
"isol_shared_map":{
},
"cpu_reserved":0,
"numa_id":1,
"non_isol_cores_used":0,
"cpu_ids":"",
"cpu_used":0,
"mem_used":0,
"non_isol_total":20,
"hugepages_2m_used":0,
"gpu_used":0,
"isol_shared_cores_used":0,
"hugepages_1g_total":0,
"cpu_total":20,
"hugepages_2m_total":0,
"memory_total":16778082988,
"isol_dedicated_cores_used":0
}
},
"roles":[
[
"MANAGER",
"ONLINE",
"READY"
],
[
"COMPUTE",
"ONLINE",
"READY"
],
[
"STORAGE",
"ONLINE",
"READY"
]
],
"ssd_max_alloc_slices":0,
"cpu_prov_factor":10,
"services":{
"update_time":1596761892.3700919151,
"services":{
"consul_dns":true,
"stormgr-server":{
"MainPID":2299,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:31:26.794916",
"RoleTags":[
"M*"
],
"Id":"stormgr-server",
"State":"UP",
"ActiveState":"active"
},
"gui-cli":{
"MainPID":2647,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:31:32.291459",
"RoleTags":[
"-"
],
"Id":"gui-cli",
"State":"UP",
"ActiveState":"active"
},
"consul-client":{
"MainPID":0,
"Type":"simple",
"ExecMainStartTimestamp":0,
"RoleTags":[
"A"
],
"Id":"consul-client",
"State":"DOWN",
"ActiveState":"inactive"
},
"robin-server":{
"MainPID":64400,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-06 17:57:55.208502",
"RoleTags":[
"M*"
],
"Id":"robin-server",
"State":"UP",
"ActiveState":"active"
},
"robin-node-monitor":{
"MainPID":1278,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:31:13.459063",
"RoleTags":[
"M*"
],
"Id":"robin-node-monitor",
"State":"UP",
"ActiveState":"active"
},
"iomgr-server":{
"MainPID":7384,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:55:13.851492",
"RoleTags":[
"C",
"S"
],
"Id":"iomgr-server",
"State":"UP",
"ActiveState":"active"
},
"consul_members":[
{
"DelegateMax":5,
"ProtocolCur":2,
"Port":29460,
"Status":1,
"ProtocolMax":5,
"DelegateCur":4,
"Tags":{
"dc":"consul",
"role":"consul",
"vsn":"2",
"wan_join_port":"29461",
"segment":"",
"port":"29459",
"raft_vsn":"2",
"vsn_min":"2",
"vsn_max":"3",
"id":"9dbc13cd-bbb4-1bf1-9bcd-f3d7e0f0026f",
"bootstrap":"1",
"build":"0.9.4:40f243a+"
},
"ProtocolMin":1,
"Name":"cscale-82-140.robinsystems.com",
"Addr":"10.9.82.140",
"DelegateMin":2
}
],
"robin-file-server":{
"MainPID":1071,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:30:55.801664",
"RoleTags":[
"M*"
],
"Id":"robin-file-server",
"State":"UP",
"ActiveState":"active"
},
"robin-event-server":{
"MainPID":1039,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:30:55.322709",
"RoleTags":[
"M*"
],
"Id":"robin-event-server",
"State":"UP",
"ActiveState":"active"
},
"sherlock-server":{
"MainPID":0,
"Type":"simple",
"ExecMainStartTimestamp":0,
"RoleTags":[
"-"
],
"Id":"sherlock-server",
"State":"DOWN",
"ActiveState":"inactive"
},
"robin-agent":{
"MainPID":9186,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:59:23.165676",
"RoleTags":[
"M",
"C",
"S"
],
"Id":"robin-agent",
"State":"UP",
"ActiveState":"active"
},
"postgresql-9.6":{
"MainPID":660,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:30:34.988682",
"RoleTags":[
"M"
],
"Id":"postgresql-9.6",
"State":"UP",
"ActiveState":"active"
},
"consul-server":{
"MainPID":564,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:30:32.432940",
"RoleTags":[
"M"
],
"Id":"consul-server",
"State":"UP",
"ActiveState":"active"
},
"httpd":{
"MainPID":2613,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:31:32.124472",
"RoleTags":[
"*"
],
"Id":"httpd",
"State":"UP",
"ActiveState":"active"
},
"robin-auth-server":{
"MainPID":1010,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:30:50.844131",
"RoleTags":[
"M*"
],
"Id":"robin-auth-server",
"State":"UP",
"ActiveState":"active"
},
"robin-watchdog":{
"MainPID":860,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:30:37.982382",
"RoleTags":[
"M"
],
"Id":"robin-watchdog",
"State":"UP",
"ActiveState":"active"
},
"monitor-server":{
"MainPID":2687,
"Type":"simple",
"ExecMainStartTimestamp":"2020-08-04 14:31:32.455445",
"RoleTags":[
"M",
"C",
"S"
],
"Id":"monitor-server",
"State":"UP",
"ActiveState":"active"
}
}
},
"non_isol_total":40,
"gpu_cores":0,
"hdd_robin_usage":35433480192,
"visibledisks":[
"0x600224801d3ac9b6650afd3280aa5898",
"0x600224801d3ac9b6650afd3280aa5898-centos-root",
"0x600224801d3ac9b6650afd3280aa5898-centos-swap",
"0x600224801d3ac9b6650afd3280aa5898-centos-home",
"0x600224804c48fd7e16c608dea0919064",
"0x600224803bcdafde95b1f5cd27ceb5fb"
],
"ssd_faulted":0,
"isol_shared_cores_used":0,
"nic_details":{
"br0":{
"vfdrivers":[
],
"product_desc":null,
"all_vlans_allowed":false,
"mtu":1500,
"bus":null,
"vendor_id":null,
"slot":null,
"physical_nic":"eth0",
"allowed_vlans":[
],
"num_vfs":0,
"function":null,
"ips":[
"10.9.82.140\/16"
],
"product_id":null,
"local_cpulist":null,
"native_vlan":null,
"vendor_desc":null,
"domain":null,
"untagged":false,
"used_vfs":0,
"numa_node":null
}
},
"ninstances":2,
"non_isol_cores_used":3,
"maintenance_mode":"DISABLED",
"memory_allocated":0,
"mem_for_management":1073741824.0,
"rpool_id":1,
"hdd_for_storage":214748364800,
"datacenter":"default",
"tags":{
"kubernetes.io\/os":[
"linux"
],
"robin.io\/robinrpool":[
"default"
],
"kubernetes.io\/arch":[
"amd64"
]
},
"hugepages_1g":0,
"hugepages_2m_allocated":0,
"rpool":"default",
"hdd_total":428414599168,
"ssd_free_alloc_slices":0,
"pods":110,
"state":"ONLINE",
"status":"Ready",
"instances":[
{
"state":"STARTED",
"name":"rohan-app.nginx.03",
"hostname":"rohan-app-nginx-03.t001-u000003.svc.cluster.local"
},
{
"state":"STARTED",
"name":"test-RIC-1.server.01",
"hostname":"test-ric-1-server-01.t001-u000003.svc.cluster.local"
}
],
"isol_dedicated_cores_used":0,
"host_type":"physical",
"ssd_pused":0,
"isol_total":0,
"primary_ip":"10.9.82.140",
"isol_shared_map":{
},
"hugepages_2m":0,
"pods_used":26,
"hdd_lused":0,
"hdd_faulted":0,
"napps":2,
"hdd_nonrobin_usage":0,
"cpu_cores_present":40,
"ssd_total":0,
"cpu_cores_allocated":0,
"is_master":true,
"memory_reserved":6442450944.0,
"cpu_cores":400,
"config":{
"stormgr_rest_port":29454,
"monitor_host_mem_lowmark":0.8,
"monitor_host_root_volume_highmark":0.9,
"rio_rest_port":29456,
"stormgr_rest_listen_addr":"127.0.0.1",
"kvm_enabled":true,
"hard_reset_on_isolation":0,
"monitor_host_cpu_lowmark":0.8,
"monitor_host_var_crash_volume_highmark":0.9,
"monitor_interval":1,
"monitor_host_var_pgsql_volume_lowmark":0.5,
"kubelet_restart_bursttime":25,
"server_rest_port":29442,
"kubelet_restart_burstlimit":2,
"event_server_port":29449,
"rio_rpc_port":29453,
"rdvm_bmapcache_skip_all":0,
"rdvm_mem_maxcap":25769803776,
"rest_server":"cscale-82-140.robinsystems.com",
"registration_timeout":10,
"rdvm_rpc_port":29452,
"node_exporter_port":29457,
"stormgr_rpc_port":29451,
"monitor_host_root_volume_lowmark":0.85,
"database_port":29458,
"rdvm_rest_listen_addr":"127.0.0.1",
"https_port":29443,
"metrics_grafana_details":"{\"url\": \"\", \"auth\": \":\"}",
"monitor_host_var_volume_lowmark":0.85,
"monitor_host_cpu_highmark":0.85,
"rio_rest_listen_addr":"127.0.0.1",
"monitor_host_var_robin_volume_highmark":0.9,
"rdvm_mem_alloc":1073741824,
"monitor_num_samples":3600,
"monitor_host_swap_lowmark":0.75,
"watchdog_loop_interval":3,
"rdvm_rest_port":29455,
"monitor_container_swap_highmark":0.8,
"consul_serfwan_port":29461,
"saas_mode":false,
"file_object_cache":"\/var\/lib\/robin\/file_object_cache",
"node_monitor_port":29467,
"monitor_influx_details":"{\"url\": \"\", \"dbname\": \"robin\", \"auth\": \":\" }",
"consul_http_port":29462,
"monitor_container_swap_lowmark":0.75,
"hostname":"cscale-82-140.robinsystems.com",
"monitor_host_var_volume_highmark":0.9,
"network_type":4,
"suicide_threshold":50,
"mem_for_compute":null,
"mem_for_management":null,
"sherlock_rest_port":29446,
"nfs_mount_options":"nolock,rw,timeo=60",
"rediscover_timeout":120,
"kvm_emulatorpin_cpuset":"",
"rdvm_bmapcache_invalidate_all":0,
"consul_serflan_port":29460,
"monitor_host_var_robin_volume_lowmark":0.85,
"rest_port":29450,
"monitor_report_interval":5,
"host_type":"physical",
"monitor_host_swap_highmark":0.8,
"nodejs_port":29447,
"monitor_push_interval":60,
"ovs_enabled":true,
"monitor_host_var_log_volume_highmark":0.9,
"kubelet_restart_tolerance":15,
"monitor_host_mem_highmark":0.85,
"monitor_host_var_log_volume_lowmark":0.85,
"log_level":10,
"monitor_host_var_crash_volume_lowmark":0.85,
"monitor_container_volume_highmark":0.9,
"monitor_container_volume_lowmark":0.85,
"consul_server_port":29459,
"monitor_host_var_pgsql_volume_highmark":0.7
},
"ssd_lused":0,
"hugepages_1g_allocated":0,
"k8s_node_status":"Ready",
"hdd_free_alloc_slices":293131517952.0,
"ssd_lalloc":0,
"hostname":"cscale-82-140.robinsystems.com",
"public_hostname":"cscale-82-140.robinsystems.com",
"hdd_max_alloc_slices":328564998144.0,
"memory_total":33555709952,
"mem_for_compute":4294967296,
"sysinfo":{
"join_time":1596576678,
"current_version":"5.3.0-171",
"iqn":"iqn.1994-05.com.redhat:329b8568de1",
"install_date":"Tue Mar 17 23:49:17 UTC 2020",
"wwpns":[
],
"distribution":"CentOS Linux",
"version":"#1 SMP Tue Mar 17 23:49:17 UTC 2020",
"uuid":"",
"boot_time":1596576222,
"robin_software":[
{
"version":"5.3.0",
"patch":"",
"full_version":"5.3.0-171",
"install_date":"2020-08-03",
"patch_date":"",
"release":"171",
"build_info":"robin-c2edf85eaa83a42ced9512e7de9c7c2f1e4fa962:robin-ui:9ee33fd00273ba19861d4dc3ef8c6169d822d3e0:robingraph:cf0ceefe696ccac2dbd2eeb1d28b859955452843"
}
],
"release":"3.10.0-1062.18.1.el7.x86_64",
"system":"Linux",
"processor":"x86_64"
},
"hdd_pused":1174405120,
"disks":[
{
"spf":0.8,
"zoneid":1596601846,
"dev":"\/dev\/sda",
"aslices":0,
"nodeid":1,
"maintenance_mode":"OFF",
"role":"RootDisk",
"protected":0,
"status":"UNKNOWN",
"make":null,
"reattachable_nodes":[
[
"cscale-82-140.robinsystems.com",
"ONLINE"
]
],
"capacity":107374182400,
"max_latency_sensitive_vols_per_disk":2,
"pfree":0,
"node_hostname":"cscale-82-140.robinsystems.com",
"tags":{
},
"pused":0,
"type":"HDD",
"nvols":0,
"state":"INIT",
"reattachpolicy":{
"restarts_done":0,
"burst_count":0,
"burst_start_time":0,
"burst_interval":600,
"id":1,
"restart_limit":5
},
"max_alloc_slices":77,
"stormgrid":0,
"free_alloc_slices":77,
"slices":0,
"availability_zone":null,
"max_throughput_intensive_vols_per_disk":1,
"model":null,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224801d3ac9b6650afd3280aa5898",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"wwn":"0x600224801d3ac9b6650afd3280aa5898",
"allocations":[
],
"alloc_score":0,
"node_ref":1,
"preserved":0
},
{
"spf":0.8,
"zoneid":1596601846,
"dev":"\/dev\/dm-0",
"aslices":0,
"nodeid":1,
"maintenance_mode":"OFF",
"role":"RootDisk",
"protected":0,
"status":"UNKNOWN",
"make":null,
"reattachable_nodes":[
[
"cscale-82-140.robinsystems.com",
"ONLINE"
]
],
"capacity":53687091200,
"max_latency_sensitive_vols_per_disk":2,
"pfree":0,
"node_hostname":"cscale-82-140.robinsystems.com",
"tags":{
},
"pused":0,
"type":"HDD",
"nvols":0,
"state":"INIT",
"reattachpolicy":{
"restarts_done":0,
"burst_count":0,
"burst_start_time":0,
"burst_interval":600,
"id":4,
"restart_limit":5
},
"max_alloc_slices":38,
"stormgrid":0,
"free_alloc_slices":38,
"slices":0,
"availability_zone":null,
"max_throughput_intensive_vols_per_disk":1,
"model":null,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphgpZcvqGdfOKaXbEbOZzNthc6btsoSXDj",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-root",
"allocations":[
],
"alloc_score":0,
"node_ref":1,
"preserved":0
},
{
"spf":0.8,
"zoneid":1596601846,
"dev":"\/dev\/dm-1",
"aslices":0,
"nodeid":1,
"maintenance_mode":"OFF",
"role":"RootDisk",
"protected":0,
"status":"UNKNOWN",
"make":null,
"reattachable_nodes":[
[
"cscale-82-140.robinsystems.com",
"ONLINE"
]
],
"capacity":8254390272,
"max_latency_sensitive_vols_per_disk":2,
"pfree":0,
"node_hostname":"cscale-82-140.robinsystems.com",
"tags":{
},
"pused":0,
"type":"HDD",
"nvols":0,
"state":"INIT",
"reattachpolicy":{
"restarts_done":0,
"burst_count":0,
"burst_start_time":0,
"burst_interval":600,
"id":5,
"restart_limit":5
},
"max_alloc_slices":5,
"stormgrid":0,
"free_alloc_slices":5,
"slices":0,
"availability_zone":null,
"max_throughput_intensive_vols_per_disk":1,
"model":null,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphaFy4aq3EUo1yluonS8FG0LF16ycBrdEw",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-swap",
"allocations":[
],
"alloc_score":0,
"node_ref":1,
"preserved":0
},
{
"spf":0.8,
"zoneid":1596601846,
"dev":"\/dev\/dm-2",
"aslices":0,
"nodeid":1,
"maintenance_mode":"OFF",
"role":"RootDisk",
"protected":0,
"status":"UNKNOWN",
"make":null,
"reattachable_nodes":[
[
"cscale-82-140.robinsystems.com",
"ONLINE"
]
],
"capacity":44350570496,
"max_latency_sensitive_vols_per_disk":2,
"pfree":0,
"node_hostname":"cscale-82-140.robinsystems.com",
"tags":{
},
"pused":0,
"type":"HDD",
"nvols":0,
"state":"INIT",
"reattachpolicy":{
"restarts_done":0,
"burst_count":0,
"burst_start_time":0,
"burst_interval":600,
"id":6,
"restart_limit":5
},
"max_alloc_slices":32,
"stormgrid":0,
"free_alloc_slices":32,
"slices":0,
"availability_zone":null,
"max_throughput_intensive_vols_per_disk":1,
"model":null,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/dm-uuid-LVM-vI83PDTxV3H0dWyAXfH5ef7rxTOuYyphQObDlS6eMUSpSxH5zsvyg9I5a0Gpuj5W",
"alloc_slices":0,
"reattachable":0,
"max_volumes_per_disk":10,
"wwn":"0x600224801d3ac9b6650afd3280aa5898-centos-home",
"allocations":[
],
"alloc_score":0,
"node_ref":1,
"preserved":0
},
{
"spf":0.8,
"zoneid":1596601846,
"dev":"\/dev\/sdb",
"aslices":7,
"nodeid":1,
"maintenance_mode":"OFF",
"role":"Storage",
"write_unit":4096,
"status":"ONLINE",
"make":null,
"reattachable_nodes":[
[
"cscale-82-140.robinsystems.com",
"ONLINE"
]
],
"protected":0,
"capacity":107374182400,
"max_latency_sensitive_vols_per_disk":2,
"pfree":104287174656,
"node_hostname":"cscale-82-140.robinsystems.com",
"tags":{
},
"pused":234881024,
"type":"HDD",
"nvols":3,
"state":"READY",
"reattachpolicy":{
"restarts_done":0,
"burst_count":0,
"burst_start_time":0,
"burst_interval":600,
"id":2,
"restart_limit":5
},
"max_alloc_slices":77,
"stormgrid":1,
"free_alloc_slices":68,
"slices":6390,
"availability_zone":null,
"max_throughput_intensive_vols_per_disk":1,
"model":null,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224804c48fd7e16c608dea0919064",
"alloc_slices":9,
"reattachable":0,
"max_volumes_per_disk":10,
"wwn":"0x600224804c48fd7e16c608dea0919064",
"allocations":[
{
"vols":[
{
"media":"HDD",
"pused":167772160,
"id":"1",
"size":5368709120,
"state":"ONLINE",
"name":"file-collection-1596578146092.269f9b38-f828-48c2-a382-8921dd74ee53"
}
],
"volume_group":"file-collection-1596578146092.269f9b38-f828-48c2-a382-8921dd74ee53.72.1.673abece-0975-4234-9fc2-56a06bf54031",
"name":"file-collection-1596578146092.269f9b38-f828-48c2-a382-8921dd74ee53.0.970a44c7-a15c-4612-ac57-9b4f15ae386e",
"volume":{
"media":"HDD",
"pused":167772160,
"id":"1",
"size":5368709120,
"state":"ONLINE",
"name":"file-collection-1596578146092.269f9b38-f828-48c2-a382-8921dd74ee53"
},
"slices":5
},
{
"vols":[
{
"media":"HDD",
"pused":67108864,
"id":"8",
"size":1073741824,
"state":"ONLINE",
"name":"test-RIC-1.server.01.data.1.382f1ad5-1294-4e24-8297-9c6025eacfe5"
}
],
"volume_group":"test-RIC-1.server.01.72.1.ea297971-f931-4787-99cc-6782e026b77c",
"name":"test-RIC-1.server.01.72.1.ea297971-f931-4787-99cc-6782e026b77c.0.1e8feb41-fd42-4409-b8c1-751331febdc1",
"volume":{
"media":"HDD",
"pused":67108864,
"id":"8",
"size":1073741824,
"state":"ONLINE",
"name":"test-RIC-1.server.01.data.1.382f1ad5-1294-4e24-8297-9c6025eacfe5"
},
"slices":2
},
{
"vols":[
{
"media":"HDD",
"pused":0,
"id":"9",
"size":1073741824,
"state":"ONLINE",
"name":"test-RIC-1.server.01.block.1.1053eaeb-4542-42a5-a173-d69a76703ead"
}
],
"volume_group":"test-RIC-1.server.01.72.1.1e483fd0-2d5c-434c-aef0-91a87796977a",
"name":"test-RIC-1.server.01.72.1.1e483fd0-2d5c-434c-aef0-91a87796977a.0.5bf728c4-ea26-4e0f-82d7-c584fcf0bd9a",
"volume":{
"media":"HDD",
"pused":0,
"id":"9",
"size":1073741824,
"state":"ONLINE",
"name":"test-RIC-1.server.01.block.1.1053eaeb-4542-42a5-a173-d69a76703ead"
},
"slices":2
}
],
"alloc_score":95,
"node_ref":1,
"preserved":0
},
{
"spf":0.8,
"zoneid":1596601846,
"dev":"\/dev\/sdc",
"aslices":20,
"nodeid":1,
"maintenance_mode":"OFF",
"role":"Storage",
"write_unit":4096,
"status":"ONLINE",
"make":null,
"reattachable_nodes":[
[
"cscale-82-140.robinsystems.com",
"ONLINE"
]
],
"protected":0,
"capacity":107374182400,
"max_latency_sensitive_vols_per_disk":2,
"pfree":103582531584,
"node_hostname":"cscale-82-140.robinsystems.com",
"tags":{
},
"pused":939524096,
"type":"HDD",
"nvols":1,
"state":"READY",
"reattachpolicy":{
"restarts_done":0,
"burst_count":0,
"burst_start_time":0,
"burst_interval":600,
"id":3,
"restart_limit":5
},
"max_alloc_slices":77,
"stormgrid":2,
"free_alloc_slices":53,
"slices":6390,
"availability_zone":null,
"max_throughput_intensive_vols_per_disk":1,
"model":null,
"lused_size":0,
"devpath":"\/dev\/disk\/by-id\/scsi-3600224803bcdafde95b1f5cd27ceb5fb",
"alloc_slices":24,
"reattachable":0,
"max_volumes_per_disk":10,
"wwn":"0x600224803bcdafde95b1f5cd27ceb5fb",
"allocations":[
{
"vols":[
{
"media":"HDD",
"pused":939524096,
"id":"16",
"size":21474836480,
"state":"ONLINE",
"name":"rohan-app.nginx.03.data.1.83d03fbf-3bfe-4723-8abe-5cbd51014e0c"
}
],
"volume_group":"rohan-app.nginx.03.72.1.44251a23-0221-4fda-837e-db26bca3ccb8",
"name":"rohan-app.nginx.03.72.1.44251a23-0221-4fda-837e-db26bca3ccb8.0.6e2afb65-5c4c-42e4-972e-161de3fb3856",
"volume":{
"media":"HDD",
"pused":939524096,
"id":"16",
"size":21474836480,
"state":"ONLINE",
"name":"rohan-app.nginx.03.data.1.83d03fbf-3bfe-4723-8abe-5cbd51014e0c"
},
"slices":24
}
],
"alloc_score":89,
"node_ref":1,
"preserved":0
}
]
}
]
}
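The nested `services` object in the response above mixes per-service entries with other keys: `consul_dns` is a boolean and `consul_members` is a list. A consumer that wants service states therefore has to skip anything that is not a service record. The following is a minimal sketch of that filtering; the helper name `services_by_state` is hypothetical, and the sample item is trimmed from the example response.

```python
def services_by_state(host_item):
    """Group service names by the State reported for each service."""
    entries = host_item.get("services", {}).get("services", {})
    grouped = {}
    for name, info in entries.items():
        # Skip keys that are not service records, such as
        # "consul_dns" (bool) and "consul_members" (list).
        if not isinstance(info, dict) or "State" not in info:
            continue
        grouped.setdefault(info["State"], []).append(name)
    return {state: sorted(names) for state, names in grouped.items()}


# Entry trimmed from the example response above.
sample_item = {
    "services": {
        "update_time": 1596761892.37,
        "services": {
            "consul_dns": True,
            "stormgr-server": {"MainPID": 2299, "State": "UP"},
            "consul-client": {"MainPID": 0, "State": "DOWN"},
            "sherlock-server": {"MainPID": 0, "State": "DOWN"},
            "consul_members": [{"Name": "cscale-82-140.robinsystems.com"}],
        },
    }
}

print(services_by_state(sample_item))
```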
5.5. Disabling a node¶
In certain situations, a user might not want any resources for an application to be allocated from a particular host, either because the physical machine is malfunctioning or because the host is temporarily undergoing maintenance. Instead of requiring the user to remove the node from an existing cluster, Robin allows a host to be placed into maintenance mode. This isolates the host with regard to resource availability: none of the host's storage or compute capacity can be used for future application deployments, regardless of the Robin roles assigned to the node. This mode can be toggled using the commands detailed below. For more granular control, please review the section on disabling/enabling particular roles here.
The following commands are described in this section:
robin host set-maintenance | Place a host into maintenance mode
robin host unset-maintenance | Place a host into non-maintenance (normal) mode
5.5.1. Placing a host into maintenance mode¶
To put a host into maintenance mode, and thus temporarily stop it from providing storage or compute resources for future application deployments, issue the following command:
# robin host set-maintenance <hostname>
<hostname> | FQDN of host
Example:
# robin host set-maintenance vnode36.robinsystems.com
Host vnode36.robinsystems.com set in maintenance mode
Puts a host into maintenance mode, which in turn temporarily suspends it from providing storage and compute resources for application deployments.
End Point: /api/v3/robin_server/hosts/<hostname>
Method: PUT
URL Parameters: None
Data Parameters:
action: set_maintenance
- This mandatory field within the payload specifies that the set maintenance mode operation is to be performed.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
- Authorization token that identifies the user sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"message":"Maintenance mode set"
}
5.5.2. Placing a host into non-maintenance mode¶
To return a host to normal mode, and thus allow it to provide resources for future application deployments again, issue the following command:
# robin host unset-maintenance <hostname>
<hostname> | FQDN of host
Example:
# robin host unset-maintenance vnode36.robinsystems.com
Host vnode36.robinsystems.com out of maintenance mode
Removes a host from maintenance mode, which in turn allows it to provide storage and compute resources for application deployments.
End Point: /api/v3/robin_server/hosts/<hostname>
Method: PUT
URL Parameters: None
Data Parameters:
action: unset_maintenance
- This mandatory field within the payload specifies that the unset maintenance mode operation is to be performed.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
- Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid Api Usage Error)
Example Response:
Output
{
"message":"Maintenance mode unset"
}
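Because the two maintenance commands are typically used as a pair, a script can wrap them so that a host is always returned to normal mode. The following Python sketch is one possible way to do that. The helper name `maintenance_window` and the injectable `run` parameter are hypothetical; only the two `robin host` commands come from this section. The demonstration records the commands instead of executing them.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def maintenance_window(hostname, run=subprocess.run):
    """Hold `hostname` in maintenance mode for the duration of a with-block."""
    run(["robin", "host", "set-maintenance", hostname], check=True)
    try:
        yield
    finally:
        # Always return the host to normal mode, even if the work failed.
        run(["robin", "host", "unset-maintenance", hostname], check=True)


# Dry run: record the commands rather than invoking the Robin CLI.
calls = []


def record(argv, check=True):
    calls.append(argv)


with maintenance_window("vnode36.robinsystems.com", run=record):
    pass  # hardware or OS maintenance would happen here

for argv in calls:
    print(" ".join(argv))
```

Omitting the `run` argument makes the helper call the real CLI through `subprocess.run`, which raises an exception if either command exits with a non-zero status.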
5.6. Decommissioning a node¶
When a server physically malfunctions or needs to be replaced, the node has to be removed from its respective Robin and Kubernetes clusters. The step-by-step walkthrough below shows how to achieve this whilst ensuring your Robin cluster continues to function normally.
Removing roles from a host:
Before a host can be removed from the Robin cluster all the roles currently assigned to it must be removed. Detailed below are the different preconditions that need to be met before removing a particular role.
If the host in question has the Storage role assigned to it, all volumes that are currently allocated on drives belonging to that host must be evacuated to drives residing on other hosts. This can be achieved by running the robin drive evacuate command; see the documentation for that command for more details. The Storage role denotes that the host is available to provide storage capacity for volumes created alongside applications, so the role can only be removed once no volume allocations remain tied to the host. Evacuating all the volumes from each drive on the node achieves this, as shown in the following example:
# robin drive list --host intel-1.robinsystems.com
ID | WWN | Host | Path /dev/disk/by-id | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role | Status | LastOpr
---+-------------------------------------------+---------+-----------------------------------------------+----------+---------+------+--------------+------+---------+--------+---------
3 | 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | intel-1 | scsi-0QEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | 100 | N | HDD | 63/77 (82%) | 2/10 | Storage | ONLINE | READY
4 | 0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | intel-1 | scsi-0QEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | 100 | N | HDD | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
Only storage disks are shown. Issue `robin disk list --role all` to view all disks
# robin drive evacuate 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b --exclude-disks 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b,0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a --wait --yes
Job: 65 Name: DiskEvacuate State: VALIDATED Error: 0
Job: 65 Name: DiskEvacuate State: WAITING Error: 0
Job: 65 Name: DiskEvacuate State: COMPLETED Error: 0
# robin drive list --host intel-1.robinsystems.com
ID | WWN | Host | Path /dev/disk/by-id | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role | Status | LastOpr
---+-------------------------------------------+---------+-----------------------------------------------+----------+---------+------+--------------+------+---------+--------+---------
3 | 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | intel-1 | scsi-0QEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | 100 | N | HDD | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
4 | 0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | intel-1 | scsi-0QEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | 100 | N | HDD | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
If the host in question has the Manager role assigned to it, there are no explicit preconditions that need to be met. However, for high availability clusters we recommend that at least 2 nodes have the Manager role assigned to them at all times. That way, if the master Manager node fails or is rebooted, a cluster failover can occur and the Robin cluster is not affected.
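The Storage-role precondition can be checked in a script by inspecting the Vols column of `robin drive list`. The Python sketch below is a rough illustration, assuming the pipe-separated layout shown in the example output and assuming that the first number in the Vols column is the count of volumes currently allocated on the drive. The function name `drives_with_volumes` is hypothetical.

```python
def drives_with_volumes(drive_list_output):
    """Return the WWNs of drives whose Vols column shows allocated volumes."""
    header = None
    busy = []
    for line in drive_list_output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if "Vols" in cells and "WWN" in cells:
            header = cells  # remember column names from the header row
            continue
        if header is None or len(cells) != len(header):
            continue  # separator, footer, or prompt lines
        row = dict(zip(header, cells))
        allocated = int(row["Vols"].split("/")[0])  # e.g. "2/10" -> 2
        if allocated > 0:
            busy.append(row["WWN"])
    return busy


sample = """\
ID | WWN | Host | Path /dev/disk/by-id | Size(GB) | Movable | Type | Free/Max(GB) | Vols | Role | Status | LastOpr
---+-----+------+----------------------+----------+---------+------+--------------+------+---------+--------+--------
3 | 0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | intel-1 | scsi-0QEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b | 100 | N | HDD | 63/77 (82%) | 2/10 | Storage | ONLINE | READY
4 | 0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | intel-1 | scsi-0QEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a | 100 | N | HDD | 77/77 (100%) | 0/10 | Storage | ONLINE | READY
Only storage disks are shown. Issue `robin disk list --role all` to view all disks
"""
print(drives_with_volumes(sample))
```

An empty result would suggest that every drive on the host has been evacuated and the Storage role is ready to be removed.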
Once all the necessary conditions have been met, the roles can be removed as shown in the following example:
# robin host list
Id | Hostname | Version | Status | LastOpr | Resource Pool | Roles | Cores | GPUs | Mem(Free/Alloc/Total) | HDD(#/Alloc/Total) | SSD(#/Alloc/Total) | Instances | Joined Time
-------------+-------------------------------+------------+--------+---------+---------------+-------+-------+------+-----------------------+--------------------+--------------------+-----------+----------------------
1582820722:1 | cscale-82-81.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M* | 2/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:25:46
1582820722:2 | cscale-82-82.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:32:47
1582820722:3 | cscale-82-83.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:41:56
1582820722:4 | intel-1.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | S | 1/720 | 0/0 | 175G/12G/187G | 2/-/200G | -/-/- | 0 | 27 Feb 2020 00:53:35
# robin host remove-role intel-1.robinsystems.com storage --yes --wait
Job: 208 Name: HostRemoveRoles State: VALIDATED Error: 0
Job: 208 Name: HostRemoveRoles State: WAITING Error: 0
Job: 208 Name: HostRemoveRoles State: COMPLETED Error: 0
# robin host list
Id | Hostname | Version | Status | LastOpr | Resource Pool | Roles | Cores | GPUs | Mem(Free/Alloc/Total) | HDD(#/Alloc/Total) | SSD(#/Alloc/Total) | Instances | Joined Time
-------------+-------------------------------+------------+--------+---------+---------------+-------+-------+------+-----------------------+--------------------+--------------------+-----------+----------------------
1582820722:1 | cscale-82-81.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M* | 2/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:25:46
1582820722:2 | cscale-82-82.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:32:47
1582820722:3 | cscale-82-83.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:41:56
1582820722:4 | intel-1.robinsystems.com | 5.2.1-9769 | Ready | SYNCED | default | | 1/720 | 0/0 | 187G/0.05G/187G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:53:35
Removing the host:
Once all the roles are removed from a host, issue the following command to remove the host from a Robin cluster:
# robin host remove [<host>]
--force
--yes
<host> | FQDN of host |
--force | Forcibly remove a host from a cluster. Required if removing a node that is down |
--yes | Do not prompt the user for confirmation of removal |
Example:
# robin host list
Id | Hostname | Version | Status | LastOpr | Resource Pool | Roles | Cores | GPUs | Mem(Free/Alloc/Total) | HDD(#/Alloc/Total) | SSD(#/Alloc/Total) | Instances | Joined Time
-------------+-------------------------------+------------+--------+---------+---------------+-------+-------+------+-----------------------+--------------------+--------------------+-----------+----------------------
1582820722:1 | cscale-82-81.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M* | 2/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:25:46
1582820722:2 | cscale-82-82.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:32:47
1582820722:3 | cscale-82-83.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:41:56
1582820722:4 | intel-1.robinsystems.com | 5.2.1-9769 | Ready | SYNCED | default | | 1/720 | 0/0 | 187G/0.05G/187G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:53:35
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cscale-82-81 Ready master 21h v1.16.3
cscale-82-82 Ready master 21h v1.16.3
cscale-82-83 Ready master 21h v1.16.3
intel-1 Ready worker 21h v1.16.3
# robin host remove intel-1.robinsystems.com --yes --wait
Job: 215 Name: HostRemove State: VALIDATED Error: 0
Job: 215 Name: HostRemove State: WAITING Error: 0
Job: 215 Name: HostRemove State: COMPLETED Error: 0
# robin host list
Id | Hostname | Version | Status | LastOpr | Resource Pool | Roles | Cores | GPUs | Mem(Free/Alloc/Total) | HDD(#/Alloc/Total) | SSD(#/Alloc/Total) | Instances | Joined Time
-------------+-------------------------------+------------+--------+---------+---------------+-------+-------+------+-----------------------+--------------------+--------------------+-----------+----------------------
1582820722:1 | cscale-82-81.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M* | 2/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:25:46
1582820722:2 | cscale-82-82.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:32:47
1582820722:3 | cscale-82-83.robinsystems.com | 5.2.1-9769 | Ready | ONLINE | default | M | 1/400 | 0/0 | 30G/1G/31G | -/-/- | -/-/- | 0 | 27 Feb 2020 00:41:56
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
cscale-82-81 Ready master 21h v1.16.3
cscale-82-82 Ready master 21h v1.16.3
cscale-82-83 Ready master 21h v1.16.3
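The order of operations in this walkthrough can be summarized as a small Python sketch that only builds the command lines, without running them. The function name `decommission_commands` is hypothetical. The commands and flags mirror the examples in this section, including passing all of the host's own drives to `--exclude-disks` as the evacuation example does. Whether every flag is required in a given environment is not something this sketch can confirm.

```python
def decommission_commands(host, roles, drive_wwns=()):
    """Return the CLI invocations, in order, for decommissioning `host`."""
    commands = []
    if "storage" in roles:
        # Exclude the host's own drives as evacuation targets, as in the example.
        exclude = ",".join(drive_wwns)
        for wwn in drive_wwns:
            commands.append(["robin", "drive", "evacuate", wwn,
                             "--exclude-disks", exclude, "--wait", "--yes"])
    for role in roles:
        commands.append(["robin", "host", "remove-role", host, role,
                         "--yes", "--wait"])
    # The host itself is removed last, once no roles remain.
    commands.append(["robin", "host", "remove", host, "--yes", "--wait"])
    return commands


plan = decommission_commands(
    "intel-1.robinsystems.com",
    roles=["storage"],
    drive_wwns=[
        "0xQEMU_QEMU_HARDDISK_e54d6149-0a4e-48ce-b",
        "0xQEMU_QEMU_HARDDISK_89fc0488-2050-4f44-a",
    ],
)
for argv in plan:
    print(" ".join(argv))
```

Keeping the plan as data makes it easy to review before anything is executed, which is useful for a destructive operation such as removing a host.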
5.7. Managing a cluster via the remote client¶
In addition to the Robin CLI, which is available on all hosts where Robin is installed, a remote client is shipped with each cluster that is deployed. This client mirrors the native CLI in the commands it offers, and hence provides the management capabilities described throughout this document. One advantage of the remote client is that it can manage multiple Robin clusters through the concept of contexts.
A context in this scenario refers to a Robin cluster and is identified by the server name or IP address. In addition to this primary key, the following attributes can also be set within a context: the port values for various Robin services (including the Robin Server, File Server, Event Server, Watchdog Server, and Metrics Server) along with the logging level. These attributes are discussed in more detail in the following sections. After creating the appropriate context for a Robin cluster, one can set it as the current context and communicate with the respective cluster. The commands used to achieve this are described below.
The following commands are described in this section:
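robin client add-context | Add a Robin cluster context |
robin client list-contexts | List all registered Robin cluster contexts |
robin client set-current | Set a Robin cluster context as the current context |
robin client update-context | Update attributes for the current Robin cluster context |
robin client delete-context | Delete a Robin cluster context |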
5.7.1. Downloading the Robin client¶
In order to download the Robin client from an existing Robin cluster, issue the following command:
# curl -k 'https://<master_ip>:<port>/api/v3/robin_server/download?file=robincli&os=<os>' -o robin
<master_ip> | IP Address of the Master Node or VIP |
<port> | Port number for the Robin Server |
<os> | The operating system to download the client for. Supported operating systems include: Linux, MacOS. |
Example:
# curl -k 'https://vnode42:29442/api/v3/robin_server/download?file=robincli&os=linux' -o robin
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10.1M 100 10.1M 0 0 1421k 0 0:00:07 0:00:07 --:--:-- 1483k
# ls -lart
-rw-r--r-- 1 demo staff 10655536 Mar 26 14:12 robin
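For scripted installs, the same download can be sketched in Python. The URL layout is taken from the curl command above; the helper names `client_download_url` and `download_client` are hypothetical, and the only `os` value shown in this section is `linux`, so other values are not covered here. Disabling certificate verification mirrors curl's `-k` flag and should only be done for hosts you trust. The download function is defined but not called in this sketch.

```python
import os
import ssl
import stat
import urllib.request
from urllib.parse import urlencode


def client_download_url(master_ip, port=29442, os_name="linux"):
    """Build the client download URL in the form used by the curl example."""
    query = urlencode({"file": "robincli", "os": os_name})
    return f"https://{master_ip}:{port}/api/v3/robin_server/download?{query}"


def download_client(url, dest="robin"):
    """Fetch the client without certificate checks (like `curl -k`)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=ctx) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    # Add the owner execute bit so the downloaded file can be run.
    os.chmod(dest, os.stat(dest).st_mode | stat.S_IXUSR)


print(client_download_url("vnode42"))
```

The listing above shows the downloaded file without execute permission, so the final `chmod` step (or `chmod +x robin` on the command line) is likely needed before the client can be run.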
5.7.2. Adding a Context¶
A context is a construct that can be used to define a Robin cluster in a manner that the remote client can understand. In order to add a context, issue the following command:
Note
If a context already exists with the server specified, that context will be updated with the values supplied.
# robin client add-context <server>
--port <port>
--file-port <file_port>
--event-port <event_port>
--watchdog-port <watch_port>
--metrics-port <metrics_port>
--log-level <log_level>
--product <product_type>
--set-current
<server> | FQDN/IP Address of the Master Node or VIP |
--port <port> | Port number for the Robin Server. Default value is 29442 |
--file-port <file_port> | Port number for the File Server. Default value is 29445 |
--event-port <event_port> | Port number for the Event Server. Default value is 29449 |
--watchdog-port <watch_port> | Port number for the Watchdog Server. Default value is 29444 |
--metrics-port <metrics_port> | Port number for the Metrics Server. Default value is 29446 |
--log-level <log_level> | Number indicating the verbosity of logs. Valid values are 10 (DEBUG), 20 (INFO), 40 (ERROR). Default value is 40. |
--product <product_type> | Type of ROBIN installation. Valid choices are ‘platform’ or ‘storage’. Default value is ‘platform’. |
--set-current | Set the context being created as the current context |
Note
If the target Robin cluster was deployed as a highly available cluster, ensure that the robincp_mode config attribute is enabled and set the port value for the Robin, file and event servers to 29465. Otherwise the remote client will not be fully operational.
Example:
# robin client add-context centos-60-214 --port 29443
Context robin-cluster-centos-60-214 created successfully
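The defaults in the option table and the high availability note can be captured in a short Python sketch. The class `RobinContext` and both helper functions are hypothetical names used only for illustration; the default values, option names, and the 29465 override come from the text above. The sketch builds a command line and does not run it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RobinContext:
    """Context attributes with the defaults listed in the option table."""
    server: str
    port: int = 29442
    file_port: int = 29445
    event_port: int = 29449
    watchdog_port: int = 29444
    metrics_port: int = 29446
    log_level: int = 40          # 10 (DEBUG), 20 (INFO), 40 (ERROR)
    product: str = "platform"


def for_ha_cluster(ctx):
    """Apply the HA note: Robin, file and event server ports become 29465."""
    return replace(ctx, port=29465, file_port=29465, event_port=29465)


def add_context_argv(ctx, set_current=False):
    """Build the `robin client add-context` command line for `ctx`."""
    argv = [
        "robin", "client", "add-context", ctx.server,
        "--port", str(ctx.port),
        "--file-port", str(ctx.file_port),
        "--event-port", str(ctx.event_port),
        "--watchdog-port", str(ctx.watchdog_port),
        "--metrics-port", str(ctx.metrics_port),
        "--log-level", str(ctx.log_level),
        "--product", ctx.product,
    ]
    if set_current:
        argv.append("--set-current")
    return argv


ha = for_ha_cluster(RobinContext("centos-60-214"))
print(" ".join(add_context_argv(ha, set_current=True)))
```

Spelling out every port, rather than relying on defaults, makes the difference between a standard and a highly available cluster visible in the generated command.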
5.7.3. Listing all available contexts¶
In order to list all contexts that have already been registered with the client alongside additional details such as the port values specified or the log level, issue the following command:
# robin client list-contexts
--full
--full | Show additional details about all registered contexts |
Example:
# robin client list-contexts --full
| Server | Port | Version | Tenant | Last Login | Tenants | FPort | WPort | MPort | LogLevel
---+-----------------------------------+-------+------------+----------------+----------------------+----------------+-------+-------+-------+----------
| master.robin-server.service.robin | 29442 | - | - | - | | 29445 | 29444 | 29446 | ERR
| centos-60-214 | 29443 | - | Administrators | - | | 29445 | 29444 | 29446 | ERR
* | 172.19.174.194 | 29442 | 5.2.3-9842 | Administrators | 26 Mar 2020 16:10:58 | Administrators | 29445 | 29444 | 29446 | ERR
Note
The asterisk displayed above indicates the current context.
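A script that needs to know which cluster it is talking to can look for that asterisk in the command output. The Python sketch below assumes the pipe-separated layout shown in the example, with the marker in the first, unnamed column. The function name `current_context` is hypothetical, and the layout may differ between client versions.

```python
def current_context(list_output):
    """Return the row marked with '*' as a dict, or None if there is none."""
    header = None
    for line in list_output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if "Server" in cells:
            header = cells  # first cell is the unnamed marker column
            continue
        if header is None or len(cells) != len(header):
            continue  # separator or unrelated lines
        if cells[0] == "*":
            return dict(zip(header[1:], cells[1:]))
    return None


sample = """\
  | Server | Port | Version | Tenant | Last Login | Tenants | FPort | WPort | MPort | LogLevel
---+--------+------+---------+--------+------------+---------+-------+-------+-------+----------
  | centos-60-214 | 29443 | - | Administrators | - | | 29445 | 29444 | 29446 | ERR
* | 172.19.174.194 | 29442 | 5.2.3-9842 | Administrators | 26 Mar 2020 16:10:58 | Administrators | 29445 | 29444 | 29446 | ERR
"""
ctx = current_context(sample)
print(ctx["Server"], ctx["Port"])
```

Returning `None` when no row is marked lets the caller decide whether to set a current context before issuing further commands.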
5.7.4. Setting the current context¶
In order to access a particular Robin cluster, its respective context needs to be set as the current context. To achieve this, issue the following command:
# robin client set-current <context>
<context> | The server attribute of the context to be set as current |
Example:
# robin client set-current centos-60-214
Current context set to robin-cluster-centos-60-214
5.7.5. Updating the current context¶
In certain situations, such as a reinstallation, the attributes of a context might be altered whilst retaining the same server IP Address or hostname. As a result, the context which refers to this cluster will have to be updated. In order to do so, issue the following command:
Note
The below command only updates the current context.
# robin client update-context
--port <port>
--file-port <file_port>
--event-port <event_port>
--watchdog-port <watch_port>
--metrics-port <metrics_port>
--log-level <log_level>
--port <port> | Updated port number for the Robin Server |
--file-port <file_port> | Updated port number for the File Server |
--event-port <event_port> | Updated port number for the Event Server |
--watchdog-port <watch_port> | Updated port number for the Watchdog Server |
--metrics-port <metrics_port> | Updated port number for the Metrics Server |
--log-level <log_level> | Updated number indicating the verbosity of logs. Valid values are 10 (DEBUG), 20 (INFO), 40 (ERROR) |
Example:
# robin client update-context --port 29942 --file-port 29445 --watchdog-port 29444 --metrics-port 29446
Updating attributes for context robin-cluster-centos-60-214
Server: centos-60-214
Context config updated for robin-cluster-centos-60-214
5.7.6. Deleting a context¶
In order to remove a registered context, issue the following command:
# robin client delete-context <context>
<context> | The server attribute of the context to be deleted |
Example:
# robin client delete-context centos-60-214
Context centos-60-214 deleted