21. Troubleshooting Tools

Robin Platform provides a number of native tools and commands that an administrator can use to troubleshoot a Robin cluster and/or report issues. These tools vary in their use cases, but each provides enough information to give insight into why the cluster is not functioning as intended or why an unexpected failure occurred. As a result, they should be the go-to utilities when debugging potential issues, and their outputs should be sent alongside any bug reports filed with Robin. Each tool is described in its respective section below.

Alongside the aforementioned tools for administrators, Robin Platform also provides more granular commands, detailed in the sections below, for individual users to track the progress of the operations they execute and determine the reasons for any failures. These operations are referred to as jobs, and each is identified by a unique ID. Each job has a set of attributes such as the job ID, job type, description, and so on. Robin stores a record of each job, including its metadata, within the database, alongside the respective job logs on the relevant nodes. An administrator can view the job logs and use them to troubleshoot issues within the cluster. It is recommended that the complete job logs be provided when reporting issues to Robin for debugging purposes.

The Robin job logs are stored in the following directories within the Robin container:

  • Server side job logs are stored within /var/log/robin/server. Note that this directory is only present on the Robin master nodes.

  • Worker/agent side job logs are stored within /var/log/robin/agent. This directory is present on all Robin nodes.

To access the job logs on the host instead of within the container, use the /home/robinds/var/log/robin/server and /home/robinds/var/log/robin/agent directories respectively.
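
The relationship between the container and host locations can be captured in a small helper. The following is only an illustrative Python sketch: the function name job_log_dir and its parameters are hypothetical and not part of Robin, and the only paths it knows are the four directories listed above.

```python
# Container-side job log directories, as listed in this section.
JOB_LOG_DIRS = {
    "server": "/var/log/robin/server",  # only present on Robin master nodes
    "agent": "/var/log/robin/agent",    # present on all Robin nodes
}

# On the host, the same directories appear under this prefix.
HOST_PREFIX = "/home/robinds"


def job_log_dir(kind: str, on_host: bool = False) -> str:
    """Return the job log directory for 'server' or 'agent' logs.

    With on_host=True the host-side path is returned; otherwise the
    path inside the Robin container is returned.
    """
    if kind not in JOB_LOG_DIRS:
        raise ValueError("kind must be 'server' or 'agent'")
    container_dir = JOB_LOG_DIRS[kind]
    return HOST_PREFIX + container_dir if on_host else container_dir
```

For example, job_log_dir("agent", on_host=True) returns /home/robinds/var/log/robin/agent.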

21.1. Listing all jobs

Robin stores all jobs that have occurred during a cluster’s lifespan. To view these jobs alongside details such as their start time, state, etc., issue the following command:

# robin job list --verbose
                 --ignoredeps
                 --noarchived
                 --nopurged
                 --states  <states>
                 --failed
                 --nocolor
                 --page_size <size>
                 --page_num <num>
                 --total
                 --all
                 --app <app_name>
                 --k8sapp <k8sapp_name>
                 --vnode <vnode_name>
                 --node <node_name>
                 --disk <disk_wwn>

--verbose

Show complete job information instead of truncating it for display purposes.

--ignoredeps

Do not show child jobs

--noarchived

Do not show archived jobs

--nopurged

Do not show purged jobs

--states <states>

Filter jobs based on states. Choose one or more from: active, failed, succeeded, archived, purged

--failed

Show only jobs which have failed

--nocolor

Show uncolored output

--page_size <size>

Number of jobs that should be displayed for each page

--page_num <num>

Page number to start displaying jobs from (page numbering starts at 1)

--total

Return the total number of qualified root jobs

--all

Display all jobs associated with a specific application. Note that this option must be used in conjunction with the --app option.

--app <app_name>

Filter jobs based on specified application

--k8sapp <k8sapp_name>

Filter jobs based on specified K8s/Helm registered application name

--vnode <vnode_name>

Filter jobs based on specified Vnode name

--node <node_name>

Filter jobs based on specified physical node name

--disk <disk_wwn>

Filter jobs based on specified disk WWN
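
When the command is driven from a script, the constraints described above (--all requires --app, and page numbers start at 1) can be checked before the command is issued. The following Python sketch only assembles the argument list and does not run anything. The function build_job_list_command and its parameter names are hypothetical; the flags themselves are the ones documented in this section.

```python
from typing import List, Optional


def build_job_list_command(
    app: Optional[str] = None,
    all_jobs: bool = False,
    failed: bool = False,
    verbose: bool = False,
    page_size: Optional[int] = None,
    page_num: Optional[int] = None,
) -> List[str]:
    """Assemble a 'robin job list' argument list from a few documented options."""
    # --all is only valid in conjunction with --app.
    if all_jobs and app is None:
        raise ValueError("--all must be used in conjunction with --app")
    # Page numbering starts at 1.
    if page_num is not None and page_num < 1:
        raise ValueError("--page_num must be 1 or greater")

    cmd = ["robin", "job", "list"]
    if verbose:
        cmd.append("--verbose")
    if failed:
        cmd.append("--failed")
    if app is not None:
        cmd += ["--app", app]
    if all_jobs:
        cmd.append("--all")
    if page_size is not None:
        cmd += ["--page_size", str(page_size)]
    if page_num is not None:
        cmd += ["--page_num", str(page_num)]
    return cmd
```

On a node where the robin CLI is available, the returned list could be passed to subprocess.run. For instance, build_job_list_command(failed=True, page_size=20, page_num=1) yields the arguments for the first page of 20 failed jobs.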

Example:

Output
# robin job list
ID            | Type              | Description                                                                                                                                | State            | Start           | End     | User   | Message
--------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+---------+--------+------------------------------------------
1013          | ApplicationStart  | Starting application 'wp-10'                                                                                                               | COMPLETED        | 13 Aug 23:28:29 | 0:00:54 | system |
|->1015       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 13 Aug 23:28:30 | 0:00:38 | system |
|  |->1017    | VnodeDeploy       | Deploying vnode 'wp-10.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                            | COMPLETED        | 13 Aug 23:28:30 | 0:00:38 | system |
|  |  |->1018 | VnodeStop         | Stopping vnode wp-10.mysql.01 on cscale-82-140.robinsystems.com                                                                            | COMPLETED        | 13 Aug 23:28:30 | 0:00:15 | system |
|->1016       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |->1024    | VnodeDeploy       | Deploying vnode 'wp-10.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |  |->1025 | VnodeStop         | Stopping vnode wp-10.wordpress.01 on cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:07 | system |
1014          | ApplicationStart  | ApplicationStart                                                                                                                           | COMPLETED|FAILED | 13 Aug 23:28:29 | 0:00:00 | system | Another job is running on application 'w
1019          | ApplicationStart  | Starting application 'wp-20'                                                                                                               | COMPLETED        | 13 Aug 23:28:31 | 0:00:51 | system |
|->1020       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 13 Aug 23:28:32 | 0:00:36 | system |
|  |->1022    | VnodeDeploy       | Deploying vnode 'wp-20.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                            | COMPLETED        | 13 Aug 23:28:32 | 0:00:36 | system |
|  |  |->1023 | VnodeStop         | Stopping vnode wp-20.mysql.01 on cscale-82-140.robinsystems.com                                                                            | COMPLETED        | 13 Aug 23:28:32 | 0:00:13 | system |
|->1021       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |->1026    | VnodeDeploy       | Deploying vnode 'wp-20.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |  |->1027 | VnodeStop         | Stopping vnode wp-20.wordpress.01 on cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:05 | system |
1028          | JobArchive        | Archiving job/s on all hosts                                                                                                               | COMPLETED        | 14 Aug 00:00:00 | 0:00:02 | system |
|->1029       | AgentJobArchive   | Archiving job/s on host cscale-82-140.robinsystems.com                                                                                     | COMPLETED        | 14 Aug 00:00:01 | 0:00:00 | system |
1030          | HostProbe         | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch.                                       | COMPLETED        | 14 Aug 07:54:37 | 0:00:01 | system |
1031          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch.                       | COMPLETED        | 14 Aug 07:54:37 | 0:00:51 | system |
1032          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch.                       | COMPLETED        | 14 Aug 08:11:11 | 0:00:50 | system |
1033          | HostProbe         | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch.                                       | COMPLETED        | 14 Aug 08:11:11 | 0:00:01 | system |
1034          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp.                                | COMPLETED        | 14 Aug 09:24:17 | 0:00:50 | system |
1035          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED|FAILED | 14 Aug 09:25:07 | 0:01:40 | system | Pods do not need to be failed over as Ku
1036          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Ready. Origin: StateChange.                                     | COMPLETED        | 14 Aug 09:25:17 | 0:00:01 | system |
1037          | ApplicationDelete | Deleting application 'wp-10'                                                                                                               | COMPLETED        | 14 Aug 09:41:10 | 0:00:12 | robin  |
|->1038       | VnodeDelete       | Deleting vnode 'wp-10.wordpress.01' from cscale-82-140.robinsystems.com                                                                    | COMPLETED        | 14 Aug 09:41:10 | 0:00:06 | robin  |
|->1039       | VnodeDelete       | Deleting vnode 'wp-10.mysql.01' from cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 14 Aug 09:41:10 | 0:00:08 | robin  |
1040          | ApplicationDelete | Deleting application 'wp-20'                                                                                                               | COMPLETED        | 14 Aug 09:41:16 | 0:00:13 | robin  |
|->1041       | VnodeDelete       | Deleting vnode 'wp-20.wordpress.01' from cscale-82-140.robinsystems.com                                                                    | COMPLETED        | 14 Aug 09:41:16 | 0:00:10 | robin  |
|->1042       | VnodeDelete       | Deleting vnode 'wp-20.mysql.01' from cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 14 Aug 09:41:16 | 0:00:09 | robin  |
1043          | ApplicationDelete | Deleting application 'wp-30'                                                                                                               | COMPLETED        | 14 Aug 09:41:20 | 0:00:19 | robin  |
|->1044       | VnodeDelete       | Deleting vnode 'wp-30.wordpress.01' from cscale-82-140.robinsystems.com                                                                    | COMPLETED        | 14 Aug 09:41:20 | 0:00:06 | robin  |
|->1045       | VnodeDelete       | Deleting vnode 'wp-30.mysql.01' from cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 14 Aug 09:41:20 | 0:00:15 | robin  |
1046          | ApplicationCreate | Adding application 'wp-1'                                                                                                                  | COMPLETED        | 14 Aug 09:42:58 | 0:00:58 | robin  |
|->1047       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:43:00 | 0:00:42 | robin  |
|  |->1049    | VnodeAdd          | Adding vnode 'wp-1.mysql.01' on cscale-82-140.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:43:00 | 0:00:42 | robin  |
|->1048       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:43:42 | 0:00:14 | robin  |
|  |->1053    | VnodeAdd          | Adding vnode 'wp-1.wordpress.01' on cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:43:42 | 0:00:14 | robin  |
1050          | ApplicationCreate | Adding application 'wp-2'                                                                                                                  | COMPLETED        | 14 Aug 09:43:39 | 0:00:46 | robin  |
|->1051       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:43:42 | 0:00:34 | robin  |
|  |->1054    | VnodeAdd          | Adding vnode 'wp-2.mysql.01' on cscale-82-140.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:43:42 | 0:00:34 | robin  |
|->1052       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:44:16 | 0:00:09 | robin  |
|  |->1055    | VnodeAdd          | Adding vnode 'wp-2.wordpress.01' on cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:44:16 | 0:00:09 | robin  |
1056          | ApplicationCreate | Adding application 'wp-3'                                                                                                                  | COMPLETED        | 14 Aug 09:44:18 | 0:00:57 | robin  |
|->1057       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:44:20 | 0:00:41 | robin  |
|  |->1059    | VnodeAdd          | Adding vnode 'wp-3.mysql.01' on cscale-82-140.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:44:20 | 0:00:41 | robin  |
|->1058       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:45:01 | 0:00:13 | robin  |
|  |->1067    | VnodeAdd          | Adding vnode 'wp-3.wordpress.01' on cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:45:02 | 0:00:12 | robin  |
1060          | ApplicationDelete | Deleting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 09:44:53 | 0:00:17 | robin  |
|->1061       | VnodeDelete       | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 09:44:53 | 0:00:05 | robin  |
|->1062       | VnodeDelete       | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:44:53 | 0:00:13 | robin  |
1063          | ApplicationDelete | Deleting application 'wp-2'                                                                                                                | COMPLETED        | 14 Aug 09:44:57 | 0:00:21 | robin  |
|->1064       | VnodeDelete       | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 09:44:57 | 0:00:09 | robin  |
|->1065       | VnodeDelete       | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:44:57 | 0:00:18 | robin  |
1066          | ApplicationDelete | ApplicationDelete                                                                                                                          | COMPLETED|FAILED | 14 Aug 09:45:01 | 0:00:00 | robin  | Another job is running on application 'w
1068          | ApplicationProbe  | Probing application 'wp-3'                                                                                                                 | COMPLETED        | 14 Aug 09:45:12 | 0:00:00 | robin  |
1069          | ApplicationDelete | Deleting application 'wp-3'                                                                                                                | COMPLETED        | 14 Aug 09:45:16 | 0:00:12 | robin  |
|->1070       | VnodeDelete       | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 09:45:16 | 0:00:05 | robin  |
|->1071       | VnodeDelete       | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:45:16 | 0:00:09 | robin  |
1072          | ApplicationCreate | Adding application 'wp-1'                                                                                                                  | COMPLETED        | 14 Aug 09:47:03 | 0:00:45 | robin  |
|->1074       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:47:39 | 0:00:08 | robin  |
|  |->1076    | VnodeAdd          | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:47:39 | 0:00:08 | robin  |
|->1073       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:47:05 | 0:00:34 | robin  |
|  |->1075    | VnodeAdd          | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:47:05 | 0:00:34 | robin  |
1077          | ApplicationCreate | Adding application 'wp-2'                                                                                                                  | COMPLETED        | 14 Aug 09:47:43 | 0:00:44 | robin  |
|->1079       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:48:18 | 0:00:09 | robin  |
|  |->1081    | VnodeAdd          | Adding vnode 'wp-2.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:48:18 | 0:00:09 | robin  |
|->1078       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:47:45 | 0:00:33 | robin  |
|  |->1080    | VnodeAdd          | Adding vnode 'wp-2.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:47:45 | 0:00:33 | robin  |
1082          | ApplicationCreate | Adding application 'wp-3'                                                                                                                  | COMPLETED        | 14 Aug 09:49:14 | 0:03:12 | robin  |
|->1083       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:49:16 | 0:02:49 | robin  |
|  |->1085    | VnodeAdd          | Adding vnode 'wp-3.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:49:16 | 0:02:49 | robin  |
|->1084       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:52:05 | 0:00:20 | robin  |
|  |->1086    | VnodeAdd          | Adding vnode 'wp-3.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:52:05 | 0:00:20 | robin  |
1087          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown.                              | COMPLETED        | 14 Aug 09:53:43 | 0:00:52 | system |
1088          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED        | 14 Aug 09:54:35 | 0:00:01 | system |
1089          | ApplicationStart  | Starting application 'wp-3'                                                                                                                | COMPLETED        | 14 Aug 09:54:38 | 0:03:41 | system |
|->1092       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 09:54:38 | 0:01:53 | system |
|  |->1094    | VnodeDeploy       | Deploying vnode 'wp-3.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 09:54:38 | 0:01:53 | system |
|->1093       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 09:56:31 | 0:01:48 | system |
|  |->1102    | VnodeDeploy       | Deploying vnode 'wp-3.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 09:56:31 | 0:01:48 | system |
1090          | ApplicationStart  | Starting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 09:54:38 | 0:03:44 | system |
|->1098       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 09:54:39 | 0:01:51 | system |
|  |->1100    | VnodeDeploy       | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 09:54:39 | 0:01:51 | system |
|->1099       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 09:56:30 | 0:01:52 | system |
|  |->1101    | VnodeDeploy       | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 09:56:30 | 0:01:52 | system |
1091          | ApplicationStart  | Starting application 'wp-2'                                                                                                                | COMPLETED        | 14 Aug 09:54:38 | 0:03:44 | system |
|->1095       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 09:54:39 | 0:01:52 | system |
|  |->1097    | VnodeDeploy       | Deploying vnode 'wp-2.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 09:54:39 | 0:01:52 | system |
|->1096       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 09:56:31 | 0:01:51 | system |
|  |->1103    | VnodeDeploy       | Deploying vnode 'wp-2.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 09:56:32 | 0:01:50 | system |
1104          | ApplicationDelete | Deleting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 10:18:34 | 0:00:15 | robin  |
|->1105       | VnodeDelete       | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 10:18:34 | 0:00:06 | robin  |
|->1106       | VnodeDelete       | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:18:34 | 0:00:11 | robin  |
1107          | ApplicationDelete | Deleting application 'wp-2'                                                                                                                | COMPLETED        | 14 Aug 10:18:38 | 0:00:14 | robin  |
|->1108       | VnodeDelete       | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 10:18:38 | 0:00:06 | robin  |
|->1109       | VnodeDelete       | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:18:38 | 0:00:08 | robin  |
1110          | ApplicationDelete | Deleting application 'wp-3'                                                                                                                | COMPLETED        | 14 Aug 10:18:43 | 0:00:15 | robin  |
|->1111       | VnodeDelete       | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 10:18:43 | 0:00:12 | robin  |
|->1112       | VnodeDelete       | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:18:43 | 0:00:13 | robin  |
1113          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp.                                | COMPLETED        | 14 Aug 10:20:02 | 0:00:50 | system |
1114          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED|FAILED | 14 Aug 10:20:52 | 0:01:40 | system | Pods do not need to be failed over as Ku
1115          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'} | COMPLETED        | 14 Aug 10:22:17 | 0:00:00 | system |
1116          | HostProbe         | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'}      | COMPLETED        | 14 Aug 10:22:47 | 0:00:00 | system |
1117          | HostProbe         | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Ready. Origin: StateChange.                                          | COMPLETED        | 14 Aug 10:22:59 | 0:00:00 | system |
1118          | ApplicationCreate | Adding application 'wp-1'                                                                                                                  | COMPLETED        | 14 Aug 10:40:21 | 0:01:05 | robin  |
|->1119       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 10:40:24 | 0:00:41 | robin  |
|  |->1121    | VnodeAdd          | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 10:40:24 | 0:00:41 | robin  |
|->1120       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 10:41:05 | 0:00:21 | robin  |
|  |->1122    | VnodeAdd          | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:41:05 | 0:00:21 | robin  |
1123          | ApplicationCreate | Adding application 'wp-2-no-aff'                                                                                                           | COMPLETED        | 14 Aug 10:45:45 | 0:00:57 | robin  |
|->1124       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 10:45:48 | 0:00:41 | robin  |
|  |->1126    | VnodeAdd          | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com                                                                      | COMPLETED        | 14 Aug 10:45:48 | 0:00:41 | robin  |
|->1125       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 10:46:29 | 0:00:13 | robin  |
|  |->1127    | VnodeAdd          | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com                                                                  | COMPLETED        | 14 Aug 10:46:29 | 0:00:13 | robin  |
1128          | ApplicationCreate | Adding application 'wp-3-no-aff'                                                                                                           | COMPLETED        | 14 Aug 10:46:33 | 0:00:39 | robin  |
|->1129       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 10:46:35 | 0:00:28 | robin  |
|  |->1131    | VnodeAdd          | Adding vnode 'wp-3-no-aff.mysql.01' on cscale-82-139.robinsystems.com                                                                      | COMPLETED        | 14 Aug 10:46:35 | 0:00:28 | robin  |
|->1130       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 10:47:03 | 0:00:09 | robin  |
|  |->1132    | VnodeAdd          | Adding vnode 'wp-3-no-aff.wordpress.01' on cscale-82-139.robinsystems.com                                                                  | COMPLETED        | 14 Aug 10:47:03 | 0:00:09 | robin  |
1133          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown.                              | COMPLETED        | 14 Aug 10:49:36 | 0:00:52 | system |
1134          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED        | 14 Aug 10:50:28 | 0:00:01 | system |
1135          | ApplicationStart  | Starting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 10:50:29 | 0:03:22 | system |
|->1141       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 10:52:16 | 0:01:35 | system |
|  |->1143    | VnodeDeploy       | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 10:52:16 | 0:01:35 | system |
|->1140       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 10:50:30 | 0:01:46 | system |
|  |->1142    | VnodeDeploy       | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 10:50:30 | 0:01:46 | system |
1136          | VnodeDeploy       | Deploying vnode 'wp-3-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                 | COMPLETED        | 14 Aug 10:50:29 | 0:01:48 | robin  |
1137          | VnodeDeploy       | Deploying vnode 'wp-3-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                     | COMPLETED        | 14 Aug 10:50:29 | 0:02:04 | robin  |
1138          | VnodeDeploy       | Deploying vnode 'wp-2-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                     | COMPLETED        | 14 Aug 10:50:29 | 0:02:07 | robin  |
1139          | VnodeDeploy       | Deploying vnode 'wp-2-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                 | COMPLETED        | 14 Aug 10:50:29 | 0:01:44 | robin  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Returns all jobs that have occurred during a cluster’s lifespan.

End Point: /api/v5/robin_server/jobs

Method: GET

URL Parameters:

  • sort=[id|-id] : Sorts the returned list of jobs by their ID.

  • noarchived=true : Excludes archived jobs from the results.

  • nopurged=true : Excludes purged jobs from the results.

  • failed=true : Returns only failed jobs.

  • parent=true : Returns only parent jobs.

  • page_size=<size> : Limits the number of jobs returned to <size>.

  • page_num=<index> : Returns jobs starting from <index>.

  • objtype=[APPLICATION|K8S_APPLICATION|INSTANCE|DISK|NODE] : Returns only jobs for the specified object type.

  • objname=<obj_name> : Returns only jobs for objects with the specified name.

  • all=true : Returns all jobs. Note that this option is only valid when an application name is specified.

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)
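
As a minimal sketch of how the parameters above combine into a request, the following Python snippet builds the job-listing URL and sends it with the Authorization header. The host name, the use of HTTPS, and the helper names are assumptions made for illustration; only the end point, the parameter names, and the default port come from the description above.

```python
import json
import urllib.parse
import urllib.request

RCM_PORT = 29442  # default RCM port stated above


def build_jobs_url(host, port=RCM_PORT, scheme="https", **params):
    """Return the job-listing URL with the given query parameters.

    `scheme` is an assumption; use whatever your cluster actually serves.
    """
    url = f"{scheme}://{host}:{port}/api/v5/robin_server/jobs"
    query = urllib.parse.urlencode(params)
    return f"{url}?{query}" if query else url


def list_jobs(host, auth_token, **params):
    """Send the GET request and return the decoded JSON body."""
    request = urllib.request.Request(
        build_jobs_url(host, **params),
        headers={"Authorization": auth_token},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the URL needs no network access.
# "robin-master.example.com" is a placeholder host name.
print(build_jobs_url("robin-master.example.com", failed="true", page_size=10, page_num=1))
```

Calling list_jobs("<master-host>", "<auth_token>", failed="true") would then return the parsed response for failed jobs only, provided the token was obtained from the login API beforehand.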

Example Response:

{
   "page_size":10,
   "items":{
      "users":[
         {
            "email":null,
            "tenantid":1,
            "firstname":"Robin",
            "username":"robin",
            "id":3,
            "lastname":"Systems"
         }
      ],
      "jobs":[
         {
            "jobid":1888,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[1889]",
            "endtime":1597456503,
            "children":[
               {
                  "jobid":1889,
                  "tenant_id":1,
                  "enabled":true,
                  "child_job_ids":"[]",
                  "endtime":1597456498,
                  "parent_jobid":1888,
                  "error":0,
                  "message":"",
                  "taskrunner":1,
                  "starttime":1597456497,
                  "dependson_job_ids":"[]",
                  "level":"child",
                  "user_id":1,
                  "jtype":"CollectionOffline",
                  "timeout":86400,
                  "state":10,
                  "desc":"Taking collection 'file-collection-1597122699552' offline (Force False)"
               }
            ],
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":1,
            "starttime":1597456496,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"CollectionOnline",
            "timeout":86400,
            "state":10,
            "desc":"Bringing collection 'file-collection-1597122699552' online"
         },
         {
            "jobid":1887,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[1890]",
            "endtime":1597456504,
            "children":[
               {
                  "jobid":1890,
                  "tenant_id":1,
                  "enabled":true,
                  "child_job_ids":"[]",
                  "endtime":1597456499,
                  "parent_jobid":1887,
                  "error":0,
                  "message":"",
                  "taskrunner":1,
                  "starttime":1597456497,
                  "dependson_job_ids":"[]",
                  "level":"child",
                  "user_id":3,
                  "jtype":"VnodeStop",
                  "timeout":86400,
                  "state":10,
                  "desc":"Stopping vnode test-ds-1.server.01 on cscale-82-140.robinsystems.com"
               }
            ],
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":1,
            "starttime":1597456496,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":3,
            "jtype":"VnodeDeploy",
            "timeout":86400,
            "state":10,
            "desc":"Deploying vnode 'test-ds-1.server.01'. Origin: Event (cscale-82-140.robinsystems.com)"
         },
         {
            "jobid":1886,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456488,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456487,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Ready. Origin: StateChange."
         },
         {
            "jobid":1885,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456476,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456475,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Notready. Origin: StateChange.. Services Down: {'iomgr-server'}"
         },
         {
            "jobid":1884,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456470,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456470,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/WaitingForMonitor ==> ONLINE\/Notready. Origin: StartingHostWatch.. Services Down: {'iomgr-server'}"
         },
         {
            "jobid":1883,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456520,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456469,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-139.robinsystems.com from UNREACHABLE\/Notready ==> UNREACHABLE\/Notready. Origin: StartingHostWatch."
         },
         {
            "jobid":1882,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x60022480940ed076551cfaf75612e24e'"
         },
         {
            "jobid":1881,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x60022480ffcf3deb224fb37d78fe7767'"
         },
         {
            "jobid":1880,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x600224804c48fd7e16c608dea0919064'"
         },
         {
            "jobid":1879,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x600224803bcdafde95b1f5cd27ceb5fb'"
         }
      ]
   },
   "total":1542,
   "num_items":10,
   "page_num":1
}
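
The response nests child jobs under a "children" key and reports times as epoch seconds. The following Python sketch, under the assumption that every job carries the fields shown in the example response, flattens that hierarchy and derives a duration for each job. The function names are illustrative and not part of any Robin client.

```python
import json
from datetime import datetime, timezone


def flatten_jobs(jobs, depth=0):
    """Yield (depth, job) for every job and nested child, parents first."""
    for job in jobs:
        yield depth, job
        yield from flatten_jobs(job.get("children", []), depth + 1)


def summarize(response):
    """Return one text line per job: ID, type, start time (UTC), duration."""
    lines = []
    for depth, job in flatten_jobs(response["items"]["jobs"]):
        start = datetime.fromtimestamp(job["starttime"], tz=timezone.utc)
        duration = job["endtime"] - job["starttime"]  # seconds
        # child_job_ids arrives as a JSON-encoded string such as "[1889]".
        child_ids = json.loads(job["child_job_ids"])
        lines.append(
            f"{'  ' * depth}{job['jobid']} {job['jtype']} "
            f"start={start:%Y-%m-%d %H:%M:%S} duration={duration}s children={child_ids}"
        )
    return lines


# Trimmed copy of the first job in the example response.
sample = {
    "items": {
        "jobs": [
            {
                "jobid": 1888,
                "jtype": "CollectionOnline",
                "starttime": 1597456496,
                "endtime": 1597456503,
                "child_job_ids": "[1889]",
                "children": [
                    {
                        "jobid": 1889,
                        "jtype": "CollectionOffline",
                        "starttime": 1597456497,
                        "endtime": 1597456498,
                        "child_job_ids": "[]",
                    }
                ],
            }
        ]
    }
}

for line in summarize(sample):
    print(line)
```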

21.2. Show information about a specific job

To get more detailed information about a specific job, including its state, its duration, and any errors related to it or to its child jobs, issue the following command:

# robin job info <id>

id

Job ID

Example:

# robin job info 1123
ID         | Type              | Desc                                                                      | State     | Start           | End      | Duration | Dependson | Error | Message
-----------+-------------------+---------------------------------------------------------------------------+-----------+-----------------+----------+----------+-----------+-------+---------
1123       | ApplicationCreate | Adding application 'wp-2-no-aff'                                          | COMPLETED | 14 Aug 10:45:45 | 10:46:42 | 0:00:57  | []        | 0     |
|->1124    | RoleCreate        | Provisioning containers for role 'mysql'                                  | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41  | []        | 0     |
|  |->1126 | VnodeAdd          | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com     | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41  | []        | 0     |
|->1125    | RoleCreate        | Provisioning containers for role 'wordpress'                              | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13  | [1124]    | 0     |
|  |->1127 | VnodeAdd          | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13  | []        | 0     |

Returns details about a specific job and any of its respective child jobs.

End Point: /api/v3/robin_server/jobs/<job_id>

Method: GET

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Authorization Error)

Example Response:

{
   "tenant_name":"Administrators",
   "jobid":1888,
   "tenant_id":1,
   "enabled":true,
   "json":{
      "collection_id":1597122699552,
      "state":"SuspectedOffline",
      "set_failed":true,
      "origin":2,
      "hostname":"cscale-82-140.robinsystems.com"
   },
   "user_name":"system",
   "endtime":1597456503,
   "parent_jobid":0,
   "error":0,
   "message":"",
   "taskrunner":1,
   "starttime":1597456496,
   "child_job_ids":"[1889]",
   "cjobs":[
      {
         "tenant_name":"Administrators",
         "jobid":1889,
         "tenant_id":1,
         "enabled":true,
         "json":{
            "collection_id":1597122699552
         },
         "user_name":"system",
         "endtime":1597456498,
         "parent_jobid":1888,
         "error":0,
         "message":"",
         "taskrunner":1,
         "starttime":1597456497,
         "child_job_ids":"[]",
         "cjobs":[

         ],
         "dependson_job_ids":"[]",
         "user_id":1,
         "jtype":"CollectionOffline",
         "timeout":86400,
         "state":10,
         "desc":"Taking collection 'file-collection-1597122699552' offline (Force False)",
         "priority":300
      }
   ],
   "dependson_job_ids":"[]",
   "user_id":1,
   "jtype":"CollectionOnline",
   "timeout":86400,
   "state":10,
   "desc":"Bringing collection 'file-collection-1597122699552' online",
   "priority":300
}
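
Unlike the job-listing response, this response nests child jobs under a "cjobs" key. A short Python sketch of walking that hierarchy to find failing jobs is given below. It assumes that an "error" value of 0 means no error, as in the example response, and the failing child in the sample data is hypothetical.

```python
def failed_jobs(job):
    """Return (jobid, jtype, error, message) for each job in the tree with a non-zero error."""
    found = []
    if job.get("error", 0) != 0:
        found.append((job["jobid"], job["jtype"], job["error"], job.get("message", "")))
    for child in job.get("cjobs", []):
        found.extend(failed_jobs(child))
    return found


# Hypothetical data shaped like the example response; the error code and
# message on the child job are invented for illustration.
sample = {
    "jobid": 1888,
    "jtype": "CollectionOnline",
    "error": 0,
    "message": "",
    "cjobs": [
        {
            "jobid": 1889,
            "jtype": "CollectionOffline",
            "error": 1,
            "message": "hypothetical failure message",
            "cjobs": [],
        }
    ],
}

for jobid, jtype, error, message in failed_jobs(sample):
    print(f"job {jobid} ({jtype}) failed with error {error}: {message}")
```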

21.3. Retrieving Job Logs

Robin provides a utility that collects all the relevant logs from the necessary nodes for a particular job and its hierarchy of child jobs. It stores these logs in a single tarball that can be provided to Robin alongside a bug report. It is also useful for an administrator debugging why a job failed unexpectedly. This functionality is convenient because it removes the need to log into every affected node and collect or inspect the relevant log files by hand. Issue the following command to retrieve logs for a specific job:

# robin job get <id>

id

ID of job to collect the logs for

Example:

# robin job get 1
Retrieving log files...
Log files for Job ids: [1] are retrieved successfully at 1582189081.tar.gz
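
Once the tarball has been retrieved, its contents can be inspected without unpacking it. The sketch below uses Python's standard tarfile module; the function name is illustrative, and the layout of files inside the tarball is not documented here, so the code only lists whatever member names it finds.

```python
import tarfile


def list_job_logs(tarball_path):
    """Return the names of all members stored in a job log tarball."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return archive.getnames()
```

For the example above, list_job_logs("1582189081.tar.gz") would return the paths of the collected log files, which can then be extracted selectively.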

21.4. Archiving Job Logs

Robin job logs can be archived to prevent data loss, improve security, and free up space in the parent directory. The archival process moves all completed job logs to the archived sub-directory of the parent directory. The archived sub-directories are contained within the /var/log/robin/server and /var/log/robin/agent directories. There are two methods by which this can be achieved. The first is an automatic schedule, whose parameters can be configured, and the second is the robin job archive command detailed below. By default, the Robin job archive schedule automatically archives the logs for jobs that completed successfully and are older than 24 hours.

Note

The logs for failed jobs remain in the parent directories for analysis purposes.

21.4.1. Archive a Job on demand

In order to archive Robin jobs and their respective logs on demand, run the following command:

# robin job archive --age <age>
                    --include-failed

--age <age>

Minimum age (in minutes) of the job(s) whose logs should be archived

--include-failed

Archive the logs for failed jobs as well

Example:

# robin job archive --age 600 --wait
Job: 255170 Name: JobArchive           State: PROCESSED       Error: 0
Job: 255170 Name: JobArchive           State: PREPARED        Error: 0
Job: 255170 Name: JobArchive           State: WAITING         Error: 0
Job: 255170 Name: JobArchive           State: COMPLETED       Error: 0

21.4.2. Configure Job archive schedule attributes

Listed below are all the attributes a user can configure with regards to the scheduled job archival task.

Attribute        | Default value | Description
-----------------+---------------+-------------------------------------------------------------------------------------------
job_archive_age  | 86400         | The age (in seconds) of the completed job(s) whose logs should be automatically archived.
job_archive_cron | 0 0 * * *     | The time at which the job archival schedule is run. The value for this attribute must be a valid cron expression.

In order to update any of the aforementioned configurable attributes, run the following command:

# robin config update server <attribute> <value>

Example:

# robin config update server job_archive_age 81000
The 'server' attribute 'job_archive_age' has been updated
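
Note that the on-demand command takes its --age value in minutes, while the job_archive_age attribute is expressed in seconds. The following Python sketch shows the conversions for the values used in this section; the helper names are illustrative only.

```python
from datetime import timedelta


def archive_age_minutes(age):
    """Value for `robin job archive --age`, which is expressed in minutes."""
    return int(age.total_seconds() // 60)


def archive_age_seconds(age):
    """Value for the job_archive_age attribute, which is expressed in seconds."""
    return int(age.total_seconds())


# 10 hours, as used in the on-demand example (--age 600).
print(archive_age_minutes(timedelta(hours=10)))

# 24 hours, the default for job_archive_age (86400).
print(archive_age_seconds(timedelta(hours=24)))

# The value 81000 from the config example, shown as hours:minutes:seconds.
print(timedelta(seconds=81000))
```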

21.5. Purging Job logs

Robin enables users to purge job logs when they are no longer needed and space needs to be freed up. There are two methods by which this can be achieved. The first is an automatic schedule, whose parameters can be configured, and the second is the robin job purge command detailed in the section below. By default, the Robin job purge schedule removes jobs (and their respective logs) that fall into any of the following categories:

  • Successful jobs older than two weeks.

  • Failed jobs older than four weeks.

  • Robin maintenance jobs older than one week.

Both methods for purging a job remove the record(s) of the job and its respective child jobs from the Robin database, as well as delete the following associated log files if present:

  • The server job log directory at /var/log/robin/server/<job-id> on the Robin master node.

  • The archived server job log tarball at /var/log/robin/server/archived/<job-id>.tar.gz on the Robin master node.

  • The agent job log directory at /var/log/robin/agent/<job-id> on all nodes.

  • The archived agent job log tarball at /var/log/robin/agent/archived/<job-id>.tar.gz on all nodes.
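
To check what a purge would touch for a given job, the four locations listed above can be derived from the job ID. The Python sketch below only computes the paths and does not access the file system; the function name is illustrative.

```python
from pathlib import PurePosixPath

LOG_ROOT = PurePosixPath("/var/log/robin")


def purge_targets(job_id):
    """Return the live and archived log paths for a job, keyed by server/agent."""
    targets = {}
    for side in ("server", "agent"):
        base = LOG_ROOT / side
        targets[side] = [
            base / str(job_id),                     # live job log directory
            base / "archived" / f"{job_id}.tar.gz"  # archived job logs
        ]
    return targets


for side, paths in purge_targets(1123).items():
    for path in paths:
        print(side, path)
```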

21.5.1. Purge a Job on demand

In order to purge Robin jobs and their respective logs on demand, run the following command:

# robin job purge --age <age>
                  --failed-job-age <failed_age>
                  --maintenance-job-age <maintenance_age>
                  --maintenance-job-types <maintenance_types>
                  --before-id <id>

--age <age>

Purge successful jobs that completed before the specified date and time, given in '%Y-%m-%dT%H:%M:%S' format. The default is two weeks earlier than the current date.

--failed-job-age <failed_age>

Purge failed jobs that completed before the specified date and time, given in '%Y-%m-%dT%H:%M:%S' format. The default is four weeks earlier than the current date.

--maintenance-job-age <maintenance_age>

Purge maintenance jobs that completed before the specified date and time, given in '%Y-%m-%dT%H:%M:%S' format. The default is four weeks earlier than the current date.

--maintenance-job-types <maintenance_types>

Comma-separated list of job types to be considered maintenance jobs. The default types include: JobArchive and JobPurge.

--before-id <id>

Jobs whose IDs are lower than the specified ID will be purged. Note if --age is specified, it will take precedence.

Example:

# robin job purge --age 2021-04-06T18:14:00 --failed-job-age 2021-04-06T18:14:00 --maintenance-job-age 2021-04-06T18:14:00 --wait
Job:  309 Name: JobPurge             State: VALIDATED       Error: 0
Job:  309 Name: JobPurge             State: COMPLETED       Error: 0
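
Because the purge options take absolute timestamps rather than durations, it can help to compute them from the current time. The following Python sketch formats cut-off timestamps in the '%Y-%m-%dT%H:%M:%S' format described above and assembles the argument list; the helper names and the chosen windows are illustrative assumptions.

```python
from datetime import datetime, timedelta

PURGE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def purge_cutoff(now, older_than):
    """Format the timestamp that lies `older_than` before `now`."""
    return (now - older_than).strftime(PURGE_TIME_FORMAT)


def purge_command(now, success_age, failed_age, maintenance_age):
    """Build the argument list for an on-demand purge."""
    return [
        "robin", "job", "purge",
        "--age", purge_cutoff(now, success_age),
        "--failed-job-age", purge_cutoff(now, failed_age),
        "--maintenance-job-age", purge_cutoff(now, maintenance_age),
        "--wait",
    ]


# A fixed "now" keeps the example reproducible; in practice use datetime.now().
now = datetime(2021, 4, 20, 18, 14, 0)
command = purge_command(
    now,
    success_age=timedelta(weeks=2),
    failed_age=timedelta(weeks=4),
    maintenance_age=timedelta(weeks=4),
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run on a node where the robin client is installed and logged in.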

21.5.2. Configure Job purge schedule attributes

Listed below are all the attributes a user can configure with regards to the scheduled job purge task.

Attribute                    | Default value       | Description
-----------------------------+---------------------+-------------------------------------------------------------------------------------
job_purge_age                | 1209600             | The age (in seconds) of the completed job(s) which should be automatically purged.
job_purge_cron               | 30 0 * * *          | The time at which the job purge schedule is run. The value for this attribute must be a valid cron expression. Robin recommends that the schedule run daily.
job_purge_failed_age         | 2419200             | The age (in seconds) of the failed job(s) which should be automatically purged.
job_purge_maintenance_age    | 604800              | The age (in seconds) of the maintenance job(s) which should be automatically purged.
job_purge_maintenance_jtypes | JobArchive,JobPurge | The types of maintenance jobs to be purged.
job_purge_max_count          | 100000              | The maximum number of jobs that can be purged at a time.

In order to update any of the aforementioned configurable attributes, run the following command:

# robin config update server <attribute> <value>

Example:

# robin config update server job_purge_age 13396198
The 'server' attribute 'job_purge_age' has been updated

21.6. Cleaning up stale Job logs

In certain cases, logs for jobs can remain within their respective job directories or within the archived job log directory even though the record for the job has been deleted from the database. These job logs are deemed to be stale, as the Robin database is considered to be the most reliable source of the jobs run on the cluster. Robin provides two methods by which these stale job logs can be removed. The first is an automatic schedule, whose parameters can be configured, and the second is the robin job cleanup command detailed in the section below. By default, the Robin job cleanup schedule runs on the first day of every month and removes the logs for jobs whose records are no longer stored within the database.

Note

It is recommended that the reconciliation between the job records stored and the logs present happen at least once a month to free up space and avoid retaining the logs for jobs which are no longer relevant.

21.6.1. Cleanup stale Job logs on demand

In order to clean up stale job logs present on the cluster, run the following command:

# robin job cleanup

Example:

# robin job cleanup --wait
Job: 358447 Name: JobCleanupStaleLogs  State: WAITING         Error: 0
Job: 358447 Name: JobCleanupStaleLogs  State: COMPLETED       Error: 0

21.6.2. Configure Job cleanup schedule attributes

Listed below are all the attributes a user can configure with regards to the scheduled job cleanup task.

Attribute        | Default value | Description
-----------------+---------------+-------------------------------------------------------------------------------------
job_cleanup_cron | 0 1 1 * *     | The time at which the job cleanup schedule is run. The value for this attribute must be a valid cron expression. Robin recommends that the schedule run monthly.

In order to update any of the aforementioned configurable attributes, run the following command:

# robin config update server <attribute> <value>

Example:

# robin config update server job_cleanup_cron "0 1 2 * *"
The 'server' attribute 'job_cleanup_cron' has been updated
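
The schedule attributes in this chapter all use five-field cron values. Assuming the standard field order (minute, hour, day of month, month, day of week), the Python sketch below splits an expression into named fields so that a value can be sanity-checked before it is applied. It checks only the number of fields, not the contents of each field, and the function name is illustrative.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Map a five-field cron expression onto its field names."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# Default cleanup schedule: minute 0, hour 1, day 1 of every month.
print(split_cron("0 1 1 * *"))

# Value from the example above: same time, but on day 2 of every month.
print(split_cron("0 1 2 * *"))
```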

21.7. Log Collection

During a cluster-wide failure or any unexpected scenario that affects multiple services, Robin needs logs from all system components in order to debug the issue properly. Sometimes, given the scope of the issue, only a subset of logs needs to be collected. This granularity is available, but it is highly recommended to always send the complete set of logs when filing a bug report with Robin. Age-based filtering is available to reduce the storage footprint of the collected logs. Robin supports uploading logs to the following destinations:

robin-storage

Used to store collected logs in Robin-backed storage.

nfs

Used to store collected logs in NFS.

s3

Used to store collected logs in Amazon S3.

ssh

Used to store collected logs in a given remote location.

21.7.1. Storing logs using Robin Storage

Logs collected by Robin can be stored on a volume created on the local cluster, with the following command:

Note

If you do not use the --age option, by default, Robin CNP collects the logs for the last 3 days.

# robin log collect robin-storage <rpool>
                                  --nodes <nodes>
                                  --dest-path <dest_path>
                                  --size <size>
                                  --media <media>
                                  --age <age>

rpool

Name of the resource pool to use.

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--dest-path <dest_path>

Destination path where log files will be copied

--size <size>

Size of the storage volume for the log collect. The default is 250GB

--media <media>

Specify which type of drives to allocate storage from. Choices include: ‘HDD’, ‘SSD’. Default media type is ‘HDD’

--age <age>

Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years); the value all is also accepted. Example: use 10m for 10 minutes. The default is to collect logs from the last 3 days.

Example:

# robin log collect robin-storage default --wait
Job:  123 Name: LogCollect           State: PROCESSED       Error: 0
Job:  123 Name: LogCollect           State: WAITING         Error: 0
Job:  123 Name: LogCollect           State: COMPLETED       Error: 0
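
The --age values described above follow a simple number-plus-unit pattern. As a rough illustration, the Python sketch below validates such a value before it is passed to the command. It assumes whole numbers only and that months are written with a leading number (for example 2Mo); neither detail is confirmed by the option description, and the function name is illustrative.

```python
import re

# Units taken from the --age description: s, m, h, d, Mo, y.
AGE_PATTERN = re.compile(r"(\d+)(s|m|h|d|Mo|y)")


def parse_age(value):
    """Return (amount, unit) for a valid age, or None for the special value 'all'."""
    if value == "all":
        return None
    match = AGE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid age: {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_age("10m"))
print(parse_age("3d"))
print(parse_age("all"))
```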

21.7.2. Storing logs using NFS

Logs collected by Robin can be stored on an NFS share, with the following command:

Note

If you do not use the --age option, by default, Robin CNP collects the logs for the last 3 days.

# robin log collect nfs <nfs_share>
                        --nodes <nodes>
                        --age <age>

nfs_share

The hostname or IP, export path, and destination path of the NFS share, in the form <hostname|IP>:<export_path>:<dest_path>

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--age <age>

Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years); the value all is also accepted. Example: use 10m for 10 minutes. The default is to collect logs from the last 3 days.

Example:

# robin log collect nfs 10.9.82.162:/tmp:/demo_log_collect
Job:  126 Name: LogCollect           State: PROCESSED       Error: 0
Job:  126 Name: LogCollect           State: WAITING         Error: 0
Job:  126 Name: LogCollect           State: COMPLETED       Error: 0
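
The three-part NFS share argument is easy to get wrong, so a small check before running the command can help. The Python sketch below splits the value on colons and rejects anything that does not have exactly three non-empty parts. It is an illustration only and would not handle an IPv6 address, which itself contains colons.

```python
from typing import NamedTuple


class NfsShare(NamedTuple):
    host: str
    export_path: str
    dest_path: str


def parse_nfs_share(value):
    """Split '<hostname|IP>:<export_path>:<dest_path>' into its three parts."""
    parts = value.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <host>:<export_path>:<dest_path>, got {value!r}")
    return NfsShare(*parts)


print(parse_nfs_share("10.9.82.162:/tmp:/demo_log_collect"))
```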

21.7.3. Storing logs using AWS S3

Logs collected by Robin can be stored in AWS S3, with the following command:

Note

If you do not use the --age option, by default, Robin CNP collects the logs for the last 3 days.

# robin log collect s3 <url> <aws_config>
                             --nodes <nodes>
                             --access_key <access_key>
                             --secret_key <secret_key>
                             --age <age>

url

S3 URL in the format https://s3-<region-name>.amazonaws.com/<bucket-name>/<directory>

aws_config

JSON file containing the access key, secret key, and region. Example format: {"aws_access_key_id": <key>, "aws_secret_access_key": <key>, "region": <region_name>}

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--access_key <access_key>

Access Key for the respective user with access to the specified S3 bucket.

--secret_key <secret_key>

Secret Key for the respective user with access to the specified S3 bucket.

--age <age>

Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years); the value all is also accepted. Example: use 10m for 10 minutes. The default is to collect logs from the last 3 days.

Example:

# robin log collect s3 https://s3-us-west-2.amazonaws.com/log-collect/demo_log_collect /root/aws.json --wait
Job:  132 Name: LogCollect           State: PROCESSED       Error: 0
Job:  132 Name: LogCollect           State: WAITING         Error: 0
Job:  132 Name: LogCollect           State: COMPLETED       Error: 0
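
The aws_config file and the S3 URL both follow fixed formats, so they can be generated rather than written by hand. The Python sketch below is one possible way to do so. The function names are illustrative, and restricting the file permissions is a precaution added here rather than a documented Robin requirement.

```python
import json
import os


def write_aws_config(path, access_key, secret_key, region):
    """Write the JSON credentials file in the format described for aws_config."""
    config = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region": region,
    }
    with open(path, "w") as handle:
        json.dump(config, handle)
    os.chmod(path, 0o600)  # keep the secret key readable by the owner only
    return config


def s3_log_url(region, bucket, directory):
    """Build the S3 URL in the format https://s3-<region-name>.amazonaws.com/<bucket-name>/<directory>."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{directory}"


# Reproduces the URL used in the example above.
print(s3_log_url("us-west-2", "log-collect", "demo_log_collect"))
```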

21.7.4. Storing logs in a remote location

Logs collected by Robin can be stored in a remote location, with the following command:

Note

If you do not use the --age option, by default, Robin CNP collects the logs for the last 3 days.

# robin log collect ssh <dest>
                        --nodes <nodes>
                        --password <password>
                        --age <age>

dest

Destination path where the log files will be copied to. The path should be in the form of ‘<user>@<hostname|IP>:<path>’

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--password <password>

Provide a password on the command line instead of via a prompt

--age <age>

Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years); the value all is also accepted. Example: use 10m for 10 minutes. The default is to collect logs from the last 3 days.

Example:

# robin log collect ssh root@10.9.82.163:/demo_log_collect --password robin123
Job:  129 Name: LogCollect           State: PROCESSED       Error: 0
Job:  129 Name: LogCollect           State: WAITING         Error: 0
Job:  129 Name: LogCollect           State: COMPLETED       Error: 0

21.8. Cluster Auditing

Every operation that is performed by a user on an identifiable object within a Robin cluster is logged for auditing purposes. This allows admins to track the exact series of operations performed by a user as well as to monitor the general activity on the cluster. This not only enables more accurate backtracking for troubleshooting purposes but also improves the thoroughness of security audits. Detailed below are the methods by which a user can retrieve the audit log.

21.8.1. Retrieving audit logs from the Robin Database

In order to access the audit log, which contains information such as which user executed an operation, the tenant and node from which they executed it, the type of object and operation involved, and the result of the operation, issue the following command:

# robin user-audit list --exec-user <exec_user>
                        --exec-tenant <exec_tenant>
                        --owner-user <owner_user>
                        --owner-tenant <owner_tenant>
                        --id <record_id>
                        --object-type <object_type>
                        --page_size <size>
                        --page_num <num>
                        --operation <operation>
                        --result <result>
                        --full

--exec-user <exec_user>

Filter by username for the user who initiated the operation. Note this option cannot be used in conjunction with --owner-user parameter

--exec-tenant <exec_tenant>

Filter by tenant name for the user who initiated the operation. Note this option cannot be used in conjunction with --owner-tenant

--owner-user <owner_user>

Filter by username for the user who owns the object the operation was performed on. Note this option cannot be used in conjunction with --exec-user

--owner-tenant <owner_tenant>

Filter by tenant name for the tenant that owns the object the operation was performed on. Note this option cannot be used in conjunction with --exec-tenant

--id <record_id>

Filter for a specific record ID

--object-type <object_type>

Filter by object type

--operation <operation>

Filter by operation

--page_size <size>

Number of audit records that should be displayed for each page

--page_num <num>

Page number to start displaying audit records from (starting index 1)

--result <result>

Filter by operation result

--full

Display additional information about the audit records

Example 1 (List first page of audit records):

# robin user-audit list
Id  | Timestamp                | IP Addr     | Exec User | Exec Tenant    | Owner User | Owner Tenant | Object Type     | Operation | Result
----+--------------------------+-------------+-----------+----------------+------------+--------------+-----------------+-----------+---------
643 | August 10, 2021 14:17:47 | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
642 | July 13, 2021 11:24:13   | 10.9.121.40 | robin     | Administrators |            |              | USER            | login     | success
641 | July 13, 2021 11:24:12   | 172.20.0.1  | robin     | Administrators |            |              | METRICS         | enable    | success
640 | July 13, 2021 11:24:10   | 172.20.0.1  | robin     | Administrators |            |              | CONFIG          | update    | success
639 | July 13, 2021 11:24:06   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
638 | July 13, 2021 11:24:04   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
637 | July 13, 2021 11:24:04   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
636 | July 13, 2021 11:23:58   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
635 | July 13, 2021 11:23:57   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
634 | July 13, 2021 11:23:49   | 172.20.0.1  | robin     | Administrators |            |              | FILE_COLLECTION | online    | success
633 | July 13, 2021 11:23:44   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
632 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
631 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
630 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
629 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
628 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
627 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
626 | July 13, 2021 11:19:59   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
625 | July 13, 2021 11:19:01   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
624 | July 13, 2021 11:18:57   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
--------------------------------------------
537 items, page 1 of 27.
--------------------------------------------

Example 2 (List audit records filtered by object type):

# robin user-audit list --object-type APPLICATION
Id | Timestamp                 | IP Addr    | Exec User | Exec Tenant    | Owner User | Owner Tenant   | Object Type | Operation | Result
---+---------------------------+------------+-----------+----------------+------------+----------------+-------------+-----------+---------
46 | October 26, 2020 12:51:46 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
45 | October 26, 2020 12:51:25 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
44 | October 26, 2020 12:51:18 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
43 | October 26, 2020 12:51:06 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
42 | October 26, 2020 12:50:59 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
41 | October 26, 2020 12:49:44 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
40 | October 26, 2020 12:49:26 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
39 | October 26, 2020 12:49:17 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
38 | October 26, 2020 12:49:03 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
37 | October 26, 2020 12:46:17 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
36 | October 26, 2020 12:45:35 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
--------------------------------------------
11 items, page 1 of 1.
--------------------------------------------

Example 3 (Show details for a single audit record):

# robin user-audit list --id 46 --full
Id | Timestamp                 | IP Addr    | Exec User | Exec Tenant    | Owner User | Owner Tenant   | Object Type | Operation | Result
---+---------------------------+------------+-----------+----------------+------------+----------------+-------------+-----------+---------
46 | October 26, 2020 12:51:46 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
  object_attributes: {'tenant_id': 1, 'object_id': 11, 'jobid': 74, 'object_name': 'app-11', 'user_id': 3}
  details:

--------------------------------------------
1 items, page 1 of 1.
--------------------------------------------
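
The object_attributes line printed by --full in Example 3 resembles a Python dictionary literal. Assuming that format holds, a short Python sketch such as the following could extract the jobid so that the audit record can be matched to the job that carried out the operation. The helper name parse_object_attributes is hypothetical.

```python
import ast


def parse_object_attributes(line):
    """Parse an 'object_attributes: {...}' line from --full output."""
    label, _, literal = line.strip().partition(":")
    if label != "object_attributes":
        raise ValueError("not an object_attributes line")
    # literal_eval only accepts Python literals, so it never executes code.
    return ast.literal_eval(literal.strip())


line = ("  object_attributes: {'tenant_id': 1, 'object_id': 11, "
        "'jobid': 74, 'object_name': 'app-11', 'user_id': 3}")
attrs = parse_object_attributes(line)
print(attrs["jobid"], attrs["object_name"])  # prints: 74 app-11
```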

Returns audit records containing details such as the user who performed the action, the tenant and node it was performed from, details on the operation itself and the object it was performed on.

End Point: /api/v3/robin_server/user-audit

Method: GET

URL Parameters:

  • exec_user=<exec_user> : Utilizing this parameter results in only audit records detailing operations initiated by the specified user being returned. Note this option cannot be used in conjunction with the owner_user parameter.

  • exec_tenant=<exec_tenant> : Utilizing this parameter results in only audit records detailing operations initiated in the specified tenant being returned. Note this option cannot be used in conjunction with the owner_tenant parameter.

  • owner_user=<owner_user> : Utilizing this parameter results in only audit records detailing operations on objects owned by the specified user being returned. Note this option cannot be used in conjunction with the exec_user parameter.

  • owner_tenant=<owner_tenant> : Utilizing this parameter results in only audit records detailing operations on objects owned by the specified tenant being returned. Note this option cannot be used in conjunction with the exec_tenant parameter.

  • id=<record_id> : Utilizing this parameter results in only the audit record with the specified ID being returned.

  • object_type=<object_type> : Utilizing this parameter results in only audit records associated with the specified object type being returned.

  • operation=<operation> : Utilizing this parameter results in only audit records associated with the specified operation being returned.

  • page_size=<size> : Utilizing this parameter results in <size> number of audit records being returned.

  • page_num=<index> : Utilizing this parameter results in audit records starting from page <index> being returned (pages are numbered from 1).

  • result=<result> : Utilizing this parameter results in only audit records matching the specified result being returned.

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)

Example Response:

Output
{
   "object_type":"UserAuditRecord",
   "start":1,
   "count":20,
   "total":538,
   "page_size":20,
   "page_num":1,
   "items":[
      {
         "id":644,
         "timestamp":"August 11, 2021 03:12:55",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":643,
         "timestamp":"August 10, 2021 14:17:47",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":642,
         "timestamp":"July 13, 2021 11:24:13",
         "ip_addr":"10.9.121.40",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":641,
         "timestamp":"July 13, 2021 11:24:12",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"METRICS",
         "operation":"enable",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":640,
         "timestamp":"July 13, 2021 11:24:10",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"CONFIG",
         "operation":"update",
         "result":"success",
         "object_attributes":{
            "section":"cluster",
            "attribute":"ignored_phases"
         },
         "details":{
            "msg":"The 'cluster' attribute 'ignored_phases' has been updated"
         }
      },
      {
         "id":639,
         "timestamp":"July 13, 2021 11:24:06",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"probe",
         "result":"success",
         "object_attributes":{
            "object_name":"systestvm-40.robinsystems.com",
            "object_id":1,
            "jobid":1539
         },
         "details":{

         }
      },
      {
         "id":638,
         "timestamp":"July 13, 2021 11:24:04",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"probe",
         "result":"success",
         "object_attributes":{
            "object_name":"systestvm-39.robinsystems.com",
            "object_id":3,
            "jobid":1538
         },
         "details":{

         }
      },
      {
         "id":637,
         "timestamp":"July 13, 2021 11:24:04",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"probe",
         "result":"success",
         "object_attributes":{
            "object_name":"systestvm-41.robinsystems.com",
            "object_id":2,
            "jobid":1537
         },
         "details":{

         }
      },
      {
         "id":636,
         "timestamp":"July 13, 2021 11:23:58",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":635,
         "timestamp":"July 13, 2021 11:23:57",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":634,
         "timestamp":"July 13, 2021 11:23:49",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"FILE_COLLECTION",
         "operation":"online",
         "result":"success",
         "object_attributes":{
            "object_id":1603741429864,
            "object_name":"file-collection-1603741429864",
            "collection_pathname":"\/usr\/local\/robin\/collections\/file-collection-1603741429864",
            "hostname":"systestvm-40.robinsystems.com",
            "jobid":1533
         },
         "details":{

         }
      },
      {
         "id":633,
         "timestamp":"July 13, 2021 11:23:44",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":632,
         "timestamp":"July 13, 2021 11:20:07",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"config",
         "result":"success",
         "object_attributes":{
            "object_id":2,
            "object_name":"systestvm-41.robinsystems.com",
            "jobid":1478
         },
         "details":{

         }
      },
      {
         "id":631,
         "timestamp":"July 13, 2021 11:20:07",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"config",
         "result":"success",
         "object_attributes":{
            "object_id":3,
            "object_name":"systestvm-39.robinsystems.com",
            "jobid":1479
         },
         "details":{

         }
      },
      {
         "id":630,
         "timestamp":"July 13, 2021 11:20:07",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"config",
         "result":"success",
         "object_attributes":{
            "object_id":1,
            "object_name":"systestvm-40.robinsystems.com",
            "jobid":1480
         },
         "details":{

         }
      },
      {
         "id":629,
         "timestamp":"July 13, 2021 11:20:01",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"config",
         "result":"success",
         "object_attributes":{
            "object_id":1,
            "object_name":"systestvm-40.robinsystems.com",
            "jobid":1468
         },
         "details":{

         }
      },
      {
         "id":628,
         "timestamp":"July 13, 2021 11:20:01",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"config",
         "result":"success",
         "object_attributes":{
            "object_id":2,
            "object_name":"systestvm-41.robinsystems.com",
            "jobid":1466
         },
         "details":{

         }
      },
      {
         "id":627,
         "timestamp":"July 13, 2021 11:20:01",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"NODE",
         "operation":"config",
         "result":"success",
         "object_attributes":{
            "object_id":3,
            "object_name":"systestvm-39.robinsystems.com",
            "jobid":1467
         },
         "details":{

         }
      },
      {
         "id":626,
         "timestamp":"July 13, 2021 11:19:59",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      },
      {
         "id":625,
         "timestamp":"July 13, 2021 11:19:01",
         "ip_addr":"172.20.0.1",
         "exec_user_id":3,
         "exec_username":"robin",
         "exec_tenant_id":1,
         "exec_tenant":"Administrators",
         "owner_user_id":null,
         "owner_username":null,
         "owner_tenant_id":null,
         "owner_tenant":null,
         "object_type":"USER",
         "operation":"login",
         "result":"success",
         "object_attributes":{

         },
         "details":{

         }
      }
   ],
   "state":"Succeed",
   "message":"NA"
}
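
As an illustration of the endpoint described above, the following Python sketch prepares a GET request using only the standard library. The path, port, header, and parameter names come from the description above; the https scheme, the host name, the token placeholder, and the helper names are assumptions for this example. The sketch also rejects the filter combinations that are documented as mutually exclusive and shows how the number of pages follows from the total and page_size fields of the response.

```python
import math
from urllib.parse import urlencode
from urllib.request import Request

AUDIT_PATH = "/api/v3/robin_server/user-audit"


def build_audit_request(host, token, port=29442, **filters):
    """Prepare (but do not send) a GET request for audit records."""
    # The exec_* and owner_* filters are documented as mutually exclusive.
    for a, b in (("exec_user", "owner_user"), ("exec_tenant", "owner_tenant")):
        if a in filters and b in filters:
            raise ValueError(f"{a} cannot be combined with {b}")
    url = f"https://{host}:{port}{AUDIT_PATH}"  # https is an assumption
    query = urlencode(filters)
    if query:
        url += "?" + query
    return Request(url, headers={"Authorization": token}, method="GET")


def page_count(total, page_size):
    """Number of pages needed to list `total` records."""
    return math.ceil(total / page_size)


# Hypothetical host and token, for illustration only.
req = build_audit_request("robin-master.example.com", "<auth_token>",
                          object_type="APPLICATION", page_size=20, page_num=1)
print(req.full_url)
print(page_count(538, 20))  # 27
```

The prepared request can be sent with urllib.request.urlopen and the body decoded with json.load; the audit records are then available under the items key, as in the example response.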

21.8.2. Retrieving audit logs from a file

Robin supports recording all audit records within an audit log file. The information stored within the file is equivalent to that saved in the Robin database but provides what is essentially a hard copy of the audit trail. The audit log file is named robin-user-audit.log and is located in the /home/robinds/var/log/robin directory within the Robin daemonset container on the primary master node. By default, this feature is disabled.

Some points to consider with regard to the file based logging feature:

  • The log file will only be generated on the active master.

  • The log file is automatically updated by the Robin control plane processes whenever an event occurs.

  • The logs are automatically rotated to ensure that these logs do not consume the whole log partition.

21.8.2.1. Enable file based logging

By default, Robin does not log audit records to a file. In order to enable this feature, perform the following steps:

  1. Run the following command to indicate the feature should be enabled:

    # robin config update user_audit log_enable True
    
  2. Run the following command to restart the robin-server service and thus allow the above changes to take effect:

    # service robin-server restart
    

    Example

    # robin config update user_audit log_enable True
    The 'user_audit' attribute 'log_enable' has been updated
    
    # service robin-server restart
    Redirecting to /bin/systemctl restart robin-server.service
    

After you enable the feature, all audit records are saved to the aforementioned file in real time and in a user-configurable format. As an administrator, you can view the audit logs using any text editor. Additionally, these logs can be captured with any log forwarding tool for further processing.

21.8.2.2. Disable file based logging

To disable the file based logging of audit records, perform the following steps:

  1. Run the following command to indicate the feature should be disabled:

    # robin config update user_audit log_enable False
    
  2. Run the following command to restart the robin-server service and thus allow the above changes to take effect:

    # service robin-server restart
    

    Example

    # robin config update user_audit log_enable False
    The 'user_audit' attribute 'log_enable' has been updated
    
    # service robin-server restart
    Redirecting to /bin/systemctl restart robin-server.service
    

Disabling this feature results in the robin-user-audit.log file no longer being updated with new audit records.

21.8.2.3. Configure file based logging attributes

Listed below are all the attributes a user can configure with regard to the file based logging feature.

Attribute     | Default value | Valid values
--------------+---------------+------------------------------------------------------------------------
enabled       | True          | True: enable the user audit feature
              |               | False: disable the user audit feature
log_enable    | False         | True: enable the audit log feature
              |               | False: disable the audit log feature
log_file_size | 10            | The maximum size in megabytes of the audit log file
log_format    | JSON          | The output format of each audit record. The following are valid values:
              |               | JSON: display records in JSON format
              |               | TEXT: display records in TEXT format
log_level     | INFO          | Indicates the level of audit records to be captured. The following are valid values:
              |               | INFO: informational messages
              |               | DEBUG: debug-level messages that contain information for debugging a program
              |               | WARNING: warning messages
              |               | ERROR: error messages
              |               | CRITICAL: critical messages
log_retention | 4             | The maximum number of audit log files to retain. Any additional log files are rolled over.

In order to update any of the aforementioned configurable attributes, run the following command:

# robin config update user_audit <attribute> <valid value>

Example

# robin config update user_audit log_format TEXT
The 'user_audit' attribute 'log_format' has been updated
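
When updates are scripted, it can help to check a value against the attribute list above before calling the CLI. The following Python sketch builds the argument list for the robin config update user_audit command shown above. The helper name is hypothetical, and the rule that log_file_size and log_retention must be positive integers is an assumption; Robin itself may accept or reject values differently.

```python
# Accepted values, transcribed from the attribute list above.
_ALLOWED = {
    "enabled": {"True", "False"},
    "log_enable": {"True", "False"},
    "log_format": {"JSON", "TEXT"},
    "log_level": {"INFO", "DEBUG", "WARNING", "ERROR", "CRITICAL"},
}
# Assumed to require positive integers (not stated explicitly above).
_POSITIVE_INTEGERS = {"log_file_size", "log_retention"}


def build_update_command(attribute, value):
    """Return the argv list for 'robin config update user_audit ...'."""
    value = str(value)
    if attribute in _ALLOWED:
        if value not in _ALLOWED[attribute]:
            raise ValueError(f"{value!r} is not valid for {attribute}")
    elif attribute in _POSITIVE_INTEGERS:
        if not value.isdigit() or int(value) < 1:
            raise ValueError(f"{attribute} expects a positive integer")
    else:
        raise ValueError(f"unknown user_audit attribute: {attribute}")
    return ["robin", "config", "update", "user_audit", attribute, value]


print(" ".join(build_update_command("log_format", "TEXT")))
# robin config update user_audit log_format TEXT
```

The returned list can be passed to subprocess.run on a node where the robin CLI is available.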

21.8.2.4. View records captured in audit file

To view all audit records captured in the aforementioned file, run the following command:

# cat /var/log/robin/robin-user-audit.log

Example 1 (Viewing TEXT based audit records):

# cat /var/log/robin/robin-user-audit.log
1623 | 2021-08-12T15:26:06.581513+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1624 | 2021-08-12T15:26:12.655515+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1625 | 2021-08-12T15:26:12.783629+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1626 | 2021-08-12T15:26:13.118734+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1627 | 2021-08-12T15:26:18.584252+7:00 | 192.0.2.2 | robin | Administrators | -- | -- | USER | login | success | -- | --
1628 | 2021-08-12T15:26:21.752403+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1629 | 2021-08-12T15:26:28.934639+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1630 | 2021-08-12T15:26:36.089382+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1631 | 2021-08-12T15:26:43.233911+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1632 | 2021-08-12T15:26:50.370029+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1633 | 2021-08-12T15:26:57.528168+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1634 | 2021-08-12T15:27:04.749161+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1635 | 2021-08-12T15:27:11.934771+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1636 | 2021-08-12T15:27:19.127729+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1637 | 2021-08-12T15:27:26.291575+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1638 | 2021-08-12T15:27:33.702357+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1639 | 2021-08-12T15:27:41.017244+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
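
For scripted processing of TEXT records, each line can be split on the " | " separator. The Python sketch below does this under the assumption that the twelve columns appear in the same order as the keys of the JSON format; the TEXT format carries no header row, so the field names used here are inferred rather than documented.

```python
# Field names are assumed from the keys of the JSON format.
FIELDS = ("id", "timestamp", "ip_addr", "exec_user", "exec_tenant",
          "owner_user", "owner_tenant", "object_type", "operation",
          "result", "object_attributes", "details")


def parse_text_record(line):
    """Split one TEXT-format audit line into a dict; '--' becomes None."""
    # Limit the split so a ' | ' inside the last column is left intact.
    values = [v.strip() for v in line.strip().split(" | ", len(FIELDS) - 1)]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {k: (None if v == "--" else v) for k, v in zip(FIELDS, values)}
    record["id"] = int(record["id"])
    return record


line = ("1623 | 2021-08-12T15:26:06.581513+7:00 | 192.0.2.1 | robin | "
        "Administrators | -- | -- | USER | login | success | -- | --")
rec = parse_text_record(line)
print(rec["id"], rec["exec_user"], rec["operation"], rec["result"])
# prints: 1623 robin login success
```

The timestamp is kept as a string because the offset in the examples above is written as +7:00, which strict ISO 8601 parsers may not accept.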

Example 2 (Viewing JSON based audit records):

# cat /var/log/robin/robin-user-audit.log
{
    "id": 197,
    "timestamp": "2021-08-12T13:56:17.230515+7:00",
    "ip_addr": "192.0.2.2",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "NAMESPACE",
    "operation": "create",
    "result": "success",
    "object_attributes": {
        "object_name": "oc8687pk4i",
        "username": "robin",
        "tenant": "Administrators",
        "import_namespace": false
     },
    "details": {}
}
{
    "id": 198,
    "timestamp": "2021-08-12T13:56:17.748933+7:00",
    "ip_addr": "192.0.2.1",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "USER",
    "operation": "login",
    "result": "success",
    "object_attributes": {},
    "details": {}
}
{
    "id": 199,
    "timestamp": "2021-08-12T13:56:33.766674+7:00",
    "ip_addr": "192.0.2.2",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "NAMESPACE",
    "operation": "delete",
    "result": "success",
    "object_attributes": {},
    "details": {}
}
{
    "id": 200,
    "timestamp": "2021-08-12T13:56:34.290960+7:00",
    "ip_addr": "192.0.2.1",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "USER",
    "operation": "login",
    "result": "success",
    "object_attributes": {},
    "details": {}
}
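
In JSON format the file holds a sequence of JSON objects written back to back rather than a single JSON document, so json.load cannot read it in one call. A minimal Python sketch that walks such a stream with the standard library's JSONDecoder.raw_decode is shown below; the function name and the shortened sample records are illustrative only.

```python
import json


def iter_json_records(text):
    """Yield each JSON object from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        record, pos = decoder.raw_decode(text, pos)
        yield record


# Shortened sample records; real records carry the full set of keys.
sample = ('{"id": 198, "operation": "login"}\n'
          '{\n  "id": 199,\n  "operation": "delete"\n}\n')
print([r["id"] for r in iter_json_records(sample)])  # [198, 199]
```

The same function works whether each record occupies one line or is spread over several, and its output can be filtered, for example, on the result or object_type keys.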

To view the last audit record that was captured, run the following command:

# tail -n 1 /var/log/robin/robin-user-audit.log

Example 1 (Viewing last TEXT based audit record):

# tail -n 1 /var/log/robin/robin-user-audit.log
1645 | 2021-08-12T15:28:19.298469+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --

Example 2 (Viewing last JSON based audit record):

# tail -n 1 /var/log/robin/robin-user-audit.log
{
     "id": 1646,
     "timestamp": "2021-08-12T15:31:44.069446+7:00",
     "ip_addr": "192.0.2.2",
     "exec_user_id": 3,
     "exec_username": "robin",
     "exec_tenant_id": 1,
     "exec_tenant": "Administrators",
     "owner_user_id": null,
     "owner_username": null,
     "owner_tenant_id": null,
     "owner_tenant": null,
     "object_type": "CONFIG",
     "operation": "update",
     "result": "success",
     "object_attributes": {
         "section": "user_audit",
         "attribute": "log_format"
     },
     "details": {
         "msg": "The 'user_audit' attribute 'log_format' has been updated"
     }
}

21.8.3. Kubernetes audit logs

Kubernetes audit logs are a chronological record of all requests (API calls) made to the Kubernetes API server. For more information about Kubernetes audit logs, see Kubernetes auditing.

Note

By default, the Kubernetes audit logs feature is enabled for Robin CNP clusters.

21.8.3.1. Points to consider for Kubernetes audit logs

  • The maximum size for storing Kubernetes audit logs in a cluster is 1 GB, which is non-configurable.

  • A log file can have a maximum size of 100 MB and a maximum of 10 log files can be stored.

21.8.3.2. View Kubernetes audit logs

Kubernetes audit logs help you troubleshoot issues in your cluster. You can find them at /var/log/Kubernetes/audit/audit.log on any master node of your cluster.

Robin CNP logs the following requests at the Metadata audit level:

  • Create request

  • Patch request

  • Update request

  • Delete request
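
As an illustration of how such a log can be inspected, the sketch below counts write requests per resource. It relies on the standard Kubernetes audit Event fields verb and objectRef.resource and on the Kubernetes log backend writing one JSON event per line. The function name summarize_audit_events and the sample events are hypothetical and were written by hand for this example, not captured from a Robin cluster.

```python
import json
from collections import Counter

# Verbs corresponding to the request types listed in this section.
WRITE_VERBS = {"create", "patch", "update", "delete"}


def summarize_audit_events(lines):
    """Count (verb, resource) pairs across JSON-lines audit events."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        verb = event.get("verb")
        if verb not in WRITE_VERBS:
            continue
        # objectRef is absent for some non-resource requests.
        resource = (event.get("objectRef") or {}).get("resource", "<none>")
        counts[(verb, resource)] += 1
    return counts


# Hand-written sample events (illustrative only).
sample = [
    '{"kind":"Event","level":"Metadata","verb":"create","user":{"username":"admin"},"objectRef":{"resource":"pods","namespace":"default","name":"web-0"}}',
    '{"kind":"Event","level":"Metadata","verb":"delete","user":{"username":"admin"},"objectRef":{"resource":"pods","namespace":"default","name":"web-0"}}',
    '{"kind":"Event","level":"Metadata","verb":"create","user":{"username":"admin"},"objectRef":{"resource":"secrets","namespace":"default","name":"tls"}}',
]
for (verb, resource), n in sorted(summarize_audit_events(sample).items()):
    print(f"{verb:7s} {resource:10s} {n}")
```

To run the same summary against a real log, pass an open file object for the audit log path given above instead of the sample list.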

21.9. Sherlock

Sherlock is a troubleshooting and self-diagnostic command-line tool (CLI) in Robin. It is designed to help Robin administrators identify and analyze problems with Robin clusters. Using Sherlock, an administrator can diagnose cluster-wide problems, view a general cluster health report, or gather information about specific applications, nodes, containers, volumes, devices, and so on. It provides an in-depth view of these problems and the affected objects by querying a range of Robin APIs and making direct database calls. Moreover, the information gathered is mapped both top-down and bottom-up through the resource hierarchy in order to present important information on a wide range of objects in a consumable manner. Some examples of the highlighted resource connections are described below:

  • Applications are linked to the Pods that they are composed of. Thus details on the health of the node providing the compute resources for the Pod(s) and the status of the attached Volumes are also presented.

  • Volumes are implicitly linked to the Node they are created on. As a result, the status of the Node, the status and source of any replicas present (including the resync progress), and the number of snapshots are also displayed.

  • Similarly, Disks are explicitly attached to hosts, so details of the Node are displayed with relevant information such as the overall disk capacity and the current utilization of the disk.

  • The status of critical Robin services is displayed in addition to the impacted objects, including Applications, Volumes, and Disks.

Note

Given the breadth of information gathered and displayed by Sherlock, the tool is only accessible on the active master node and should only be used by administrators. In addition, it requires the RCM and Storage Manager services to be running.

21.9.1. Use Cases

Given the wealth of information that Sherlock displays, it can be used in practically any scenario. Whether it is used as the primary debugging tool for cluster-wide issues or simply to gain insight into usage statistics, relevant information can always be obtained with the tool. Highlighted below are two example use cases where Sherlock could be particularly useful.

Diagnosing application health issues

Given that Sherlock primarily aims to trace problems throughout the resource hierarchy, it allows the level at which a problem originates to be detected. For example, an application that can no longer write data because of a disk failure within the cluster might report itself as unhealthy. Using Sherlock, the underlying issue of the disk being in a bad state can be deduced, because when an application is investigated the volumes attached to its Pods are also displayed. As a result, the unhealthy volumes are reported alongside the device from which they are allocated, and so the common point of failure, the failed disk, can be identified.

The above example highlights the usefulness of the explicit mappings showcased by the tool and how they can be used to efficiently detect objects which are malfunctioning.

Planning maintenance activities

Since Sherlock highlights the links between several abstract objects within a cluster, it can be used to determine the impact of an object being offline for a period of time. This is particularly useful if a node needs to be cordoned off for maintenance, as Sherlock can show the impacted objects (Pods, volumes, applications, and users). As a result, any parties affected by the maintenance activity can be informed ahead of time with little to no guesswork involved.

21.9.2. Sherlock Report

The report generated by Sherlock, shown in the examples below, is meant to provide a quick overview of the state of the Robin cluster. By default it only highlights unhealthy objects, as they are the greatest cause for concern. It is split into the following key sections:

  • Applications - This section of the report displays unhealthy applications alongside linked resources such as the affected Pods, volumes, and devices on which the application data is saved.

  • Pods - This section of the report displays unhealthy pods alongside details of the attached volume(s) and any Kubernetes errors associated with them. This section is highlighted as it includes general Kubernetes pods and Helm based applications.

  • Volumes - This section of the report displays unhealthy volumes alongside details of the device each is hosted on, logical mounts with potential IO stalls, NFS Exports, NFS Server pods, the node from which each volume is allocated, and statistics about any snapshots it may have.

  • Nodes - This section of the report displays unhealthy nodes alongside the status of the Robin and Kubernetes services running on each node and warnings for high resource usage, and indicates a lack of available space if appropriate.

  • Devices - This section of the report displays unhealthy devices alongside details of the node each is mounted on, the utilization of each disk, and the volumes affected.

  • File Collections - This section of the report displays unhealthy file collections and highlights any errors that may have caused them to be in an unhealthy state.

  • Bundles - This section of the report displays unhealthy bundles and highlights any errors inherited from other objects in the hierarchy that may cause them to be unavailable.

Example 1 (Healthy cluster Report)

# sherlock

SHOWING APPLICATIONS THAT NEED ATTENTION:
All apps are healthy

SHOWING PODS THAT NEED ATTENTION:
All pods are healthy

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy

SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available

SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available

Only unhealthy objects are shown. To see everything re-run with -H|--healthy option
To see more details rerun with -V|--verbose option

sherlock produced results in 155 milliseconds (Sat Sep 18 06:14:59 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    2 bundles, 1 users and 1 tenants were analyzed
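
The closing summary lines of the report are regular enough to be read programmatically, for example to record how many objects were analyzed on each run. The sketch below is only an illustration based on the summary text shown above; the function name parse_summary_counts is hypothetical and the summary wording may differ between Robin releases.

```python
import re


def parse_summary_counts(summary):
    """Return {object_type: count} from Sherlock's closing summary lines."""
    pairs = re.findall(r"(\d+)\s+([a-z-]+)", summary)
    return {name: int(number) for number, name in pairs}


# Summary lines copied from the healthy cluster report above.
summary = (
    "|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,\n"
    "    2 bundles, 1 users and 1 tenants were analyzed"
)
counts = parse_summary_counts(summary)
print(counts["nodes"], counts["disks"], counts["bundles"])
```

Only the lines starting at "|--" should be passed in; the preceding "sherlock produced results in ..." line contains a number followed by a word and would otherwise add a spurious entry.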

Example 2 (Report with unhealthy applications)

# sherlock

SHOWING APPLICATIONS THAT NEED ATTENTION:
|-- robinte STATE: PLANNED      Robin Systems     2/2 pods unhealthy KIND: ROBIN

SHOWING USERS WHO ARE AFFECTED:
|-- Robin Systems (Firstname: Robin LastName: Systems Email: None)
|   |-- APPS 1: robinte

SHOWING PODS THAT NEED ATTENTION:
o-- POD/VNODE ID  121: robinte.R1.01 INSTALLING/ONLINE   1 CPU, 50 MB MEM NODE: UP, RIO: UP
|-- POD/VNODE ID  122: robinte.R2.01 INSTALLING/ONLINE   1 CPU, 50 MB MEM NODE: UP, RIO: UP

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy

SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available

SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available
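
Because every section of the default report starts with a "SHOWING ..." header, the report text can be split into sections and the ones that list objects can be picked out. The following is a rough sketch under that assumption; the function names are hypothetical, the input is an abbreviated copy of the report above, and the logic applies to the default report only (not to output produced with --healthy).

```python
def split_report_sections(report):
    """Group non-empty report lines under their 'SHOWING ...' header lines."""
    sections = {}
    current = None
    for line in report.splitlines():
        if line.startswith("SHOWING "):
            current = line.rstrip(":").strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def sections_needing_attention(sections):
    """Return headers whose first body line is not an 'All ...' all-clear line."""
    return [
        header
        for header, body in sections.items()
        if not (body and body[0].startswith("All "))
    ]


# Abbreviated copy of the report with unhealthy applications shown above.
report = """\
SHOWING APPLICATIONS THAT NEED ATTENTION:
|-- robinte STATE: PLANNED      Robin Systems     2/2 pods unhealthy KIND: ROBIN

SHOWING PODS THAT NEED ATTENTION:
o-- POD/VNODE ID  121: robinte.R1.01 INSTALLING/ONLINE   1 CPU, 50 MB MEM NODE: UP, RIO: UP

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy
"""
flagged = sections_needing_attention(split_report_sections(report))
for header in flagged:
    print(header)
```

Informational sections such as "SHOWING USERS WHO ARE AFFECTED" would also be returned by this check, since they list objects rather than an all-clear line.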

21.9.3. Command Line Options

Detailed below are the different options that can be utilized whilst using the Sherlock tool to attain the desired information.

Resource Inspection Options

These options require the names of resources (objects) to be specified in order to show detailed information about those objects. If multiple objects need to be viewed at the same time, a comma-separated list of names can be specified.

--app <name>

Displays information about the given application(s)

--node <name>

Displays information about the given node(s)

--pod <name>

Displays information about the given Pod(s)

--vol <name>

Displays information about the given volume(s)

--dev <wwn>

Displays information about the given device(s)

Note

For the --dev option, several other values are supported in addition to a list of WWNs: ‘all’ displays information on all devices, ‘full’ displays details on devices that are nearly full, a list of node names shows the devices on the given nodes, and a combination of the node name and device path in the format <nodename>:<devpath> uniquely identifies a single device.

Advisory Rebalancing Options

The rebalancing options can be used to discover disks that are over- or underutilized. The advice given can be used to adjust the load on a given device or volume.

--dev-rebalance-advice <wwn>

Provides advice on device rebalancing

--vol-rebalance-advice <volname>

Provides advice on volume rebalancing

--devs-needing-rebalance

Displays information about devices that need rebalancing

--vols-needing-rebalance

Displays information about volumes that need rebalancing

Behavior Controlling Options

The following options allow the generated report to be adjusted to include details that might not be present by default.

--mon <secs>

Monitor the resource metrics for the given interval. Use this option alongside options such as --app, --pod or --vol

--start <time>

Scan jobs starting from this date/time. The default time is 72 hours before the current time. This option is only valid when --scan-joblogs is specified

--end <time>

End scanning jobs at this date/time. The default time is the current time. This option is only valid when --scan-joblogs is specified

--server <port>

Run in server mode on the given port so Sherlock can be viewed from a web browser

--strict

Mark resources that are not fully online as unhealthy. With this option, resources (objects) that are only partially healthy are displayed as unhealthy

--cache

Build and use a cache to speed up queries. Resources are cached once and the same cache is used for subsequent queries. Use this option when you run Sherlock repeatedly with different options

--healthy

Also show healthy resources. Displays healthy resources (objects) along with unhealthy objects

--verbose

Displays a detailed report

--html

Print output in HTML format. Use this along with --outfile and provide a path to save the HTML file

--outfile

Redirect output to a file at the specified file path

--no-skip

Do not skip unimportant resources. By default, such resources are skipped to minimize output

--scan-joblogs

Scan job logs for errors

--prom

Run in server mode to serve metrics in Prometheus format
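
When Sherlock is driven from a script, the command line can be assembled from the options described above. The sketch below is illustrative only: the helper build_sherlock_argv is hypothetical, the output path is a placeholder, and passing the value of --outfile as a separate argument is an assumption based on how --app and --server are used in the examples in this section. It builds and prints the command without running it.

```python
import shlex


def build_sherlock_argv(apps=(), nodes=(), pods=(), vols=(),
                        healthy=False, verbose=False, strict=False,
                        html=False, outfile=None):
    """Assemble a sherlock command line from a subset of the documented options."""
    argv = ["sherlock"]
    # Resource inspection options accept a comma-separated list of names.
    for flag, names in (("--app", apps), ("--node", nodes),
                        ("--pod", pods), ("--vol", vols)):
        if names:
            argv += [flag, ",".join(names)]
    # Boolean switches are appended only when enabled.
    for flag, enabled in (("--healthy", healthy), ("--verbose", verbose),
                          ("--strict", strict), ("--html", html)):
        if enabled:
            argv.append(flag)
    if outfile:
        # Assumption: the file path follows --outfile as a separate argument.
        argv += ["--outfile", outfile]
    return argv


argv = build_sherlock_argv(apps=["mysql-test"], verbose=True,
                           html=True, outfile="/tmp/sherlock-report.html")
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run on the active master node; keeping the arguments as a list avoids shell quoting problems with unusual resource names.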

21.9.4. Web Server Access

The Sherlock tool can be accessed via a web browser for a more interactive viewing experience. To use server mode, specify an available port number between 1 and 65535 with the --server option. An example is given below.

Example

# sherlock --server 45536
running the read_config now

Running in server mode. Point your web browser to the following address:

https://eqx01-flash15:45536
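
Before starting server mode, it can be useful to confirm that the chosen port is within the valid range and not already in use. The sketch below is a generic check using the Python standard library, not part of Sherlock; the function names are hypothetical, and it assumes that testing a TCP bind on all IPv4 interfaces is representative of how the server will listen.

```python
import socket


def validate_port(port):
    """Return the port as an int, or raise ValueError if outside 1-65535."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside the range 1-65535")
    return port


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port.

    Raises ValueError for an out-of-range port.
    """
    port = validate_port(port)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Covers "address already in use" and permission errors.
            return False
    return True


print(is_port_free(45536))
```

Ports below 1024 normally require root privileges to bind, so this check reports them as unavailable for unprivileged users. The result is also only a snapshot: another process may claim the port between the check and the start of the server.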

21.9.5. Examples

21.9.5.1. View health of all objects

In order to view a report containing the status of all healthy objects alongside the unhealthy ones use the --healthy parameter as shown in the example(s) below.

Example 1 (Display health of all objects)

# sherlock --healthy

No matching apps found
No matching pods found

SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER
|-- VOLID     1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b, usage:  448 MB /  20 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|-- VOLID   132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839                          , usage:  352 MB /   5 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|-- VOLID   131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1                          , usage:  576 MB /  11 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
All volumes are healthy

SHOWING HEALTH OF 3/3 NODES RUNNING IN THE CLUSTER
|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
|-- eqx04-flash05 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
|-- eqx01-flash15 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY

SHOWING HEALTH OF 26/26 DEVICES IN THE CLUSTER
|-- /dev/sdi@eqx01-flash16 | 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
|
|-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8)
|
|-- /dev/sde@eqx01-flash15 | 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
|
|-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90)
|
|-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2)
|
|-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33)
|
|-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7)
|
|-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT)
|
|-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
|
|-- /dev/sdb@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a07510ec79d1f PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F)
|
|-- /dev/sdh@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW)
|
|-- /dev/sdf@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS)
|
|-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3)
|
|-- /dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS)
|
|-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2)
|
|-- /dev/sdc@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a07510ee9a052 PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052)
|

SHOWING 1 FILE COLLECTIONS IN THE CLUSTER
|-- file-collection-1631971248912 Online     0 errors  0 warnings

SHOWING 1 BUNDLES IN THE CLUSTER
|-- wordpress ONLINE     0 errors  0 warnings

To see more details rerun with -V|--verbose option
sherlock produced results in 200 milliseconds (Sat Sep 18 11:51:21 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    1 bundles, 3 users and 3 tenants were analyzed
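
The device lines in the report above follow a fixed layout, which makes it possible to extract per-device figures with a regular expression, for instance to list the devices that currently hold volumes. This is a sketch based solely on the lines shown in this example; the helper parse_device_line is hypothetical, and the group names (such as vol_limit and slice_total) reflect one reading of the "1/100 vols" and "11/119194 slices" columns rather than documented field names.

```python
import re

# Pattern derived from the device lines in the example report above.
DEVICE_LINE = re.compile(
    r"^\|-- (?P<path>\S+)@(?P<node>\S+) \| "
    r"(?P<vols>\d+)/(?P<vol_limit>\d+) vols \| "
    r"(?P<slices>\d+)/(?P<slice_total>\d+) slices \| "
    r"(?P<free>[\d.]+ [KMGT]B) free of (?P<capacity>[\d.]+ [KMGT]B), "
    r"NODE: (?P<node_state>\w+), RDVM: (?P<rdvm>\w+), DEV: (?P<dev_state>\w+)$"
)


def parse_device_line(line):
    """Return a dict for one device summary line, or None if it does not match."""
    match = DEVICE_LINE.match(line.rstrip())
    if match is None:
        return None
    device = match.groupdict()
    for key in ("vols", "vol_limit", "slices", "slice_total"):
        device[key] = int(device[key])
    return device


# Lines copied from the example report above.
report_lines = [
    "|-- /dev/sdi@eqx01-flash16 | 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY",
    "|   (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)",
    "|-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY",
    "|-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY",
]
devices = [d for d in map(parse_device_line, report_lines) if d]
in_use = [f"{d['path']}@{d['node']}" for d in devices if d["vols"] > 0]
print(in_use)
```

Lines that do not describe a device, such as the WWN continuation lines or the volume and node entries, simply return None and are skipped.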

Example 2 (Verbose report for all objects)

# sherlock --healthy --verbose

No matching apps found
No matching pods found

SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER
|-- VOLID     1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b, usage:  448 MB /  20 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|   |-- DEVID 1: /dev/sdb on eqx01-flash15 using 448 MB/894.3 GB capacity, 14/20 slices, 14 segs, segspernap=1 RDVM: UP, DEV: READY
|   |             (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
|   |
|   |-- SNAPSHOTS: 1     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 1969/12/31 16:00:00    14   14      0 READY        448 MB
|   |   |
|
|-- VOLID   132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839                          , usage:  352 MB /   5 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|   |-- DEVID 2: /dev/sde on eqx01-flash15 using 352 MB/1.8 TB capacity, 11/5 slices, 11 segs, segspernap=3 RDVM: UP, DEV: READY
|   |             (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
|   |
|   |-- SNAPSHOTS: 1     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 1969/12/31 16:00:00    11   11      0 READY        352 MB
|   |   |
|
|-- VOLID   131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1                          , usage:  576 MB /  11 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|   |-- DEVID 11: /dev/sdi on eqx01-flash16 using 576 MB/1.8 TB capacity, 18/11 slices, 18 segs, segspernap=2 RDVM: UP, DEV: READY
|   |             (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
|   |
|   |-- SNAPSHOTS: 1     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 1969/12/31 16:00:00    18   18      0 READY        576 MB
|   |   |
|
All volumes are healthy
|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

9 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID   11: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 11/119194 slices, 18 segs
|   |-- VOLID  131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1                             576 MB nslices=11  nsnaps=1  nsegs=18   nsegs_per_snap=2
|
|-- DEVID   12: /dev/sde READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    9: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    8: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   10: /dev/sdb READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   14: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   13: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    0: /dev/sda INIT 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/0 slices, 0 segs

|-- eqx04-flash05 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

8 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID   16: /dev/sdb READY 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/57194 slices, 0 segs
|-- DEVID    0: /dev/sda INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/sdd INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID   15: /dev/sdc READY 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/57194 slices, 0 segs
|-- DEVID    0: /dev/sde INIT 59.6 GB free=59.6 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/dm-1 INIT 17.4 GB free=17.4 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/dm-2 INIT 35.7 GB free=35.7 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/dm-0 INIT 6.0 GB free=6.0 GB (100%) 0/100 vols, 0/0 slices, 0 segs

|-- eqx01-flash15 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

9 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID    0: /dev/sda INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    2: /dev/sde READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 5/119194 slices, 11 segs
|   |-- VOLID  132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839                             352 MB nslices=5   nsnaps=1  nsegs=11   nsegs_per_snap=3
|
|-- DEVID    3: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    7: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    5: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    6: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    4: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    1: /dev/sdb READY 894.3 GB free=893.8 GB (100%) 1/100 vols, 20/57194 slices, 14 segs
|   |-- VOLID    1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b   448 MB nslices=20  nsnaps=1  nsegs=14   nsegs_per_snap=1
|


DEVICE /dev/sdi on eqx01-flash16 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0fe71f | PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1
|-- VOL: 131 pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1    576 MB nslices=11  nsegs=18   (2  ) nsnaps=1

DEVICE /dev/sde on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0feea2 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8

DEVICE /dev/sde on eqx01-flash15 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0db2c7 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0
|-- VOL: 132 pvc-94229d46-e381-4e3c-99a1-ddfe389d7839    352 MB nslices=5   nsegs=11   (3  ) nsnaps=1

DEVICE /dev/sdf on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0db9be | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90

DEVICE /dev/sdi on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dbae3 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2

DEVICE /dev/sdg on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0ddd62 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33

DEVICE /dev/sdh on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0df3ba | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7

DEVICE /dev/sdd on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c101de8 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT

DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a075109604998 | PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998
|-- VOL: 1 file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b    448 MB nslices=20  nsegs=14   (1  ) nsnaps=1

DEVICE /dev/sdb on eqx04-flash05 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a07510ec79d1f | PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F

DEVICE /dev/sdh on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0d9e30 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW

DEVICE /dev/sdf on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dc21f | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS

DEVICE /dev/sdb on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dd039 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3

DEVICE /dev/sdg on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dee42 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS

DEVICE /dev/sdd on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0df26c | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2

DEVICE /dev/sdc on eqx04-flash05 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a07510ee9a052 | PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052

SHOWING 1 FILE COLLECTIONS IN THE CLUSTER

SHOWING 1 BUNDLES IN THE CLUSTER

sherlock produced results in 141 milliseconds (Sat Sep 18 11:45:50 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    1 bundles, 3 users and 3 tenants were analyzed

21.9.5.2. Check Application Health

In order to view a report containing the status of an application use the --app parameter along with an application name as shown in the example below. The following details about an application will be displayed:

  • Volumes or devices on which the application data is stored

  • Pods associated with the application

  • Node(s) on which the application Pods reside

  • Failed jobs related to the specified application

Example

# sherlock --app mysql-test -V

APPNAME: mysql-test STATE: ONLINE Robin Systems 1/1 vnodes healthy
==============================================================================================================================================================================================================================================
APP HAS 1 VNODES:
VNODEID 2: mysql-test.mysql.01 on centos-60-181 INST: ONLINE/INST: STARTED, NODE: ONLINE, RIO: UP
|-- VOLID 4: mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 1 GB
| |-- DEVID : /dev/sdd segs=centos-60-181 slices=14 rawspace=1 448 MB
|-- VOLID 5: mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 10 GB
| |-- DEVID : /dev/sdd segs=centos-60-181 slices=10 rawspace=10 320 MB
|
APP IS RUNNING ON THE FOLLOWING 1 NODES:
|-- centos-60-181 RIO: UP
| |-- mysql-test.mysql.01 ONLINE/STARTED
|
APP IS STORING DATA ON THE FOLLOWING 1 DEVICES:
|-- DEVID 6: /dev/sdd on centos-60-181 2 vols
| |-- VOLID 4: mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 448 MB nslices=1 nsegs=14 nsnaps=3 segspersnap=5
| |-- VOLID 5: mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 320 MB nslices=10 nsegs=10 nsnaps=3 segspersnap=1
|

THERE ARE 23 FAILED JOBS TO INSPECT BETWEEN Fri May 3 01:22:23 AM 2019 - Fri May 10 01:22:23 AM 2019
|-- mysql-test.mysql.01
| |-- VnodeDelete jobid=98 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019
| | predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to execute
| |-- VnodeDelete jobid=88 state=10 error=1 start=Thu May 9 00:20:37 2019 end=Thu May 9 00:20:43 2019
| | postdestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to execute
|
|-- mysql-test
| |-- ApplicationDelete jobid=97 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019
| | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
| |-- ApplicationDelete jobid=87 state=10 error=1 start=Thu May 9 00:20:37 2019 end=Thu May 9 00:20:43 2019
| | Job failed. One or more child jobs reported errors. Error: 'postdestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
| |-- ApplicationDelete jobid=92 state=10 error=1 start=Thu May 9 00:22:17 2019 end=Thu May 9 00:22:18 2019
| | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
| |-- ApplicationDelete jobid=95 state=10 error=1 start=Thu May 9 00:22:59 2019 end=Thu May 9 00:23:00 2019
| | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
|
|-- mysql-test1
| |-- ApplicationCreate jobid=129 state=10 error=1 start=Thu May 9 03:50:54 2019 end=Thu May 9 03:50:54 2019
| | Invalid Zone Id and/or Bundle Id: 1/2
| |-- ApplicationCreate jobid=128 state=10 error=1 start=Thu May 9 03:50:08 2019 end=Thu May 9 03:50:08 2019
| | Invalid Zone Id and/or Bundle Id: 1/2
|
sherlock produced results in 90 milliseconds (Fri May 10 01:22:23 AM 2019).
|-- 3 nodes, 12 disks, 3 vols, 7 snapshots, 1 apps, 1 vnodes, 2 users and 1 tenants were analyzed
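
When a report lists many failed jobs, it can help to collect the job IDs programmatically before looking up the corresponding job logs. The following is a minimal sketch, not part of Robin, that assumes the report text has been saved to a string and that job lines follow the layout shown in the example above. The function names `parse_failed_jobs` and `count_by_type` are hypothetical, and the numeric `state` and `error` values are kept as-is because their meaning is not described in this section.

```python
import re
from collections import Counter

# Job lines in the failed-jobs portion of the report look like:
#   | |-- VnodeDelete jobid=98 state=10 error=1 start=<time> end=<time>
JOB_LINE = re.compile(
    r"\|--\s+(?P<type>\w+)\s+jobid=(?P<jobid>\d+)\s+state=(?P<state>\d+)"
    r"\s+error=(?P<error>\d+)\s+start=(?P<start>.+?)\s+end=(?P<end>.+)$"
)


def parse_failed_jobs(report):
    """Return one dict per job line found in the report text."""
    jobs = []
    for line in report.splitlines():
        match = JOB_LINE.search(line.rstrip())
        if match:
            job = match.groupdict()
            for key in ("jobid", "state", "error"):
                job[key] = int(job[key])
            jobs.append(job)
    return jobs


def count_by_type(jobs):
    """Count failed jobs per job type, for example ApplicationDelete."""
    return Counter(job["type"] for job in jobs)


# Lines copied from the example report above (error messages omitted).
SAMPLE = """\
|-- mysql-test.mysql.01
| |-- VnodeDelete jobid=98 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019
|-- mysql-test1
| |-- ApplicationCreate jobid=129 state=10 error=1 start=Thu May 9 03:50:54 2019 end=Thu May 9 03:50:54 2019
| |-- ApplicationCreate jobid=128 state=10 error=1 start=Thu May 9 03:50:08 2019 end=Thu May 9 03:50:08 2019
"""

if __name__ == "__main__":
    jobs = parse_failed_jobs(SAMPLE)
    print(sorted(job["jobid"] for job in jobs))
    print(dict(count_by_type(jobs)))
```

The object names (such as mysql-test1) and the indented error messages do not match the job-line pattern, so they are skipped. The collected job IDs can then be used to locate the matching job logs.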

21.9.5.3. Check Node Health

In order to view a report containing the status of a node, use the --node parameter along with the primary hostname of the node as shown in the example below. Information on the objects associated with the node, such as applications, pods, volumes, devices, file collections, and bundles, will be displayed alongside details of the node’s health with regard to the services running on it.

Example

# sherlock --node eqx01-flash16 -V

|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

9 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID    0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    9: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 11/119194 slices, 18 segs
|   |-- VOLID  162: pvc-eb63979d-720e-41c9-808f-145306dc1259                             576 MB nslices=11  nsnaps=1  nsegs=18   nsegs_per_snap=2
|
|-- DEVID   13: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 5/119194 slices, 11 segs
|   |-- VOLID  163: pvc-66646581-0210-46e2-b945-9ea880be38d7                             352 MB nslices=5   nsnaps=1  nsegs=11   nsegs_per_snap=3
|
|-- DEVID   12: /dev/sdb READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   14: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    8: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    0: /dev/sda INIT 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID   10: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   11: /dev/sde READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs


THERE ARE 1 FAILED JOBS TO INSPECT BETWEEN Sun Sep 12 05:58:15 PM 2021 - Sun Sep 19 05:58:15 PM 2021
    |-- eqx01-flash16.robinsystems.com
    |   |-- HostAddResourcePool jobid=30 state=10 error=1 start=Sun Sep 19 03:23:54 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Host 'eqx01-flash16.robinsystems.com' already has a resource pool 'default'
    |

sherlock produced results in 166 milliseconds (Sun Sep 19 05:58:15 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    0 bundles, 5 users and 5 tenants were analyzed
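
The first line of the node report packs the node state, the error and warning counts, and the per-service status into a single row. The sketch below shows one way to split that row into fields for scripted checks. It assumes the layout seen in the example above; the function names are hypothetical, and treating only UP and READY as healthy values is an assumption drawn from that single example.

```python
import re

# Header line of a node report, for example:
#   |-- eqx01-flash16 ONLINE  0 errors,  0 warnings NODE: UP, AGENT: UP, ...
NODE_LINE = re.compile(
    r"^\|--\s+(?P<host>\S+)\s+(?P<state>\S+)\s+(?P<errors>\d+) errors,"
    r"\s+(?P<warnings>\d+) warnings\s+(?P<services>.+)$"
)

# Assumption: these are the only "good" values, taken from the example output.
HEALTHY_VALUES = {"UP", "READY"}


def parse_node_status(line):
    """Split the node header line into a dict, or return None if it does not match."""
    match = NODE_LINE.match(line.strip())
    if match is None:
        return None
    services = {}
    for item in match.group("services").split(","):
        # "K8S Service(s): READY" -> name "K8S Service(s)", value "READY"
        name, _, value = item.rpartition(":")
        services[name.strip()] = value.strip()
    return {
        "host": match.group("host"),
        "state": match.group("state"),
        "errors": int(match.group("errors")),
        "warnings": int(match.group("warnings")),
        "services": services,
    }


def services_needing_attention(status):
    """List the services whose value is not in HEALTHY_VALUES."""
    return [name for name, value in status["services"].items()
            if value not in HEALTHY_VALUES]


# Header line copied from the example above.
SAMPLE_LINE = (
    "|-- eqx01-flash16 ONLINE     0 errors,  0 warnings "
    "NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY"
)

if __name__ == "__main__":
    status = parse_node_status(SAMPLE_LINE)
    print(status["host"], status["state"], services_needing_attention(status))
```

Device, volume, and failed-job lines in the same report do not carry the "errors, warnings" fields, so `parse_node_status` returns None for them.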

21.9.5.4. Check Pod Health

In order to view a report containing the status of a Pod, use the --pod parameter along with the Pod name as shown in the example below. Information on the objects associated with the Pod, such as the application it is a part of, the volumes attached to it, and the respective devices, will also be displayed.

Example

# sherlock --pod centos1.server.01 -V

SHOWING HEALTH OF 1 PODS IN THE CLUSTER:
o-- POD/VNODE ID  187: centos1.server.01 STARTED/ONLINE   1 CPU, 200 MB MEM NODE: UP, RIO: UP
|   |-- VOLID 238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313 64 MB/1 GB nsnaps=1
|   |   |-- DEVID 15: /dev/sdb on eqx04-flash05 nsegs=2   nslices=1   64 MB
|   |-- VOLID 237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851    0/1 GB nsnaps=1
|   |   |-- DEVID 16: /dev/sdc on eqx04-flash05 nsegs=0   nslices=1   0
|

sherlock produced results in 169 milliseconds (Sun Sep 19 10:42:37 PM 2021).
|-- 3 nodes, 26 disks, 9 vols, 9 snapshots, 3 apps, 3 pods, 1 file-collections,
    1 bundles, 15 users and 16 tenants were analyzed
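
Every report in this section, including the Pod report above, ends with a summary of how many objects were analyzed. Comparing these counts between runs is a quick way to confirm that the tool saw the objects you expected. The following is a minimal sketch that extracts the counts from saved report text; the function name `parse_summary` is hypothetical and the parsing assumes the summary wording shown in the examples.

```python
def parse_summary(report):
    """Return the object counts from the closing summary of a report."""
    lines = report.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith("sherlock produced results"):
            # The counts follow on the next one or two lines.
            tail = " ".join(part.strip() for part in lines[index + 1:index + 3])
            break
    else:
        return {}
    if " were analyzed" not in tail:
        return {}
    head = tail.split(" were analyzed")[0].lstrip("|- ")
    counts = {}
    # "15 users and 16 tenants" -> "15 users, 16 tenants"
    for item in head.replace(" and ", ", ").split(","):
        number, _, name = item.strip().partition(" ")
        if number.isdigit():
            counts[name] = int(number)
    return counts


# Summary copied from the example above.
SAMPLE = """\
sherlock produced results in 169 milliseconds (Sun Sep 19 10:42:37 PM 2021).
|-- 3 nodes, 26 disks, 9 vols, 9 snapshots, 3 apps, 3 pods, 1 file-collections,
    1 bundles, 15 users and 16 tenants were analyzed
"""

if __name__ == "__main__":
    for name, count in parse_summary(SAMPLE).items():
        print(f"{name}: {count}")
```

The helper handles both the one-line and the wrapped two-line form of the summary, since both appear in the examples in this section.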

21.9.5.5. Check Volume Health

In order to view a report containing the status of a volume, use the --vol parameter along with the name of the volume as shown in the example below. Information on the objects associated with the volume, such as the device on which it is mounted, potential IO stalls on mounts, NFS exports, NFS server Pods, and its existing snapshots, will also be displayed alongside details of the volume’s usage.

Note

It is recommended that this report is reviewed often, especially for volumes that are used frequently.

Example

# sherlock --vol pvc-c4f1bc87-c9f5-433a-bb60-f9ee46e7a9e1 -V

|-- VOLID    50: pvc-c4f1bc87-c9f5-433a-bb60-f9ee46e7a9e1, used by pvc(s): personal/nfs-exclusive-repl-2, usage:  512 MB /   4 GB,  2 snapshots, resync progress: SYNCED, using 2 devices
|   |-- Potential IO Stalls:
|   |   |-- 37 pending IOs on hypervvm-62-45.robinsystems.com:/dev/sde
|   |
|   |-- NFS_EXPORTS:
|   |   |-- NFS_EXPORTID 7: EXPORTS: READY
|   |             (CLIENTS:["hypervvm-62-46.robinsystems.com","hypervvm-62-45.robinsystems.com"])
|   |
|   |-- NFS_SERVER_POD:
|   |   |-- PODID 10: robin-nfs-excl-v50-10 , HOSTNAME: hypervvm-62-47, STATUS: ONLINE
|   |
|   |-- DEVID 1: /dev/sdb on hypervvm-62-45 using 256 MB/100 GB capacity, 8/76 slices, 8 segs, segspernap=1, RDVM: UP, DEV: READY
|   |             (WWN: 0x600224801148c13acf11110ea26830ff PATH: /dev/disk/by-id/scsi-3600224801148c13acf11110ea26830ff)
|   |
|   |-- SNAPSHOTS: 2     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 2022/08/11 08:54:51    12   12      0 READY        384 MB
|   |   |-- SNAPID    2: 1969/12/31 16:00:00     4    4      0 READY        128 MB
|   |   |
|
|   |-- DEVID 4: /dev/sdc on hypervvm-62-47 using 256 MB/100 GB capacity, 8/34 slices, 8 segs, segspernap=1, RDVM: UP, DEV: READY
|   |             (WWN: 0x60022480d00e683e8687aae16482dcd0 PATH: /dev/disk/by-id/scsi-360022480d00e683e8687aae16482dcd0)
|   |
|   |-- SNAPSHOTS: 2     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 2022/08/11 08:54:51    12   12      0 READY        384 MB
|   |   |-- SNAPID    2: 1969/12/31 16:00:00     4    4      0 READY        128 MB
|   |   |

 All volumes are healthy
 sherlock produced results in 243 milliseconds (Tue Sep 27 11:03:57 PM 2022).
|-- 1 nodes, 6 disks, 5 vols, 5 snapshots, 2 apps, 0 protection groups, 2 pods, 1 file-collections,
    2 bundles, 3 users and 2 tenants were analyzed
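
The "Potential IO Stalls" entries are easy to miss in a long report, and in the example above they appear even though the report ends with "All volumes are healthy". The sketch below pulls those entries out of saved report text so they can be reviewed or alerted on. It assumes the "N pending IOs on host:device" wording from the example; the function names and the threshold of 10 are hypothetical.

```python
import re

# Stall entries look like:
#   |-- 37 pending IOs on hypervvm-62-45.robinsystems.com:/dev/sde
STALL_LINE = re.compile(
    r"(?P<pending>\d+) pending IOs on (?P<host>[^\s:]+):(?P<device>\S+)"
)


def find_io_stalls(report):
    """Return a list of dicts, one per 'pending IOs' entry in the report."""
    return [
        {
            "host": match.group("host"),
            "device": match.group("device"),
            "pending": int(match.group("pending")),
        }
        for match in STALL_LINE.finditer(report)
    ]


def stalls_over(stalls, threshold=10):
    """Keep entries with more pending IOs than the (hypothetical) threshold."""
    return [stall for stall in stalls if stall["pending"] > threshold]


# Lines copied from the example above.
SAMPLE = """\
|   |-- Potential IO Stalls:
|   |   |-- 37 pending IOs on hypervvm-62-45.robinsystems.com:/dev/sde
|   |
|   |-- NFS_EXPORTS:
|   |   |-- NFS_EXPORTID 7: EXPORTS: READY
"""

if __name__ == "__main__":
    for stall in stalls_over(find_io_stalls(SAMPLE)):
        print(f'{stall["host"]} {stall["device"]}: {stall["pending"]} pending IOs')
```

A suitable threshold depends on the workload, so the value used here is only a placeholder.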

Note

In certain cases, a volume might be marked as needing attention when its state is DEGRADED. However, this simply indicates that a replica of the volume is offline. It does not indicate that the volume is unhealthy or faulted, as it is still capable of serving I/Os.

21.9.5.6. Check Device Health

In order to view a report containing the status of a device, use the --dev parameter along with the WWN of the device as shown in the example below. Information on the objects associated with the device, such as the volumes allocated from it, will also be displayed alongside details of the device’s usage.

Note

In addition to accepting the WWN, the --dev parameter can be utilized with the following values: ‘all’ displays information on all devices, ‘full’ displays details on devices that are nearly full, a list of nodenames shows the devices on the given nodes, and a combination of the nodename and devpath in the format <nodename>:<devpath> uniquely identifies a single device.
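
To make the accepted forms of the --dev value easier to tell apart, the sketch below classifies a value before it is passed to the command. This is an illustration only and not how the tool itself interprets its arguments. The function name is hypothetical, and the rule that a WWN starts with 0x is an assumption based on the WWNs shown in the examples in this section.

```python
def classify_dev_value(value):
    """Guess which form of the --dev value has been supplied."""
    if value == "all":
        return "all devices"
    if value == "full":
        return "devices that are nearly full"
    if ":" in value:
        # <nodename>:<devpath>, for example telxvm-53-159:/dev/sdc
        return "single device by nodename and devpath"
    if value.lower().startswith("0x"):
        # Assumption: WWNs are written with a 0x prefix, as in the examples.
        return "single device by WWN"
    return "devices on the named node(s)"


if __name__ == "__main__":
    for value in ("all", "full", "0x500a075109604998",
                  "telxvm-53-159:/dev/sdc", "eqx01-flash15"):
        print(f"{value} -> {classify_dev_value(value)}")
```

The separator used between multiple nodenames is not stated in this section, so the sketch does not attempt to split or validate a list.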

Example

# sherlock --dev 0x500a075109604998 -V

DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a075109604998 | PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998
|-- VOL: 1 file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44    448 MB nslices=20  nsegs=14   (1  ) nsnaps=1

sherlock produced results in 130 milliseconds (Sun Sep 19 03:34:51 AM 2021).
|-- 2 nodes, 18 disks, 1 vols, 1 snapshots, 0 apps, 0 pods, 1 file-collections,
    0 bundles, 1 users and 1 tenants were analyzed

21.9.5.7. Check Devices Nearing Maximum Capacity

In order to view a report containing device(s) nearing their maximum capacity, use the --dev parameter along with the keyword ‘full’ as shown in the example below. Information on the space usage statistics for the concerned device(s) will be displayed alongside the allocations utilizing the space on the device. If any device(s) are nearing their maximum capacity, Robin recommends adding more devices to the respective nodes in order to boost the performance of the cluster as well as to ensure it can host more applications.

Example

# sherlock --dev full -V

DEVICE /dev/sdc on telxvm-53-159 9/10 vols | 31/6390 slices | 2 KB free of 100 GB NODE: ONLINE, RDVM: UP, DEV: READY
=======================================================================================================================================================================
  |==> WWN: 0x600224802a495d29715780d6f9be9eb5 | PATH: /dev/disk/by-id/scsi-3600224802a495d29715780d6f9be9eb5
  |-- VOL: 8 jm1.R1.01.data.1.12725312-2294-4eea-8eac-704938facd69                 576 MB nslices=10  nsegs=18   (1  ) nsnaps=2
  |-- VOL: 9 jm1.R2.01.data.1.e505bc9c-3bbc-4ec7-81c4-63e845aef949                 576 MB nslices=10  nsegs=18   (1  ) nsnaps=2
  |-- VOL: 1 file-collection-1637286348291.a8492d9c-591d-404f-926d-a4d647adcffc    160 MB nslices=5   nsegs=5    (1  ) nsnaps=1
  |-- VOL: 4 test.server.01.data.1.11e20b4b-4d40-419c-959d-fb0583321c11             64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 10 pvc-aa6a4996-3212-4b41-8c2a-44375be6834c                               64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 11 pvc-c2d086b0-fee8-4704-ab87-f937a09fb40e                               64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 24 clone1.R2.01.data.1.ab94abef-96ba-4073-816d-f2e4e0614a42               32 MB nslices=1   nsegs=1    (1  ) nsnaps=1
  |-- VOL: 25 clone1.R1.01.data.1.bca7107a-3dd9-4d45-af93-69819e88504a               32 MB nslices=1   nsegs=1    (1  ) nsnaps=1
  |-- VOL: 5 test.server.01.block.1.ec6ac7c1-19c1-45af-a578-0c7c4c1dca0c                0 nslices=1   nsegs=0    (0  ) nsnaps=1

DEVICE /dev/sdb on telxvm-53-159 9/10 vols | 9/6390 slices | 2 KB free of 100 GB NODE: ONLINE, RDVM: UP, DEV: READY
=======================================================================================================================================================================
  |==> WWN: 0x60022480a2e824923e91646995e0da4b | PATH: /dev/disk/by-id/scsi-360022480a2e824923e91646995e0da4b
  |-- VOL: 19 test2.server.01.data.1.f04ef40a-6218-477c-8f02-e5cf4b2899a6     64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 17 test2.server.02.data.1.6011eb22-bf3e-4e16-ba9c-7de8d2fb97dc     64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 18 test2.server.03.data.1.05139336-6657-49cb-8663-11da2a4a5d0f     64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 21 pvc-cf88884a-80c2-4544-80d3-82334331e529                        64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 22 pvc-b8a55274-0e96-48a3-94d9-d1fe2aa846e6                        64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 23 pvc-ba4e627b-1795-46c2-bca9-65dfdfbbab7e                        64 MB nslices=1   nsegs=2    (2  ) nsnaps=1
  |-- VOL: 20 test2.server.03.block.1.4dfa240b-126b-448d-843e-9262ee8cdca9        0 nslices=1   nsegs=0    (0  ) nsnaps=1
  |-- VOL: 16 test2.server.01.block.1.0e3a6e89-bb00-413b-a25d-5a416d523bae        0 nslices=1   nsegs=0    (0  ) nsnaps=1
  |-- VOL: 15 test2.server.02.block.1.000c723b-11c5-4fc3-8eec-032141da223f        0 nslices=1   nsegs=0    (0  ) nsnaps=1

sherlock produced results in 129 milliseconds (Mon Nov 29 05:11:03 AM 2021).
|-- 1 nodes, 6 disks, 18 vols, 20 snapshots, 4 apps, 8 pods, 1 file-collections,
    2 bundles, 2 users and 2 tenants were analyzed
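
The DEVICE header lines in this report carry both the volume slot usage and the free space, which is enough to rank devices by how close they are to capacity. The following is a minimal sketch that parses those headers from saved report text. It assumes the header layout shown above and that sizes use binary multiples (1 KB = 1024 bytes), which this section does not confirm; the function name is hypothetical.

```python
import re

# Assumption: sizes use binary multiples (1 KB = 1024 bytes).
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Header lines look like:
#   DEVICE /dev/sdc on telxvm-53-159 9/10 vols | 31/6390 slices | 2 KB free of 100 GB ...
DEVICE_LINE = re.compile(
    r"^DEVICE\s+(?P<path>\S+) on (?P<node>\S+)\s+"
    r"(?P<vols>\d+)/(?P<max_vols>\d+) vols \| "
    r"(?P<slices>\d+)/(?P<max_slices>\d+) slices \| "
    r"(?P<free>[\d.]+) (?P<free_unit>[KMGT]?B) free of "
    r"(?P<total>[\d.]+) (?P<total_unit>[KMGT]?B)"
)


def parse_device_headers(report):
    """Return one dict per DEVICE header line, with the free-space fraction."""
    devices = []
    for line in report.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match is None:
            continue
        free = float(match.group("free")) * UNIT_BYTES[match.group("free_unit")]
        total = float(match.group("total")) * UNIT_BYTES[match.group("total_unit")]
        devices.append({
            "node": match.group("node"),
            "path": match.group("path"),
            "vols": int(match.group("vols")),
            "max_vols": int(match.group("max_vols")),
            "free_fraction": free / total,
        })
    return devices


# Header lines copied from the examples in this section.
SAMPLE = """\
DEVICE /dev/sdc on telxvm-53-159 9/10 vols | 31/6390 slices | 2 KB free of 100 GB NODE: ONLINE, RDVM: UP, DEV: READY
DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
"""

if __name__ == "__main__":
    for dev in sorted(parse_device_headers(SAMPLE), key=lambda d: d["free_fraction"]):
        print(f'{dev["node"]}:{dev["path"]} free={dev["free_fraction"]:.2%} '
              f'vols={dev["vols"]}/{dev["max_vols"]}')
```

Sorting by the free fraction puts the fullest devices first, which indicates the nodes where adding devices would help most.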

21.9.5.8. Find Devices With Rebalance Need

In order to view a report containing device(s) that might need to be rebalanced, use the --devs-needing-rebalance parameter as shown in the example below.

Example

# sherlock --devs-needing-rebalance

SHOWING APPLICATIONS THAT NEED ATTENTION:
All apps are healthy

SHOWING PODS THAT NEED ATTENTION:
All pods are healthy

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy

SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available

SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available

Moving 4 vols, 20 slices and 256 segments:

eqx04-flash05        /dev/sdb  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/57194   segs=      0/57194   vols= 0/100 [ 1.26 ]
eqx04-flash05        /dev/sdc  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/57194   segs=      0/57194   vols= 0/100 [ 1.26 ]
eqx04-flash05        /dev/sda  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05        /dev/sdd  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05        /dev/sde   59.6 GB/59.6 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05       /dev/dm-1   17.4 GB/17.4 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05       /dev/dm-2   35.7 GB/35.7 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05       /dev/dm-0    6.0 GB/6.0 GB    (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash16        /dev/sdb    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdg    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdd    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdi    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sde    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdf    1.8 TB/1.8 TB    (free=100.0 %) slices=      5/119194  segs=     11/119194  vols= 1/100 [ 1.25 ]
eqx01-flash16        /dev/sdh    1.8 TB/1.8 TB    (free=100.0 %) slices=     11/119194  segs=     18/119194  vols= 1/100 [ 1.25 ]
eqx01-flash16        /dev/sdc   14.9 GB/14.9 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx01-flash16        /dev/sda    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash15        /dev/sde    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdf    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdi    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdg    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdh    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdd    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdb  893.8 GB/894.3 GB  (free=100.0 %) slices=     20/57194   segs=     14/57194   vols= 1/100 [ 1.24 ]
eqx01-flash15        /dev/sda  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx01-flash15        /dev/sdc   14.9 GB/14.9 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]

Only unhealthy objects are shown. To see everything re-run with -H|--healthy option
To see more details rerun with -V|--verbose option
sherlock produced results in 131 milliseconds (Sun Sep 19 05:19:44 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    1 bundles, 2 users and 2 tenants were analyzed
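
The per-device table in the output above can be parsed into records for sorting or filtering, for example to list only the devices that currently hold allocations. The sketch below assumes the row layout shown in the example. Reading the first size as free space is an inference from the matching device report for eqx01-flash15 /dev/sdb ("893.8 GB free of 894.3 GB"). The meaning of the bracketed value at the end of each row is not described in this section, so it is stored unchanged under the hypothetical key `bracket_value`.

```python
import re

# Table rows look like:
#   eqx01-flash16  /dev/sdh  1.8 TB/1.8 TB  (free=100.0 %) slices= 11/119194  segs= 18/119194  vols= 1/100 [ 1.25 ]
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<path>/dev/\S+)\s+"
    r"(?P<free>[\d.]+ [KMGT]?B)/(?P<total>[\d.]+ [KMGT]?B)\s+"
    r"\(free=\s*(?P<free_pct>[\d.]+) %\)\s+"
    r"slices=\s*(?P<slices>\d+)/(?P<max_slices>\d+)\s+"
    r"segs=\s*(?P<segs>\d+)/(?P<max_segs>\d+)\s+"
    r"vols=\s*(?P<vols>\d+)/(?P<max_vols>\d+)\s+"
    r"\[\s*(?P<bracket_value>-?[\d.]+)\s*\]"
)

INT_FIELDS = ("slices", "max_slices", "segs", "max_segs", "vols", "max_vols")


def parse_rebalance_rows(report):
    """Return one dict per device row; separator and text lines are skipped."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        row = match.groupdict()
        for key in INT_FIELDS:
            row[key] = int(row[key])
        row["free_pct"] = float(row["free_pct"])
        row["bracket_value"] = float(row["bracket_value"])
        rows.append(row)
    return rows


def devices_with_allocations(rows):
    """Keep only the devices that currently hold at least one slice."""
    return [row for row in rows if row["slices"] > 0]


# Rows copied from the example above.
SAMPLE = """\
eqx04-flash05        /dev/sdb  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/57194   segs=      0/57194   vols= 0/100 [ 1.26 ]
eqx04-flash05        /dev/sda  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash16        /dev/sdh    1.8 TB/1.8 TB    (free=100.0 %) slices=     11/119194  segs=     18/119194  vols= 1/100 [ 1.25 ]
"""

if __name__ == "__main__":
    for row in devices_with_allocations(parse_rebalance_rows(SAMPLE)):
        print(row["node"], row["path"], row["slices"], row["bracket_value"])
```

In the example output, every row showing 0/0 slices also shows -1.00 in brackets; the sketch records that value but does not interpret it.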