20. Troubleshooting

Robin Platform provides a number of native tools and commands that administrators can use to troubleshoot their Robin cluster and report issues. These tools vary in their use cases, but each gives insight into why the cluster is not functioning as intended or why an unexpected failure occurred. They should therefore be the go-to utilities when debugging potential issues, and their output should accompany any bug report filed with Robin. Each tool is described in its own section below.

In addition to the tools for administrators, Robin Platform provides more granular commands that let individual users track the progress of the operations they run and determine why an operation failed. These operations are referred to as jobs, and each job is identified by a unique ID.

Robin jobs are the operations executed during a cluster’s lifespan. The log for each job records information such as the job ID, job type, and description. Robin stores the job logs in its database.

As an administrator, you can view the job logs and use them to troubleshoot your cluster. Robin recommends that you include the complete job logs when reporting issues to Robin.

The Robin job logs are stored in the following directories:

  • /var/log/robin/server is present only in the Robin master nodes.

  • /var/log/robin/agent is present in all Robin nodes.

You can also access the job logs from the host in the following directories:

  • /home/robinds/var/log/robin/server is present only in the Robin master nodes.

  • /home/robinds/var/log/robin/agent is present in all Robin nodes.
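
Before collecting logs for a bug report, it can help to confirm which of the directories listed above exist on the node you are logged in to. The following is a minimal sketch using only standard shell tests. The function name report_log_dirs is hypothetical and is not a Robin command.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Robin CLI): for each directory
# passed as an argument, report whether it exists on this node.
report_log_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "present: $dir"
        else
            echo "missing: $dir"
        fi
    done
}

# Example call with the directories documented above:
#   report_log_dirs /var/log/robin/server /var/log/robin/agent \
#       /home/robinds/var/log/robin/server /home/robinds/var/log/robin/agent
```

On a node that is not a Robin master, the server directories would be expected to show as missing.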

20.1. Listing all jobs

Robin stores all jobs that have occurred during a cluster’s lifespan. To view these jobs along with details such as their start time and state, run the following command:

# robin job list --verbose
                 --ignoredeps
                 --noarchived
                 --nopurged
                 --states  <states>
                 --failed
                 --nocolor
                 --page_size <size>
                 --page_num <num>
                 --total
                 --all
                 --app <app_name>
                 --k8sapp <k8sapp_name>
                 --vnode <vnode_name>
                 --node <node_name>
                 --disk <disk_wwn>

--verbose

Show complete job information instead of truncating it for display purposes.

--ignoredeps

Do not show child jobs

--noarchived

Do not show archived jobs

--nopurged

Do not show purged jobs

--states <states>

Filter jobs based on states. Choose one or more from: active, failed, succeeded, archived, purged

--failed

Show only jobs which have failed

--nocolor

Show uncolored output

--page_size <size>

Number of jobs that should be displayed for each page

--page_num <num>

Page number to start displaying jobs from (starting index 1)

--total

Return the total number of qualified root jobs

--all

Display all jobs associated with a specific application. Note that this option must be used in conjunction with the --app option.

--app <app_name>

Filter jobs based on specified application

--k8sapp <k8sapp_name>

Filter jobs based on specified K8s/Helm registered application name

--vnode <vnode_name>

Filter jobs based on specified Vnode name

--node <node_name>

Filter jobs based on specified physical node name

--disk <disk_wwn>

Filter jobs based on specified disk WWN
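
The options above can be combined in scripts. The sketch below shows two small helpers, both with hypothetical names that are not part of Robin. The first assumes that --failed, --nocolor, --app, --page_size, and --page_num can be passed together in one call, which this section does not confirm. The second reads saved job list output and prints the IDs of top-level jobs whose State column contains FAILED. It assumes the columns are separated by " | " as in the default table layout and that no description contains that separator.

```shell
#!/bin/sh
# Hypothetical wrapper: first page of failed jobs for one application.
# Assumption: these documented options may be combined in a single call.
failed_jobs_for_app() {
    app_name=$1
    page_size=${2:-20}   # default page size chosen for illustration only
    robin job list --failed --nocolor --app "$app_name" \
        --page_size "$page_size" --page_num 1
}

# Hypothetical filter: read job list output on stdin and print the ID of
# each root job (a row starting with a digit) whose State column contains
# FAILED. Child rows start with "|" and are skipped. The field separator
# requires spaces around "|" so that "COMPLETED|FAILED" stays one field.
failed_root_job_ids() {
    awk -F' +[|] +' '/^[0-9]/ && $4 ~ /FAILED/ { print $1 }'
}

# Example:
#   robin job list --nocolor | failed_root_job_ids
```

Using --nocolor before piping keeps color escape sequences out of the text that awk has to match.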

Example:

# robin job list
ID            | Type              | Description                                                                                                                                | State            | Start           | End     | User   | Message
--------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+---------+--------+------------------------------------------
1013          | ApplicationStart  | Starting application 'wp-10'                                                                                                               | COMPLETED        | 13 Aug 23:28:29 | 0:00:54 | system |
|->1015       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 13 Aug 23:28:30 | 0:00:38 | system |
|  |->1017    | VnodeDeploy       | Deploying vnode 'wp-10.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                            | COMPLETED        | 13 Aug 23:28:30 | 0:00:38 | system |
|  |  |->1018 | VnodeStop         | Stopping vnode wp-10.mysql.01 on cscale-82-140.robinsystems.com                                                                            | COMPLETED        | 13 Aug 23:28:30 | 0:00:15 | system |
|->1016       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |->1024    | VnodeDeploy       | Deploying vnode 'wp-10.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |  |->1025 | VnodeStop         | Stopping vnode wp-10.wordpress.01 on cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:07 | system |
1014          | ApplicationStart  | ApplicationStart                                                                                                                           | COMPLETED|FAILED | 13 Aug 23:28:29 | 0:00:00 | system | Another job is running on application 'w
1019          | ApplicationStart  | Starting application 'wp-20'                                                                                                               | COMPLETED        | 13 Aug 23:28:31 | 0:00:51 | system |
|->1020       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 13 Aug 23:28:32 | 0:00:36 | system |
|  |->1022    | VnodeDeploy       | Deploying vnode 'wp-20.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                            | COMPLETED        | 13 Aug 23:28:32 | 0:00:36 | system |
|  |  |->1023 | VnodeStop         | Stopping vnode wp-20.mysql.01 on cscale-82-140.robinsystems.com                                                                            | COMPLETED        | 13 Aug 23:28:32 | 0:00:13 | system |
|->1021       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |->1026    | VnodeDeploy       | Deploying vnode 'wp-20.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:14 | system |
|  |  |->1027 | VnodeStop         | Stopping vnode wp-20.wordpress.01 on cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 13 Aug 23:29:08 | 0:00:05 | system |
1028          | JobArchive        | Archiving job/s on all hosts                                                                                                               | COMPLETED        | 14 Aug 00:00:00 | 0:00:02 | system |
|->1029       | AgentJobArchive   | Archiving job/s on host cscale-82-140.robinsystems.com                                                                                     | COMPLETED        | 14 Aug 00:00:01 | 0:00:00 | system |
1030          | HostProbe         | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch.                                       | COMPLETED        | 14 Aug 07:54:37 | 0:00:01 | system |
1031          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch.                       | COMPLETED        | 14 Aug 07:54:37 | 0:00:51 | system |
1032          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch.                       | COMPLETED        | 14 Aug 08:11:11 | 0:00:50 | system |
1033          | HostProbe         | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch.                                       | COMPLETED        | 14 Aug 08:11:11 | 0:00:01 | system |
1034          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp.                                | COMPLETED        | 14 Aug 09:24:17 | 0:00:50 | system |
1035          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED|FAILED | 14 Aug 09:25:07 | 0:01:40 | system | Pods do not need to be failed over as Ku
1036          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Ready. Origin: StateChange.                                     | COMPLETED        | 14 Aug 09:25:17 | 0:00:01 | system |
1037          | ApplicationDelete | Deleting application 'wp-10'                                                                                                               | COMPLETED        | 14 Aug 09:41:10 | 0:00:12 | robin  |
|->1038       | VnodeDelete       | Deleting vnode 'wp-10.wordpress.01' from cscale-82-140.robinsystems.com                                                                    | COMPLETED        | 14 Aug 09:41:10 | 0:00:06 | robin  |
|->1039       | VnodeDelete       | Deleting vnode 'wp-10.mysql.01' from cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 14 Aug 09:41:10 | 0:00:08 | robin  |
1040          | ApplicationDelete | Deleting application 'wp-20'                                                                                                               | COMPLETED        | 14 Aug 09:41:16 | 0:00:13 | robin  |
|->1041       | VnodeDelete       | Deleting vnode 'wp-20.wordpress.01' from cscale-82-140.robinsystems.com                                                                    | COMPLETED        | 14 Aug 09:41:16 | 0:00:10 | robin  |
|->1042       | VnodeDelete       | Deleting vnode 'wp-20.mysql.01' from cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 14 Aug 09:41:16 | 0:00:09 | robin  |
1043          | ApplicationDelete | Deleting application 'wp-30'                                                                                                               | COMPLETED        | 14 Aug 09:41:20 | 0:00:19 | robin  |
|->1044       | VnodeDelete       | Deleting vnode 'wp-30.wordpress.01' from cscale-82-140.robinsystems.com                                                                    | COMPLETED        | 14 Aug 09:41:20 | 0:00:06 | robin  |
|->1045       | VnodeDelete       | Deleting vnode 'wp-30.mysql.01' from cscale-82-140.robinsystems.com                                                                        | COMPLETED        | 14 Aug 09:41:20 | 0:00:15 | robin  |
1046          | ApplicationCreate | Adding application 'wp-1'                                                                                                                  | COMPLETED        | 14 Aug 09:42:58 | 0:00:58 | robin  |
|->1047       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:43:00 | 0:00:42 | robin  |
|  |->1049    | VnodeAdd          | Adding vnode 'wp-1.mysql.01' on cscale-82-140.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:43:00 | 0:00:42 | robin  |
|->1048       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:43:42 | 0:00:14 | robin  |
|  |->1053    | VnodeAdd          | Adding vnode 'wp-1.wordpress.01' on cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:43:42 | 0:00:14 | robin  |
1050          | ApplicationCreate | Adding application 'wp-2'                                                                                                                  | COMPLETED        | 14 Aug 09:43:39 | 0:00:46 | robin  |
|->1051       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:43:42 | 0:00:34 | robin  |
|  |->1054    | VnodeAdd          | Adding vnode 'wp-2.mysql.01' on cscale-82-140.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:43:42 | 0:00:34 | robin  |
|->1052       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:44:16 | 0:00:09 | robin  |
|  |->1055    | VnodeAdd          | Adding vnode 'wp-2.wordpress.01' on cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:44:16 | 0:00:09 | robin  |
1056          | ApplicationCreate | Adding application 'wp-3'                                                                                                                  | COMPLETED        | 14 Aug 09:44:18 | 0:00:57 | robin  |
|->1057       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:44:20 | 0:00:41 | robin  |
|  |->1059    | VnodeAdd          | Adding vnode 'wp-3.mysql.01' on cscale-82-140.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:44:20 | 0:00:41 | robin  |
|->1058       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:45:01 | 0:00:13 | robin  |
|  |->1067    | VnodeAdd          | Adding vnode 'wp-3.wordpress.01' on cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:45:02 | 0:00:12 | robin  |
1060          | ApplicationDelete | Deleting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 09:44:53 | 0:00:17 | robin  |
|->1061       | VnodeDelete       | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 09:44:53 | 0:00:05 | robin  |
|->1062       | VnodeDelete       | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:44:53 | 0:00:13 | robin  |
1063          | ApplicationDelete | Deleting application 'wp-2'                                                                                                                | COMPLETED        | 14 Aug 09:44:57 | 0:00:21 | robin  |
|->1064       | VnodeDelete       | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 09:44:57 | 0:00:09 | robin  |
|->1065       | VnodeDelete       | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:44:57 | 0:00:18 | robin  |
1066          | ApplicationDelete | ApplicationDelete                                                                                                                          | COMPLETED|FAILED | 14 Aug 09:45:01 | 0:00:00 | robin  | Another job is running on application 'w
1068          | ApplicationProbe  | Probing application 'wp-3'                                                                                                                 | COMPLETED        | 14 Aug 09:45:12 | 0:00:00 | robin  |
1069          | ApplicationDelete | Deleting application 'wp-3'                                                                                                                | COMPLETED        | 14 Aug 09:45:16 | 0:00:12 | robin  |
|->1070       | VnodeDelete       | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 09:45:16 | 0:00:05 | robin  |
|->1071       | VnodeDelete       | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:45:16 | 0:00:09 | robin  |
1072          | ApplicationCreate | Adding application 'wp-1'                                                                                                                  | COMPLETED        | 14 Aug 09:47:03 | 0:00:45 | robin  |
|->1074       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:47:39 | 0:00:08 | robin  |
|  |->1076    | VnodeAdd          | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:47:39 | 0:00:08 | robin  |
|->1073       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:47:05 | 0:00:34 | robin  |
|  |->1075    | VnodeAdd          | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:47:05 | 0:00:34 | robin  |
1077          | ApplicationCreate | Adding application 'wp-2'                                                                                                                  | COMPLETED        | 14 Aug 09:47:43 | 0:00:44 | robin  |
|->1079       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:48:18 | 0:00:09 | robin  |
|  |->1081    | VnodeAdd          | Adding vnode 'wp-2.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:48:18 | 0:00:09 | robin  |
|->1078       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:47:45 | 0:00:33 | robin  |
|  |->1080    | VnodeAdd          | Adding vnode 'wp-2.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:47:45 | 0:00:33 | robin  |
1082          | ApplicationCreate | Adding application 'wp-3'                                                                                                                  | COMPLETED        | 14 Aug 09:49:14 | 0:03:12 | robin  |
|->1083       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 09:49:16 | 0:02:49 | robin  |
|  |->1085    | VnodeAdd          | Adding vnode 'wp-3.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 09:49:16 | 0:02:49 | robin  |
|->1084       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 09:52:05 | 0:00:20 | robin  |
|  |->1086    | VnodeAdd          | Adding vnode 'wp-3.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 09:52:05 | 0:00:20 | robin  |
1087          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown.                              | COMPLETED        | 14 Aug 09:53:43 | 0:00:52 | system |
1088          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED        | 14 Aug 09:54:35 | 0:00:01 | system |
1089          | ApplicationStart  | Starting application 'wp-3'                                                                                                                | COMPLETED        | 14 Aug 09:54:38 | 0:03:41 | system |
|->1092       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 09:54:38 | 0:01:53 | system |
|  |->1094    | VnodeDeploy       | Deploying vnode 'wp-3.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 09:54:38 | 0:01:53 | system |
|->1093       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 09:56:31 | 0:01:48 | system |
|  |->1102    | VnodeDeploy       | Deploying vnode 'wp-3.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 09:56:31 | 0:01:48 | system |
1090          | ApplicationStart  | Starting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 09:54:38 | 0:03:44 | system |
|->1098       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 09:54:39 | 0:01:51 | system |
|  |->1100    | VnodeDeploy       | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 09:54:39 | 0:01:51 | system |
|->1099       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 09:56:30 | 0:01:52 | system |
|  |->1101    | VnodeDeploy       | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 09:56:30 | 0:01:52 | system |
1091          | ApplicationStart  | Starting application 'wp-2'                                                                                                                | COMPLETED        | 14 Aug 09:54:38 | 0:03:44 | system |
|->1095       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 09:54:39 | 0:01:52 | system |
|  |->1097    | VnodeDeploy       | Deploying vnode 'wp-2.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 09:54:39 | 0:01:52 | system |
|->1096       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 09:56:31 | 0:01:51 | system |
|  |->1103    | VnodeDeploy       | Deploying vnode 'wp-2.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 09:56:32 | 0:01:50 | system |
1104          | ApplicationDelete | Deleting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 10:18:34 | 0:00:15 | robin  |
|->1105       | VnodeDelete       | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 10:18:34 | 0:00:06 | robin  |
|->1106       | VnodeDelete       | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:18:34 | 0:00:11 | robin  |
1107          | ApplicationDelete | Deleting application 'wp-2'                                                                                                                | COMPLETED        | 14 Aug 10:18:38 | 0:00:14 | robin  |
|->1108       | VnodeDelete       | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 10:18:38 | 0:00:06 | robin  |
|->1109       | VnodeDelete       | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:18:38 | 0:00:08 | robin  |
1110          | ApplicationDelete | Deleting application 'wp-3'                                                                                                                | COMPLETED        | 14 Aug 10:18:43 | 0:00:15 | robin  |
|->1111       | VnodeDelete       | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com                                                                     | COMPLETED        | 14 Aug 10:18:43 | 0:00:12 | robin  |
|->1112       | VnodeDelete       | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:18:43 | 0:00:13 | robin  |
1113          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp.                                | COMPLETED        | 14 Aug 10:20:02 | 0:00:50 | system |
1114          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED|FAILED | 14 Aug 10:20:52 | 0:01:40 | system | Pods do not need to be failed over as Ku
1115          | HostProbe         | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'} | COMPLETED        | 14 Aug 10:22:17 | 0:00:00 | system |
1116          | HostProbe         | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'}      | COMPLETED        | 14 Aug 10:22:47 | 0:00:00 | system |
1117          | HostProbe         | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Ready. Origin: StateChange.                                          | COMPLETED        | 14 Aug 10:22:59 | 0:00:00 | system |
1118          | ApplicationCreate | Adding application 'wp-1'                                                                                                                  | COMPLETED        | 14 Aug 10:40:21 | 0:01:05 | robin  |
|->1119       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 10:40:24 | 0:00:41 | robin  |
|  |->1121    | VnodeAdd          | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com                                                                             | COMPLETED        | 14 Aug 10:40:24 | 0:00:41 | robin  |
|->1120       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 10:41:05 | 0:00:21 | robin  |
|  |->1122    | VnodeAdd          | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com                                                                         | COMPLETED        | 14 Aug 10:41:05 | 0:00:21 | robin  |
1123          | ApplicationCreate | Adding application 'wp-2-no-aff'                                                                                                           | COMPLETED        | 14 Aug 10:45:45 | 0:00:57 | robin  |
|->1124       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 10:45:48 | 0:00:41 | robin  |
|  |->1126    | VnodeAdd          | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com                                                                      | COMPLETED        | 14 Aug 10:45:48 | 0:00:41 | robin  |
|->1125       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 10:46:29 | 0:00:13 | robin  |
|  |->1127    | VnodeAdd          | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com                                                                  | COMPLETED        | 14 Aug 10:46:29 | 0:00:13 | robin  |
1128          | ApplicationCreate | Adding application 'wp-3-no-aff'                                                                                                           | COMPLETED        | 14 Aug 10:46:33 | 0:00:39 | robin  |
|->1129       | RoleCreate        | Provisioning containers for role 'mysql'                                                                                                   | COMPLETED        | 14 Aug 10:46:35 | 0:00:28 | robin  |
|  |->1131    | VnodeAdd          | Adding vnode 'wp-3-no-aff.mysql.01' on cscale-82-139.robinsystems.com                                                                      | COMPLETED        | 14 Aug 10:46:35 | 0:00:28 | robin  |
|->1130       | RoleCreate        | Provisioning containers for role 'wordpress'                                                                                               | COMPLETED        | 14 Aug 10:47:03 | 0:00:09 | robin  |
|  |->1132    | VnodeAdd          | Adding vnode 'wp-3-no-aff.wordpress.01' on cscale-82-139.robinsystems.com                                                                  | COMPLETED        | 14 Aug 10:47:03 | 0:00:09 | robin  |
1133          | HostProbe         | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown.                              | COMPLETED        | 14 Aug 10:49:36 | 0:00:52 | system |
1134          | HostFailoverPods  | Failing over pods on host cscale-82-139.robinsystems.com                                                                                   | COMPLETED        | 14 Aug 10:50:28 | 0:00:01 | system |
1135          | ApplicationStart  | Starting application 'wp-1'                                                                                                                | COMPLETED        | 14 Aug 10:50:29 | 0:03:22 | system |
|->1141       | RoleStart         | Starting instances for role 'wordpress'                                                                                                    | COMPLETED        | 14 Aug 10:52:16 | 0:01:35 | system |
|  |->1143    | VnodeDeploy       | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                         | COMPLETED        | 14 Aug 10:52:16 | 0:01:35 | system |
|->1140       | RoleStart         | Starting instances for role 'mysql'                                                                                                        | COMPLETED        | 14 Aug 10:50:30 | 0:01:46 | system |
|  |->1142    | VnodeDeploy       | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com)                                                             | COMPLETED        | 14 Aug 10:50:30 | 0:01:46 | system |
1136          | VnodeDeploy       | Deploying vnode 'wp-3-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                 | COMPLETED        | 14 Aug 10:50:29 | 0:01:48 | robin  |
1137          | VnodeDeploy       | Deploying vnode 'wp-3-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                     | COMPLETED        | 14 Aug 10:50:29 | 0:02:04 | robin  |
1138          | VnodeDeploy       | Deploying vnode 'wp-2-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                     | COMPLETED        | 14 Aug 10:50:29 | 0:02:07 | robin  |
1139          | VnodeDeploy       | Deploying vnode 'wp-2-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com)                                                 | COMPLETED        | 14 Aug 10:50:29 | 0:01:44 | robin  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Returns all jobs that have occurred during a cluster’s lifespan.

End Point: /api/v5/robin_server/jobs

Method: GET

URL Parameters:

  • sort=[id|-id] : Sorts the returned list of jobs by their ID.

  • noarchived=true : Excludes archived jobs from the result.

  • nopurged=true : Excludes purged jobs from the result.

  • failed=true : Returns only failed jobs.

  • parent=true : Returns only parent jobs.

  • page_size=<size> : Limits the number of jobs returned to <size>.

  • page_num=<index> : Returns jobs starting from page <index>.

  • objtype=[APPLICATION|K8S_APPLICATION|INSTANCE|DISK|NODE] : Returns only jobs for the specified object type.

  • objname=<obj_name> : Returns only jobs for objects with the specified name.

  • all=true : Returns all jobs. This option is valid only when an application name is specified.

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)

Example Response:

Output
{
   "page_size":10,
   "items":{
      "users":[
         {
            "email":null,
            "tenantid":1,
            "firstname":"Robin",
            "username":"robin",
            "id":3,
            "lastname":"Systems"
         }
      ],
      "jobs":[
         {
            "jobid":1888,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[1889]",
            "endtime":1597456503,
            "children":[
               {
                  "jobid":1889,
                  "tenant_id":1,
                  "enabled":true,
                  "child_job_ids":"[]",
                  "endtime":1597456498,
                  "parent_jobid":1888,
                  "error":0,
                  "message":"",
                  "taskrunner":1,
                  "starttime":1597456497,
                  "dependson_job_ids":"[]",
                  "level":"child",
                  "user_id":1,
                  "jtype":"CollectionOffline",
                  "timeout":86400,
                  "state":10,
                  "desc":"Taking collection 'file-collection-1597122699552' offline (Force False)"
               }
            ],
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":1,
            "starttime":1597456496,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"CollectionOnline",
            "timeout":86400,
            "state":10,
            "desc":"Bringing collection 'file-collection-1597122699552' online"
         },
         {
            "jobid":1887,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[1890]",
            "endtime":1597456504,
            "children":[
               {
                  "jobid":1890,
                  "tenant_id":1,
                  "enabled":true,
                  "child_job_ids":"[]",
                  "endtime":1597456499,
                  "parent_jobid":1887,
                  "error":0,
                  "message":"",
                  "taskrunner":1,
                  "starttime":1597456497,
                  "dependson_job_ids":"[]",
                  "level":"child",
                  "user_id":3,
                  "jtype":"VnodeStop",
                  "timeout":86400,
                  "state":10,
                  "desc":"Stopping vnode test-ds-1.server.01 on cscale-82-140.robinsystems.com"
               }
            ],
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":1,
            "starttime":1597456496,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":3,
            "jtype":"VnodeDeploy",
            "timeout":86400,
            "state":10,
            "desc":"Deploying vnode 'test-ds-1.server.01'. Origin: Event (cscale-82-140.robinsystems.com)"
         },
         {
            "jobid":1886,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456488,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456487,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Ready. Origin: StateChange."
         },
         {
            "jobid":1885,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456476,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456475,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Notready. Origin: StateChange.. Services Down: {'iomgr-server'}"
         },
         {
            "jobid":1884,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456470,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456470,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/WaitingForMonitor ==> ONLINE\/Notready. Origin: StartingHostWatch.. Services Down: {'iomgr-server'}"
         },
         {
            "jobid":1883,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456520,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456469,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"HostProbe",
            "timeout":86400,
            "state":10,
            "desc":"Probed cscale-82-139.robinsystems.com from UNREACHABLE\/Notready ==> UNREACHABLE\/Notready. Origin: StartingHostWatch."
         },
         {
            "jobid":1882,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x60022480940ed076551cfaf75612e24e'"
         },
         {
            "jobid":1881,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x60022480ffcf3deb224fb37d78fe7767'"
         },
         {
            "jobid":1880,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x600224804c48fd7e16c608dea0919064'"
         },
         {
            "jobid":1879,
            "tenant_id":1,
            "enabled":true,
            "child_job_ids":"[]",
            "endtime":1597456467,
            "parent_jobid":0,
            "error":0,
            "message":"",
            "taskrunner":0,
            "starttime":1597456467,
            "dependson_job_ids":"[]",
            "level":"parent",
            "user_id":1,
            "jtype":"DiskNotify",
            "timeout":86400,
            "state":10,
            "desc":"Event on disk '0x600224803bcdafde95b1f5cd27ceb5fb'"
         }
      ]
   },
   "total":1542,
   "num_items":10,
   "page_num":1
}
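
The example response nests child jobs under each parent's "children" key and reports "starttime" and "endtime" as epoch seconds. The following Python sketch, which assumes the response shape shown above, builds a request URL from the documented parameters and flattens the hierarchy into rows with a computed duration. The https scheme, the host name and the helper names jobs_list_url and flatten_jobs are assumptions made for illustration and are not part of the Robin API.

```python
from urllib.parse import urlencode


def jobs_list_url(host, port=29442, scheme="https", **params):
    """Build the job-list URL. The scheme and host are assumptions."""
    base = f"{scheme}://{host}:{port}/api/v5/robin_server/jobs"
    query = urlencode(sorted(params.items()))
    return f"{base}?{query}" if query else base


def flatten_jobs(response):
    """Yield (depth, jobid, jtype, duration_seconds) for every job."""
    def walk(job, depth):
        yield depth, job["jobid"], job["jtype"], job["endtime"] - job["starttime"]
        # Jobs without children simply omit the "children" key.
        for child in job.get("children", []):
            yield from walk(child, depth + 1)

    for job in response["items"]["jobs"]:
        yield from walk(job, 0)


# Trimmed copy of the example response (only the fields used here).
sample = {
    "items": {
        "jobs": [
            {
                "jobid": 1888, "jtype": "CollectionOnline",
                "starttime": 1597456496, "endtime": 1597456503,
                "children": [
                    {"jobid": 1889, "jtype": "CollectionOffline",
                     "starttime": 1597456497, "endtime": 1597456498},
                ],
            },
            {"jobid": 1886, "jtype": "HostProbe",
             "starttime": 1597456487, "endtime": 1597456488},
        ]
    }
}

rows = list(flatten_jobs(sample))
url = jobs_list_url("robin-master.example.com", failed="true", page_size=10)

for depth, jobid, jtype, seconds in rows:
    print(f"{'  ' * depth}{jobid} {jtype} {seconds}s")
print(url)
```

Sending the request is left out on purpose, because the TLS and authentication setup depends on the cluster.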

20.2. Show information about a specific job

To get detailed information about a specific job, including its state, duration, any errors, and its child jobs, issue the following command:

# robin job info <id>

id

Job ID

Example:

# robin job info 1123
ID         | Type              | Desc                                                                      | State     | Start           | End      | Duration | Dependson | Error | Message
-----------+-------------------+---------------------------------------------------------------------------+-----------+-----------------+----------+----------+-----------+-------+---------
1123       | ApplicationCreate | Adding application 'wp-2-no-aff'                                          | COMPLETED | 14 Aug 10:45:45 | 10:46:42 | 0:00:57  | []        | 0     |
|->1124    | RoleCreate        | Provisioning containers for role 'mysql'                                  | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41  | []        | 0     |
|  |->1126 | VnodeAdd          | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com     | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41  | []        | 0     |
|->1125    | RoleCreate        | Provisioning containers for role 'wordpress'                              | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13  | [1124]    | 0     |
|  |->1127 | VnodeAdd          | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13  | []        | 0     |

Returns details about a specific job and any of its respective child jobs.

End Point: /api/v3/robin_server/jobs/<job_id>

Method: GET

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Authorization Error)

Example Response:

Output
{
   "tenant_name":"Administrators",
   "jobid":1888,
   "tenant_id":1,
   "enabled":true,
   "json":{
      "collection_id":1597122699552,
      "state":"SuspectedOffline",
      "set_failed":true,
      "origin":2,
      "hostname":"cscale-82-140.robinsystems.com"
   },
   "user_name":"system",
   "endtime":1597456503,
   "parent_jobid":0,
   "error":0,
   "message":"",
   "taskrunner":1,
   "starttime":1597456496,
   "child_job_ids":"[1889]",
   "cjobs":[
      {
         "tenant_name":"Administrators",
         "jobid":1889,
         "tenant_id":1,
         "enabled":true,
         "json":{
            "collection_id":1597122699552
         },
         "user_name":"system",
         "endtime":1597456498,
         "parent_jobid":1888,
         "error":0,
         "message":"",
         "taskrunner":1,
         "starttime":1597456497,
         "child_job_ids":"[]",
         "cjobs":[

         ],
         "dependson_job_ids":"[]",
         "user_id":1,
         "jtype":"CollectionOffline",
         "timeout":86400,
         "state":10,
         "desc":"Taking collection 'file-collection-1597122699552' offline (Force False)",
         "priority":300
      }
   ],
   "dependson_job_ids":"[]",
   "user_id":1,
   "jtype":"CollectionOnline",
   "timeout":86400,
   "state":10,
   "desc":"Bringing collection 'file-collection-1597122699552' online",
   "priority":300
}
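
As a rough illustration of the request described above, the following Python sketch prepares (but does not send) a GET request for one job and maps the documented response codes to their meanings. The https scheme, the host name and the helper name job_info_request are assumptions; only the endpoint path, the default port, the Authorization header and the response codes come from this section.

```python
import urllib.request

# Response codes documented for this endpoint.
RESPONSE_CODES = {
    200: "Success",
    401: "Authorization Error",
    404: "Not Found Error",
    500: "Internal Server Error",
}


def job_info_request(host, job_id, auth_token, port=29442, scheme="https"):
    """Prepare the job-info request. The scheme and host are assumptions."""
    url = f"{scheme}://{host}:{port}/api/v3/robin_server/jobs/{job_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": auth_token},
        method="GET",
    )


def describe_status(code):
    """Return the documented meaning of a response code, if there is one."""
    return RESPONSE_CODES.get(code, "Undocumented response code")


# "<auth_token>" stands in for a token acquired from the login API.
req = job_info_request("robin-master.example.com", 1888, "<auth_token>")
print(req.get_method(), req.full_url)
print(describe_status(404))
```

The prepared request can be passed to urllib.request.urlopen once the host, token and certificate handling match the target cluster.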

20.3. Archive Robin Job Logs

You can archive the Robin job logs to prevent data loss, improve security, and free up space in the parent directory. Archiving moves all completed job logs to the archived sub-directory of the parent directory for long-term retention. The archived sub-directories are located under the /var/log/robin/server and /var/log/robin/agent directories.

Robin archives the logs of jobs that completed successfully more than 24 hours ago. The failed job logs remain in the parent directories for analysis purposes.

To archive the Robin job logs, run the following command:

# robin job archive --age
                    --include-failed

--age

Age of the job logs (in mins) to archive.

--include-failed

Include the failed job logs.

Example:

# robin job archive --age 600 --wait
Job: 255170 Name: JobArchive           State: PROCESSED       Error: 0
Job: 255170 Name: JobArchive           State: PREPARED        Error: 0
Job: 255170 Name: JobArchive           State: WAITING         Error: 0
Job: 255170 Name: JobArchive           State: COMPLETED       Error: 0

20.3.1. Configure job archive attributes

You can override the default values of the job archive attributes and set your own values for the cluster.

The robin job archive task runs automatically as a CronJob.

20.3.1.1. View job archive attributes

You can view the job archive attributes available in your cluster to know the current value of these attributes.

To view the job archive attributes, run the following command:

# robin config list | grep arch

Example:

# robin config list | grep arch
server               | job_archive_age                  | 86400
server               | job_archive_cron                 | 0 0 * * *

20.3.1.2. Job archive attributes and their values

The following are the job archive attributes and their default values in a cluster.

Attribute        | Default value | Description
-----------------+---------------+---------------------------------------------------------------------------------------------------------------------
job_archive_age  | 86400         | The age (in seconds) of the completed job logs to be archived.
job_archive_cron | 0 0 * * *     | The time, in Cron schedule syntax, at which the job archive task executes automatically to archive the Robin job logs.
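
To make the default job_archive_cron value easier to read, the following Python sketch splits a cron expression into named fields. It assumes the standard five-field cron layout (minute, hour, day of month, month, day of week), which matches the shape of the default value. The helper name cron_fields is illustrative only.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def cron_fields(expr):
    """Split a five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


# Default job_archive_cron value: minute 0 of hour 0, on every day of
# every month, that is, once a day at midnight.
fields = cron_fields("0 0 * * *")
for name, value in fields.items():
    print(f"{name:>12}: {value}")
```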

20.3.1.3. Update job archive attributes

You can update the job archive attributes and set your own values for the cluster.

To update the job archive attributes, run the following command:

# robin config update server <attribute> <value>

Example:

# robin config update server job_archive_age 81000
The 'server' attribute 'job_archive_age' has been updated

20.4. Purge Robin Job logs

You can purge old Robin job logs that you no longer want to keep. To purge them, either run the robin job purge command or let the scheduled job_purge_cron task do it.

The job purge task deletes the following job logs from the database and the nodes’ directories:

  • Successful jobs older than two weeks.

  • Failed jobs older than four weeks.

  • Robin maintenance jobs older than one week.

The job purge task takes the following actions for each job ID and its child job IDs:

  • For the job logs present in the Robin master nodes:

    • Deletes the server job log directory /var/log/robin/server/<job-id>.

    • Deletes the server job log archive file /var/log/robin/server/archived/<job-id>.tar.gz.

  • For the job logs present in the Robin master and agent nodes:

    • Deletes the agent job log directory /var/log/robin/agent/<job-id>.

    • Deletes the agent job log archive file /var/log/robin/agent/archived/<job-id>.tar.gz.
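
The path rules above can be sketched in a few lines of Python. The function only computes the paths that the purge task is described as deleting and does not touch the filesystem. The helper name purge_targets and the is_master flag are assumptions for illustration.

```python
from pathlib import PurePosixPath

LOG_ROOT = PurePosixPath("/var/log/robin")


def purge_targets(job_id, is_master):
    """Return the log paths removed for one job ID on one node."""
    # Server logs exist only on master nodes; agent logs exist on all nodes.
    components = ("server", "agent") if is_master else ("agent",)
    targets = []
    for component in components:
        base = LOG_ROOT / component
        targets.append(str(base / str(job_id)))
        targets.append(str(base / "archived" / f"{job_id}.tar.gz"))
    return targets


for path in purge_targets(309, is_master=True):
    print(path)
```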

To purge the Robin job logs, run the following command:

# robin job purge --age
                  --failed-job-age
                  --maintenance-job-age
                  --maintenance-job-types
                  --before-id

--age

Purge successful jobs with a completion time earlier than the specified date and time (in ‘%Y-%m-%dT%H:%M:%S’ format). Default: 2 weeks before the current time.

--failed-job-age

Purge failed jobs with a completion time earlier than the specified date and time (in ‘%Y-%m-%dT%H:%M:%S’ format). Default: 4 weeks before the current time.

--maintenance-job-age

Purge maintenance jobs with a completion time earlier than the specified date and time (in ‘%Y-%m-%dT%H:%M:%S’ format). Default: 1 week before the current time.

--maintenance-job-types

Purge maintenance jobs of the specified job types, given as a comma-separated list. Default: JobArchive,JobPurge.

--before-id

Purge logs for jobs with an ID lower than this value if no age is specified.

Example:

# robin job purge --age 2021-04-06T18:14:00 --failed-job-age 2021-04-06T18:14:00 --maintenance-job-age 2021-04-06T18:14:00 --wait
Job:  309 Name: JobPurge             State: VALIDATED       Error: 0
Job:  309 Name: JobPurge             State: COMPLETED       Error: 0
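
Because the age options take an absolute timestamp rather than a duration, it can help to compute the cutoffs programmatically. The following Python sketch formats cutoffs in the documented ‘%Y-%m-%dT%H:%M:%S’ format and assembles the command line using the documented default ages. The helper names are illustrative, and the fixed "now" value is chosen only so that the result is reproducible.

```python
from datetime import datetime, timedelta

PURGE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def purge_cutoff(now, weeks):
    """Format the timestamp that lies the given number of weeks before now."""
    return (now - timedelta(weeks=weeks)).strftime(PURGE_TIME_FORMAT)


def purge_command(now):
    """Assemble a purge command line that mirrors the documented defaults."""
    return [
        "robin", "job", "purge",
        "--age", purge_cutoff(now, 2),
        "--failed-job-age", purge_cutoff(now, 4),
        "--maintenance-job-age", purge_cutoff(now, 1),
    ]


# Fixed reference time for a reproducible result; use datetime.now() in practice.
now = datetime(2021, 4, 20, 18, 14, 0)
command = purge_command(now)
print(" ".join(command))
```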

20.4.1. Configure job purge attributes

You can override the default values of the job purge attributes and set your own values for the cluster.

20.4.1.1. View job purge attributes

You can view the job purge attributes available in your cluster to know the current value of these attributes.

To view the job purge attributes, run the following command:

# robin config list | grep job_purge

Example:

# robin config list | grep job_purge
server   | job_purge_age                   | 1209600
server   | job_purge_cron                  | 30 0 * * *
server   | job_purge_failed_age            | 2419200
server   | job_purge_maintenance_age       | 604800
server   | job_purge_maintenance_jtypes    | JobArchive,JobPurge
server   | job_purge_max_count             | 100000

20.4.1.2. Job purge attributes and their values

The following are the job purge attributes and their default values in a cluster.

Attribute                    | Default value       | Description
-----------------------------+---------------------+-------------------------------------------------------------------------------------------------------------------
job_purge_age                | 1209600             | The age (in seconds) of the completed job logs to be purged.
job_purge_cron               | 30 0 * * *          | The time, in Cron schedule syntax, at which the job purge task executes automatically to purge the Robin job logs.
job_purge_failed_age         | 2419200             | The age (in seconds) of the failed job logs to be purged.
job_purge_maintenance_age    | 604800              | The age (in seconds) of the maintenance job logs to be purged.
job_purge_maintenance_jtypes | JobArchive,JobPurge | The types of maintenance jobs to be purged.
job_purge_max_count          | 100000              | The maximum number of job logs that can be purged at a time.
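
The age attributes are expressed in seconds, which is easy to misread. The following Python sketch converts the default values to days, confirming that they correspond to two weeks, four weeks and one week, and converts a number of weeks back to seconds for use with robin config update. The helper names are illustrative only.

```python
from datetime import timedelta

# Default values of the age attributes, in seconds.
PURGE_DEFAULTS = {
    "job_purge_age": 1209600,
    "job_purge_failed_age": 2419200,
    "job_purge_maintenance_age": 604800,
}


def seconds_to_days(seconds):
    """Convert an age in seconds to whole days."""
    return timedelta(seconds=seconds).days


def weeks_to_seconds(weeks):
    """Convert a number of weeks to the seconds value an attribute expects."""
    return int(timedelta(weeks=weeks).total_seconds())


days = {name: seconds_to_days(value) for name, value in PURGE_DEFAULTS.items()}
for name, value in days.items():
    print(f"{name}: {value} days")
print("3 weeks =", weeks_to_seconds(3), "seconds")
```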

20.4.1.3. Update job purge attributes

You can update the job purge attributes and set your own values for the cluster.

To update the job purge attributes, run the following command:

# robin config update server <attribute> <value>

Example:

# robin config update server job_purge_age 13396198
The 'server' attribute 'job_purge_age' has been updated

Note

Robin recommends running the job_purge_cron task daily.

20.5. Clean stale Robin job log directory

You can clean the stale Robin job log directory and the archived job log directory. Robin treats its database as the authoritative source for job logs.

Reconcile the job log directory and the archived job log directory with the database at least once a month, so that logs are not retained on disk after you manually delete the corresponding jobs from the database. The cleanup deletes any job logs in these directories that no longer have an entry in Robin’s database.

To clean the stale job logs, run the following command:

# robin job cleanup

Example:

# robin job cleanup --wait
Job: 358447 Name: JobCleanupStaleLogs  State: WAITING         Error: 0
Job: 358447 Name: JobCleanupStaleLogs  State: COMPLETED       Error: 0

20.5.1. Configure job cleanup attribute

You can change the time at which the job cleanup task executes automatically from its default value for your cluster.

20.5.1.1. View job cleanup attribute

You can view the job cleanup attribute available in your cluster to know the current value of the attribute.

To view the job cleanup attribute, run the following command:

# robin config list | grep job_cleanup

Example:

# robin config list | grep job_cleanup
server        | job_cleanup_cron                     | 0 1 1 * *

20.5.1.2. Job cleanup attribute and its value

The following is the job cleanup attribute and its default value in a cluster.

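Attribute        | Default value | Description
-----------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
job_cleanup_cron | 0 1 1 * *     | The time, in Cron schedule syntax, at which the job cleanup task executes automatically to clean the Robin job logs from the stale Robin job log directory and the archived job log directory.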

20.5.1.3. Update job cleanup attribute

You can update the job cleanup attribute and set your own value for the cluster.

To update the job cleanup attribute, run the following command:

# robin config update server job_cleanup_cron <value>

Example:

# robin config update server job_cleanup_cron "0 1 2 * *"
The 'server' attribute 'job_cleanup_cron' has been updated

20.6. Log Collection

During a cluster-wide failure or any unexpected scenario that affects multiple services, Robin needs logs from all system components in order to debug the issue properly. Depending on the scope of the issue, sometimes only a subset of the logs needs to be collected. This granularity is available, but it is highly recommended to always send the complete set of logs when filing a bug report with Robin. Age-based filtering is available to reduce the storage footprint. Robin supports uploading logs to the following destinations:

robin-storage

Used to store collected logs in Robin-backed storage.

nfs

Used to store collected logs in NFS.

s3

Used to store collected logs in Amazon S3.

ssh

Used to store collected logs in a given remote location.

20.6.1. Storing logs using Robin Storage

Logs collected by Robin can be stored on a volume created on the local cluster, with the following command:

# robin log collect robin-storage <rpool>
                                  --nodes <nodes>
                                  --dest-path <dest_path>
                                  --size <size>
                                  --media <media>
                                  --age <age>

rpool

Name of the resource pool to use.

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--dest-path <dest_path>

Destination path where log files will be copied.

--size <size>

Size of the storage volume for the log collection. The default is 250GB.

--media <media>

Specify which type of drives to allocate storage from. Choices include: ‘HDD’, ‘SSD’. The default media type is ‘HDD’.

--age <age>

Collects logs based on age. Valid units are s (sec), m (min), h (hrs), d (days), Mo (month) and y (years). For example, 10m represents 10 minutes.

Example:

# robin log collect robin-storage default --wait
Job:  123 Name: LogCollect           State: PROCESSED       Error: 0
Job:  123 Name: LogCollect           State: WAITING         Error: 0
Job:  123 Name: LogCollect           State: COMPLETED       Error: 0
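
The --age value combines a number with one of the unit suffixes listed above. The following Python sketch validates such a value and converts it to seconds. How Robin itself interprets a month or a year is not stated here, so the 30-day month and 365-day year below are assumptions, as is the helper name age_to_seconds.

```python
import re

# Seconds per unit. The lengths used for "Mo" and "y" are assumptions.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "Mo": 30 * 86400,
    "y": 365 * 86400,
}

# Units are case-sensitive: "m" is minutes, "Mo" is months.
AGE_PATTERN = re.compile(r"(\d+)(Mo|s|m|h|d|y)")


def age_to_seconds(age):
    """Convert an age such as '10m' or '2d' to seconds."""
    match = AGE_PATTERN.fullmatch(age)
    if match is None:
        raise ValueError(f"invalid age value: {age!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]


for example in ("10m", "12h", "2d"):
    print(example, "=", age_to_seconds(example), "seconds")
```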

20.6.2. Storing logs using NFS

Logs collected by Robin can be stored on an NFS share, with the following command:

# robin log collect nfs <nfs_share>
                        --nodes <nodes>
                        --age <age>

nfs_share

The ‘hostname’ or ‘IP’, ‘export_path’ and ‘dest_path’ for an NFS share, in the form <hostname|IP>:<export_path>:<dest_path>.

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--age <age>

Collects logs based on age. Valid units are s (sec), m (min), h (hrs), d (days), Mo (month) and y (years). For example, 10m represents 10 minutes.

Example:

# robin log collect nfs 10.9.82.162:/tmp:/demo_log_collect
Job:  126 Name: LogCollect           State: PROCESSED       Error: 0
Job:  126 Name: LogCollect           State: WAITING         Error: 0
Job:  126 Name: LogCollect           State: COMPLETED       Error: 0
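
The nfs_share argument packs three values into one colon-separated string. The following Python sketch splits and validates such a string before it is passed to the command. The names NfsShare and parse_nfs_share are illustrative, and the simple split assumes a host name or IPv4 address (an IPv6 literal contains colons and would need different handling).

```python
from typing import NamedTuple


class NfsShare(NamedTuple):
    host: str
    export_path: str
    dest_path: str


def parse_nfs_share(value):
    """Split '<hostname|IP>:<export_path>:<dest_path>' into its three parts."""
    parts = value.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <host>:<export_path>:<dest_path>, got {value!r}")
    return NfsShare(*parts)


share = parse_nfs_share("10.9.82.162:/tmp:/demo_log_collect")
print(share)
```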

20.6.3. Storing logs using AWS S3

Logs collected by Robin can be stored in AWS S3, with the following command:

# robin log collect s3 <url> <aws_config>
                             --nodes <nodes>
                             --access_key <access_key>
                             --secret_key <secret_key>
                             --age <age>

url

S3 URL in the format https://s3-<region-name>.amazonaws.com/<bucket-name>/<directory>

aws_config

JSON file containing the Access Key, Secret Key and Region. Example format: {"aws_access_key_id": <key>, "aws_secret_access_key": <key>, "region": <region_name>}

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--access_key <access_key>

Access Key for the respective user with access to the specified S3 bucket.

--secret_key <secret_key>

Secret Key for the respective user with access to the specified S3 bucket.

--age <age>

Collects logs based on age. Valid units are s (sec), m (min), h (hrs), d (days), Mo (month) and y (years). For example, 10m represents 10 minutes.

Example:

# robin log collect s3 https://s3-us-west-2.amazonaws.com/log-collect/demo_log_collect /root/aws.json --wait
Job:  132 Name: LogCollect           State: PROCESSED       Error: 0
Job:  132 Name: LogCollect           State: WAITING         Error: 0
Job:  132 Name: LogCollect           State: COMPLETED       Error: 0
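
The two positional arguments can be prepared with a short script. The following Python sketch writes an aws_config file with the three documented keys and builds the URL in the documented format. The key values are placeholders, the helper names are illustrative, and restricting the file permissions is a precaution added in this sketch rather than a documented requirement.

```python
import json
import os
import tempfile


def s3_log_url(region, bucket, directory):
    """Build the S3 URL in the documented format."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{directory}"


def write_aws_config(path, access_key, secret_key, region):
    """Write the JSON credentials file with the three documented keys."""
    config = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region": region,
    }
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
    os.chmod(path, 0o600)  # keep the credentials readable by the owner only
    return config


# Written to a temporary directory here; the example above uses /root/aws.json.
with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "aws.json")
    write_aws_config(config_path, "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY", "us-west-2")
    with open(config_path) as handle:
        loaded = json.load(handle)

url = s3_log_url("us-west-2", "log-collect", "demo_log_collect")
print(url)
print(sorted(loaded))
```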

20.6.4. Storing logs in a remote location

Logs collected by Robin can be stored in a remote location, with the following command:

# robin log collect ssh <dest>
                        --nodes <nodes>
                        --password <password>
                        --age <age>

dest

Destination path where the log files will be copied to. The path should be in the form of ‘<user>@<hostname|IP>:<path>’.

--nodes <nodes>

Comma-separated list of nodes from which to collect logs. The default is to collect from all nodes.

--password <password>

Provide a password on the command line instead of via a prompt.

--age <age>

Collects logs based on age. Valid units are s (sec), m (min), h (hrs), d (days), Mo (month) and y (years). For example, 10m represents 10 minutes.

Example:

# robin log collect ssh root@10.9.82.163:/demo_log_collect --password robin123
Job:  129 Name: LogCollect           State: PROCESSED       Error: 0
Job:  129 Name: LogCollect           State: WAITING         Error: 0
Job:  129 Name: LogCollect           State: COMPLETED       Error: 0

20.7. Retrieving Job Logs

Robin provides a utility that collects all the relevant logs from the necessary nodes for a particular job and its child job hierarchy. It stores these logs in a single tarball that can be provided to Robin alongside a bug report. It is also useful for an administrator debugging why a job failed unexpectedly. This functionality is convenient because it removes the need to log into every affected node to collect and inspect the relevant log files. Issue the following command to retrieve logs for a specific job:

# robin job get <id>

id

ID of job to collect the logs for

Example:

# robin job get 1
Retrieving log files...
Log files for Job ids: [1] are retrieved successfully at 1582189081.tar.gz
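
Before attaching the tarball to a bug report, it can be useful to check what it contains. The following Python sketch lists the regular files in a .tar.gz archive with the standard tarfile module. The internal layout of the tarball is not described here, so the sketch builds a stand-in archive with hypothetical file names in order to run anywhere. The helper name list_job_logs is illustrative.

```python
import io
import os
import tarfile
import tempfile


def list_job_logs(tarball_path):
    """Return the sorted names of the regular files inside a .tar.gz archive."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Stand-in tarball; the member names below are hypothetical.
with tempfile.TemporaryDirectory() as workdir:
    tarball = os.path.join(workdir, "1582189081.tar.gz")
    with tarfile.open(tarball, "w:gz") as archive:
        for name in ("server/1/job.log", "agent/1/job.log"):
            payload = b"example log line\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    members = list_job_logs(tarball)

for name in members:
    print(name)
```

With a real archive, call list_job_logs with the file name printed by robin job get.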

20.8. Cluster Auditing

Every operation that a user performs on an identifiable object within a Robin cluster is logged for auditing purposes. This allows admins to track the exact series of operations performed by a user as well as to monitor the general activity on the cluster. It both enables more accurate backtracking for troubleshooting and improves the thoroughness of security audits. Detailed below are the methods by which a user can retrieve the audit log.

20.8.1. Retrieving audit logs from the Robin Database

To access the audit log, which records which user executed an operation, the tenant and node from which they executed it, the type of object and operation involved, and the result of the operation, issue the following command:

  # robin user-audit list --exec-user <exec_user>
                          --exec-tenant <exec_tenant>
                          --owner-user <owner_user>
                          --owner-tenant <owner_tenant>
                          --id <record_id>
                          --object-type <object_type>
                          --page_size <size>
                          --page_num <num>
                          --operation <operation>
                          --result <result>
                          --full


=================================  ============================================================================================================================================
``--exec-user <exec_user>``        Filter by username for the user who initiated the operation. Cannot be used in conjunction with ``--owner-user``.
``--exec-tenant <exec_tenant>``    Filter by tenant name for the user who initiated the operation. Cannot be used in conjunction with ``--owner-tenant``.
``--owner-user <owner_user>``      Filter by username for the user who owns the object involved in the operation. Cannot be used in conjunction with ``--exec-user``.
``--owner-tenant <owner_tenant>``  Filter by tenant name for the user who owns the object involved in the operation. Cannot be used in conjunction with ``--exec-tenant``.
``--id <record_id>``               Filter for a specific record ID.
``--object-type <object_type>``    Filter by object type.
``--page_size <size>``             Number of audit records to display on each page.
``--page_num <num>``               Page number from which to start displaying audit records (the first page is 1).
``--operation <operation>``        Filter by operation.
``--result <result>``              Filter by operation result.
``--full``                         Display additional information about the audit records.
=================================  ============================================================================================================================================
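
When the command is driven from a script, the mutual-exclusion rules in the table above are easy to get wrong. The following Python sketch assembles the argument list for `robin user-audit list` from the documented options and rejects the combinations that the table disallows. The function name and its keyword arguments are hypothetical; only the flags themselves come from this guide.

```python
def build_user_audit_command(exec_user=None, exec_tenant=None,
                             owner_user=None, owner_tenant=None,
                             object_type=None, operation=None, result=None,
                             page_size=None, page_num=None, full=False):
    """Return the argument list for 'robin user-audit list' with the given filters."""
    # The exec-* and owner-* filters are documented as mutually exclusive.
    if exec_user is not None and owner_user is not None:
        raise ValueError("--exec-user cannot be combined with --owner-user")
    if exec_tenant is not None and owner_tenant is not None:
        raise ValueError("--exec-tenant cannot be combined with --owner-tenant")

    options = [
        ("--exec-user", exec_user),
        ("--exec-tenant", exec_tenant),
        ("--owner-user", owner_user),
        ("--owner-tenant", owner_tenant),
        ("--object-type", object_type),
        ("--operation", operation),
        ("--result", result),
        ("--page_size", page_size),
        ("--page_num", page_num),
    ]
    args = ["robin", "user-audit", "list"]
    for flag, value in options:
        if value is not None:
            args += [flag, str(value)]
    if full:
        args.append("--full")
    return args


# Usage: the returned list can be passed to subprocess.run() on a node
# where the robin client is installed and logged in.
#   build_user_audit_command(object_type="APPLICATION", page_num=2)
```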

Example 1 (List first page of audit records):

# robin user-audit list
Id  | Timestamp                | IP Addr     | Exec User | Exec Tenant    | Owner User | Owner Tenant | Object Type     | Operation | Result
----+--------------------------+-------------+-----------+----------------+------------+--------------+-----------------+-----------+---------
643 | August 10, 2021 14:17:47 | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
642 | July 13, 2021 11:24:13   | 10.9.121.40 | robin     | Administrators |            |              | USER            | login     | success
641 | July 13, 2021 11:24:12   | 172.20.0.1  | robin     | Administrators |            |              | METRICS         | enable    | success
640 | July 13, 2021 11:24:10   | 172.20.0.1  | robin     | Administrators |            |              | CONFIG          | update    | success
639 | July 13, 2021 11:24:06   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
638 | July 13, 2021 11:24:04   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
637 | July 13, 2021 11:24:04   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
636 | July 13, 2021 11:23:58   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
635 | July 13, 2021 11:23:57   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
634 | July 13, 2021 11:23:49   | 172.20.0.1  | robin     | Administrators |            |              | FILE_COLLECTION | online    | success
633 | July 13, 2021 11:23:44   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
632 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
631 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
630 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
629 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
628 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
627 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
626 | July 13, 2021 11:19:59   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
625 | July 13, 2021 11:19:01   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
624 | July 13, 2021 11:18:57   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
--------------------------------------------
537 items, page 1 of 27.
--------------------------------------------

Example 2 (List audit records filtered by object type):

# robin user-audit list --object-type APPLICATION
Id | Timestamp                 | IP Addr    | Exec User | Exec Tenant    | Owner User | Owner Tenant   | Object Type | Operation | Result
---+---------------------------+------------+-----------+----------------+------------+----------------+-------------+-----------+---------
46 | October 26, 2020 12:51:46 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
45 | October 26, 2020 12:51:25 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
44 | October 26, 2020 12:51:18 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
43 | October 26, 2020 12:51:06 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
42 | October 26, 2020 12:50:59 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
41 | October 26, 2020 12:49:44 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
40 | October 26, 2020 12:49:26 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
39 | October 26, 2020 12:49:17 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
38 | October 26, 2020 12:49:03 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
37 | October 26, 2020 12:46:17 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
36 | October 26, 2020 12:45:35 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
--------------------------------------------
11 items, page 1 of 1.
--------------------------------------------

Example 3 (Show details for a single audit record):

# robin user-audit list --id 46 --full
Id | Timestamp                 | IP Addr    | Exec User | Exec Tenant    | Owner User | Owner Tenant   | Object Type | Operation | Result
---+---------------------------+------------+-----------+----------------+------------+----------------+-------------+-----------+---------
46 | October 26, 2020 12:51:46 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
  object_attributes: {'tenant_id': 1, 'object_id': 11, 'jobid': 74, 'object_name': 'app-11', 'user_id': 3}
  details:

--------------------------------------------
1 items, page 1 of 1.
--------------------------------------------
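
The tabular output shown in the examples above can be post-processed when a quick summary is needed. The following Python sketch turns captured output into a list of dictionaries keyed by the column headers, and then counts records per column value. It assumes the layout seen in the examples (a header row and data rows separated by `|`), and it skips the separator, footer, and `--full` detail lines because they contain no `|` character. The function names are hypothetical, and a detail line that happens to contain `|` would need extra handling.

```python
from collections import Counter


def parse_audit_table(output):
    """Parse 'robin user-audit list' output into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in output.splitlines():
        if "|" not in line:
            continue  # separator rows, page footer, and --full detail lines
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def count_by(rows, column):
    """Count parsed records by the value of one column, for example 'Operation'."""
    return Counter(row[column] for row in rows)


# Usage, where 'captured' holds the text printed by the command:
#   rows = parse_audit_table(captured)
#   print(count_by(rows, "Object Type"))
```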

20.8.2. Robin audit logs

Robin can also write audit records to a log file. The audit logs capture the requests for all event changes within the Robin cluster, such as create, update, and delete requests, so that you can review user activity directly from the audit log file.

The audit logs are stored in the robin-user-audit.log file, located at /home/robinds/var/log/robin/robin-user-audit.log.

Note

By default, the audit log feature is disabled.

20.8.2.1. Enable audit logs

You must enable the audit log feature to view the user log details.

Note

As part of enabling the audit log feature, you must run the service robin-server restart command to restart the Robin server for the changes to take effect.

After you enable the audit log feature, all log messages are written in either JSON or text format. As an administrator, you can view the audit logs using any text editor. You can also collect these logs with any log forwarding tool for further processing.

To enable the audit log feature, perform the following steps:

  1. Run the following command to enable the audit log feature:

    # robin config update user_audit log_enable True
    
  2. Run the following command to restart the Robin server for the changes to take effect:

    # service robin-server restart
    

Example

# robin config update user_audit log_enable True
  The 'user_audit' attribute 'log_enable' has been updated

# service robin-server restart
  Redirecting to /bin/systemctl restart robin-server.service

20.8.2.2. Disable audit logs

You can also disable the audit log feature to stop recording the audit log messages in the robin-user-audit.log file.

Note

As part of disabling the audit log feature, you must run the service robin-server restart command to restart the Robin server for the changes to take effect.

To disable the audit log feature, perform the following steps:

  1. Run the following command to disable the audit log feature:

    # robin config update user_audit log_enable False
    
  2. Run the following command to restart the Robin server for the changes to take effect:

    # service robin-server restart
    

Example

# robin config update user_audit log_enable False
  The 'user_audit' attribute 'log_enable' has been updated

# service robin-server restart
  Redirecting to /bin/systemctl restart robin-server.service

20.8.2.3. Configure user audit attributes

You can change the user audit attributes from their default values.

View user audit attributes

To view the list of the user audit attributes, run the following command:

# robin config list user_audit

User audit attributes and their values

=============  =============  ==========================================================================================
Attribute      Default value  Valid value
=============  =============  ==========================================================================================
enabled        True           True: enable the user audit feature. False: disable the user audit feature.
log_enable     False          True: enable the audit log feature. False: disable the audit log feature.
log_file_size  10             The maximum size, in megabytes, of the audit log file.
log_format     JSON           JSON: display output in JSON format. TEXT: display output in TEXT format.
log_level      INFO           Severity level assigned to the generated log messages. Valid values:
                              INFO (informational messages),
                              DEBUG (debug-level messages that contain information for debugging a program),
                              WARNING (warning messages),
                              ERROR (error messages),
                              CRITICAL (critical messages).
log_retention  4              The maximum number of audit log files to retain. Any additional log file is rolled over.
=============  =============  ==========================================================================================

Update user audit attributes

To update the user audit attributes, run the following command:

# robin config update user_audit <attribute> <valid value>

Example

Update the log_format attribute of the user audit.

# robin config update user_audit log_format TEXT
  The 'user_audit' attribute 'log_format' has been updated
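
If attribute updates are scripted, checking the value against the table of valid values before calling the CLI avoids a failed update. The following Python sketch validates an attribute and value pair and returns the argument list for `robin config update user_audit`. The function name is hypothetical, and the requirement that log_file_size and log_retention be positive whole numbers is an assumption; the guide only gives their defaults.

```python
BOOLEAN_ATTRIBUTES = {"enabled", "log_enable"}
CHOICE_ATTRIBUTES = {
    "log_format": {"JSON", "TEXT"},
    "log_level": {"INFO", "DEBUG", "WARNING", "ERROR", "CRITICAL"},
}
# Assumption: these two attributes accept positive whole numbers only.
INTEGER_ATTRIBUTES = {"log_file_size", "log_retention"}


def build_audit_config_update(attribute, value):
    """Validate a user_audit attribute/value pair and return the CLI argument list."""
    value = str(value)
    if attribute in BOOLEAN_ATTRIBUTES:
        valid = value in {"True", "False"}
    elif attribute in CHOICE_ATTRIBUTES:
        valid = value in CHOICE_ATTRIBUTES[attribute]
    elif attribute in INTEGER_ATTRIBUTES:
        valid = value.isdecimal() and int(value) > 0
    else:
        raise ValueError(f"unknown user_audit attribute: {attribute}")
    if not valid:
        raise ValueError(f"invalid value {value!r} for attribute {attribute}")
    return ["robin", "config", "update", "user_audit", attribute, value]


# Usage:
#   build_audit_config_update("log_format", "TEXT")
```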

20.8.2.4. View user audit logs

To view all user log details, run the following command:

# cat /var/log/robin/robin-user-audit.log

Example

All user log details in text format.

# cat /var/log/robin/robin-user-audit.log
1623 | 2021-08-12T15:26:06.581513+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1624 | 2021-08-12T15:26:12.655515+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1625 | 2021-08-12T15:26:12.783629+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1626 | 2021-08-12T15:26:13.118734+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1627 | 2021-08-12T15:26:18.584252+7:00 | 192.0.2.2 | robin | Administrators | -- | -- | USER | login | success | -- | --
1628 | 2021-08-12T15:26:21.752403+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1629 | 2021-08-12T15:26:28.934639+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1630 | 2021-08-12T15:26:36.089382+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1631 | 2021-08-12T15:26:43.233911+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1632 | 2021-08-12T15:26:50.370029+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1633 | 2021-08-12T15:26:57.528168+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1634 | 2021-08-12T15:27:04.749161+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1635 | 2021-08-12T15:27:11.934771+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1636 | 2021-08-12T15:27:19.127729+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1637 | 2021-08-12T15:27:26.291575+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1638 | 2021-08-12T15:27:33.702357+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
1639 | 2021-08-12T15:27:41.017244+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --

All user log details in JSON format.

# cat /var/log/robin/robin-user-audit.log
{
    "id": 197,
    "timestamp": "2021-08-12T13:56:17.230515+7:00",
    "ip_addr": "192.0.2.2",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "NAMESPACE",
    "operation": "create",
    "result": "success",
    "object_attributes": {
        "object_name": "oc8687pk4i",
        "username": "robin",
        "tenant": "Administrators",
        "import_namespace": false
     },
    "details": {}
}
{
    "id": 198,
    "timestamp": "2021-08-12T13:56:17.748933+7:00",
    "ip_addr": "192.0.2.1",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "USER",
    "operation": "login",
    "result": "success",
    "object_attributes": {},
    "details": {}
}
{
    "id": 199,
    "timestamp": "2021-08-12T13:56:33.766674+7:00",
    "ip_addr": "192.0.2.2",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "NAMESPACE",
    "operation": "delete",
    "result": "success",
    "object_attributes": {},
    "details": {}
}
{
    "id": 200,
    "timestamp": "2021-08-12T13:56:34.290960+7:00",
    "ip_addr": "192.0.2.1",
    "exec_user_id": 3,
    "exec_username": "robin",
    "exec_tenant_id": 1,
    "exec_tenant": "Administrators",
    "owner_user_id": null,
    "owner_username": null,
    "owner_tenant_id": null,
    "owner_tenant": null,
    "object_type": "USER",
    "operation": "login",
    "result": "success",
    "object_attributes": {},
    "details": {}
}
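
In JSON format the log file is a sequence of JSON objects rather than a single JSON document, so `json.loads` on the whole file would fail. The following Python sketch reads the records one at a time with `json.JSONDecoder.raw_decode` and then filters them. It works whether each record spans several lines, as displayed above, or occupies a single line; which of the two the file actually uses is not stated in this guide. The function names are hypothetical, and the field names are taken from the example records.

```python
import json


def iter_audit_records(text):
    """Yield each JSON object from a string containing back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so step over it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


def unsuccessful_operations(text):
    """Return the records whose 'result' field is anything other than 'success'."""
    return [r for r in iter_audit_records(text) if r.get("result") != "success"]


# Usage:
#   with open("/var/log/robin/robin-user-audit.log") as log_file:
#       for record in unsuccessful_operations(log_file.read()):
#           print(record["id"], record["exec_username"], record["operation"])
```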

To view the last user log detail, run the following command:

# tail -n 1 /var/log/robin/robin-user-audit.log

Example

The last user log detail in text format.

# tail -n 1 /var/log/robin/robin-user-audit.log
1645 | 2021-08-12T15:28:19.298469+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --

The last user log detail in JSON format.

# tail -n 1 /var/log/robin/robin-user-audit.log
{
     "id": 1646,
     "timestamp": "2021-08-12T15:31:44.069446+7:00",
     "ip_addr": "192.0.2.2",
     "exec_user_id": 3,
     "exec_username": "robin",
     "exec_tenant_id": 1,
     "exec_tenant": "Administrators",
     "owner_user_id": null,
     "owner_username": null,
     "owner_tenant_id": null,
     "owner_tenant": null,
     "object_type": "CONFIG",
     "operation": "update",
     "result": "success",
     "object_attributes": {
         "section": "user_audit",
         "attribute": "log_format"
     },
     "details": {
         "msg": "The 'user_audit' attribute 'log_format' has been updated"
     }
}

Points to consider for the Robin audit logs:

  • The active master can generate the log file.

  • The log file is automatically updated by the Robin control plane processes whenever an event occurs.

  • The logs are automatically rotated to ensure that these logs do not consume the whole log partition.