17. Troubleshooting
Robin CNS provides a number of native tools and commands that administrators can use to troubleshoot a Robin cluster and report issues. These tools vary in their use cases, but each provides insight into why the cluster is not functioning as intended or why an unexpected failure occurred. As a result, they should be the go-to utilities when debugging potential issues, and their output should be sent alongside any bug report filed with Robin. Each tool is described in its respective section below.
Alongside the aforementioned tools for administrators, Robin CNS also provides more granular commands that let individual users track the progress of their operations and determine why an operation failed. These operations are referred to as jobs, and each is identified by a unique ID.
17.1. Listing all jobs
Robin stores all jobs that have occurred during a cluster’s lifespan. To view these jobs alongside details such as their start time and state, issue the following command:
# robin job list --verbose
--ignoredeps
--noarchived
--nopurged
--states <states>
--failed
--nocolor
--page_size <size>
--page_num <num>
--total
--all
--app <app_name>
--k8sapp <k8sapp_name>
--vnode <vnode_name>
--node <node_name>
--disk <disk_wwn>
--json
| --verbose | Show complete job information instead of truncating it for display purposes. |
| --ignoredeps | Do not show child jobs |
| --noarchived | Do not show archived jobs |
| --nopurged | Do not show purged jobs |
| --states <states> | Filter jobs based on states. Choose one or more from: active, failed, succeeded, archived, purged |
| --failed | Show only jobs which have failed |
| --nocolor | Show uncolored output |
| --page_size <size> | Number of jobs that should be displayed for each page |
| --page_num <num> | Page number to start displaying jobs from (starting index 1) |
| --total | Return the total number of qualified root jobs |
| --all | Display all jobs associated with a specific application. Note this option must be used in conjunction with the |
| --app <app_name> | Filter jobs based on specified application |
| --k8sapp <k8sapp_name> | Filter jobs based on specified K8s/Helm registered application name |
| --vnode <vnode_name> | Filter jobs based on specified Vnode name |
| --node <node_name> | Filter jobs based on specified physical node name |
| --disk <disk_wwn> | Filter jobs based on specified disk WWN |
| --json | Display output in JSON format |
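When scripting around the CLI, the filters above can be assembled programmatically. The following is a minimal sketch, not part of Robin CNS: the helper name `build_job_list_cmd` is hypothetical, only flags from the list above are used, and passing several states as one comma-separated value is an assumption.

```python
from typing import List, Optional, Sequence

# States accepted by --states, as listed in the option descriptions above.
VALID_STATES = {"active", "failed", "succeeded", "archived", "purged"}


def build_job_list_cmd(
    states: Optional[Sequence[str]] = None,
    app: Optional[str] = None,
    page_size: Optional[int] = None,
    page_num: Optional[int] = None,
    as_json: bool = False,
) -> List[str]:
    """Assemble an argument list for `robin job list` (hypothetical helper)."""
    cmd = ["robin", "job", "list"]
    if states:
        unknown = sorted(set(states) - VALID_STATES)
        if unknown:
            raise ValueError(f"unknown job states: {unknown}")
        # Assumption: several states are passed as one comma-separated value.
        cmd += ["--states", ",".join(states)]
    if app is not None:
        cmd += ["--app", app]
    if page_size is not None:
        cmd += ["--page_size", str(page_size)]
    if page_num is not None:
        # The documentation states that page numbering starts at 1.
        if page_num < 1:
            raise ValueError("page numbers start at 1")
        cmd += ["--page_num", str(page_num)]
    if as_json:
        cmd.append("--json")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` on a host where the `robin` client is installed and logged in.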
Example:
# robin job list
ID | Type | Description | State | Start | End | User | Message
--------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+---------+--------+------------------------------------------
1013 | ApplicationStart | Starting application 'wp-10' | COMPLETED | 13 Aug 23:28:29 | 0:00:54 | system |
|->1015 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 13 Aug 23:28:30 | 0:00:38 | system |
| |->1017 | VnodeDeploy | Deploying vnode 'wp-10.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:28:30 | 0:00:38 | system |
| | |->1018 | VnodeStop | Stopping vnode wp-10.mysql.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:28:30 | 0:00:15 | system |
|->1016 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system |
| |->1024 | VnodeDeploy | Deploying vnode 'wp-10.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system |
| | |->1025 | VnodeStop | Stopping vnode wp-10.wordpress.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:29:08 | 0:00:07 | system |
1014 | ApplicationStart | ApplicationStart | COMPLETED|FAILED | 13 Aug 23:28:29 | 0:00:00 | system | Another job is running on application 'w
1019 | ApplicationStart | Starting application 'wp-20' | COMPLETED | 13 Aug 23:28:31 | 0:00:51 | system |
|->1020 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 13 Aug 23:28:32 | 0:00:36 | system |
| |->1022 | VnodeDeploy | Deploying vnode 'wp-20.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:28:32 | 0:00:36 | system |
| | |->1023 | VnodeStop | Stopping vnode wp-20.mysql.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:28:32 | 0:00:13 | system |
|->1021 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system |
| |->1026 | VnodeDeploy | Deploying vnode 'wp-20.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system |
| | |->1027 | VnodeStop | Stopping vnode wp-20.wordpress.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:29:08 | 0:00:05 | system |
1028 | JobArchive | Archiving job/s on all hosts | COMPLETED | 14 Aug 00:00:00 | 0:00:02 | system |
|->1029 | AgentJobArchive | Archiving job/s on host cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 00:00:01 | 0:00:00 | system |
1030 | HostProbe | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 07:54:37 | 0:00:01 | system |
1031 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 07:54:37 | 0:00:51 | system |
1032 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 08:11:11 | 0:00:50 | system |
1033 | HostProbe | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 08:11:11 | 0:00:01 | system |
1034 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp. | COMPLETED | 14 Aug 09:24:17 | 0:00:50 | system |
1035 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED|FAILED | 14 Aug 09:25:07 | 0:01:40 | system | Pods do not need to be failed over as Ku
1036 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Ready. Origin: StateChange. | COMPLETED | 14 Aug 09:25:17 | 0:00:01 | system |
1037 | ApplicationDelete | Deleting application 'wp-10' | COMPLETED | 14 Aug 09:41:10 | 0:00:12 | robin |
|->1038 | VnodeDelete | Deleting vnode 'wp-10.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:10 | 0:00:06 | robin |
|->1039 | VnodeDelete | Deleting vnode 'wp-10.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:10 | 0:00:08 | robin |
1040 | ApplicationDelete | Deleting application 'wp-20' | COMPLETED | 14 Aug 09:41:16 | 0:00:13 | robin |
|->1041 | VnodeDelete | Deleting vnode 'wp-20.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:16 | 0:00:10 | robin |
|->1042 | VnodeDelete | Deleting vnode 'wp-20.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:16 | 0:00:09 | robin |
1043 | ApplicationDelete | Deleting application 'wp-30' | COMPLETED | 14 Aug 09:41:20 | 0:00:19 | robin |
|->1044 | VnodeDelete | Deleting vnode 'wp-30.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:20 | 0:00:06 | robin |
|->1045 | VnodeDelete | Deleting vnode 'wp-30.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:20 | 0:00:15 | robin |
1046 | ApplicationCreate | Adding application 'wp-1' | COMPLETED | 14 Aug 09:42:58 | 0:00:58 | robin |
|->1047 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:43:00 | 0:00:42 | robin |
| |->1049 | VnodeAdd | Adding vnode 'wp-1.mysql.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:43:00 | 0:00:42 | robin |
|->1048 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:43:42 | 0:00:14 | robin |
| |->1053 | VnodeAdd | Adding vnode 'wp-1.wordpress.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:43:42 | 0:00:14 | robin |
1050 | ApplicationCreate | Adding application 'wp-2' | COMPLETED | 14 Aug 09:43:39 | 0:00:46 | robin |
|->1051 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:43:42 | 0:00:34 | robin |
| |->1054 | VnodeAdd | Adding vnode 'wp-2.mysql.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:43:42 | 0:00:34 | robin |
|->1052 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:44:16 | 0:00:09 | robin |
| |->1055 | VnodeAdd | Adding vnode 'wp-2.wordpress.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:16 | 0:00:09 | robin |
1056 | ApplicationCreate | Adding application 'wp-3' | COMPLETED | 14 Aug 09:44:18 | 0:00:57 | robin |
|->1057 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:44:20 | 0:00:41 | robin |
| |->1059 | VnodeAdd | Adding vnode 'wp-3.mysql.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:20 | 0:00:41 | robin |
|->1058 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:45:01 | 0:00:13 | robin |
| |->1067 | VnodeAdd | Adding vnode 'wp-3.wordpress.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:45:02 | 0:00:12 | robin |
1060 | ApplicationDelete | Deleting application 'wp-1' | COMPLETED | 14 Aug 09:44:53 | 0:00:17 | robin |
|->1061 | VnodeDelete | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:53 | 0:00:05 | robin |
|->1062 | VnodeDelete | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:53 | 0:00:13 | robin |
1063 | ApplicationDelete | Deleting application 'wp-2' | COMPLETED | 14 Aug 09:44:57 | 0:00:21 | robin |
|->1064 | VnodeDelete | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:57 | 0:00:09 | robin |
|->1065 | VnodeDelete | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:57 | 0:00:18 | robin |
1066 | ApplicationDelete | ApplicationDelete | COMPLETED|FAILED | 14 Aug 09:45:01 | 0:00:00 | robin | Another job is running on application 'w
1068 | ApplicationProbe | Probing application 'wp-3' | COMPLETED | 14 Aug 09:45:12 | 0:00:00 | robin |
1069 | ApplicationDelete | Deleting application 'wp-3' | COMPLETED | 14 Aug 09:45:16 | 0:00:12 | robin |
|->1070 | VnodeDelete | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:45:16 | 0:00:05 | robin |
|->1071 | VnodeDelete | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:45:16 | 0:00:09 | robin |
1072 | ApplicationCreate | Adding application 'wp-1' | COMPLETED | 14 Aug 09:47:03 | 0:00:45 | robin |
|->1074 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:47:39 | 0:00:08 | robin |
| |->1076 | VnodeAdd | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:47:39 | 0:00:08 | robin |
|->1073 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:47:05 | 0:00:34 | robin |
| |->1075 | VnodeAdd | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:47:05 | 0:00:34 | robin |
1077 | ApplicationCreate | Adding application 'wp-2' | COMPLETED | 14 Aug 09:47:43 | 0:00:44 | robin |
|->1079 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:48:18 | 0:00:09 | robin |
| |->1081 | VnodeAdd | Adding vnode 'wp-2.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:48:18 | 0:00:09 | robin |
|->1078 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:47:45 | 0:00:33 | robin |
| |->1080 | VnodeAdd | Adding vnode 'wp-2.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:47:45 | 0:00:33 | robin |
1082 | ApplicationCreate | Adding application 'wp-3' | COMPLETED | 14 Aug 09:49:14 | 0:03:12 | robin |
|->1083 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:49:16 | 0:02:49 | robin |
| |->1085 | VnodeAdd | Adding vnode 'wp-3.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:49:16 | 0:02:49 | robin |
|->1084 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:52:05 | 0:00:20 | robin |
| |->1086 | VnodeAdd | Adding vnode 'wp-3.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:52:05 | 0:00:20 | robin |
1087 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown. | COMPLETED | 14 Aug 09:53:43 | 0:00:52 | system |
1088 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:54:35 | 0:00:01 | system |
1089 | ApplicationStart | Starting application 'wp-3' | COMPLETED | 14 Aug 09:54:38 | 0:03:41 | system |
|->1092 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 09:54:38 | 0:01:53 | system |
| |->1094 | VnodeDeploy | Deploying vnode 'wp-3.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:54:38 | 0:01:53 | system |
|->1093 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 09:56:31 | 0:01:48 | system |
| |->1102 | VnodeDeploy | Deploying vnode 'wp-3.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:56:31 | 0:01:48 | system |
1090 | ApplicationStart | Starting application 'wp-1' | COMPLETED | 14 Aug 09:54:38 | 0:03:44 | system |
|->1098 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 09:54:39 | 0:01:51 | system |
| |->1100 | VnodeDeploy | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:54:39 | 0:01:51 | system |
|->1099 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 09:56:30 | 0:01:52 | system |
| |->1101 | VnodeDeploy | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:56:30 | 0:01:52 | system |
1091 | ApplicationStart | Starting application 'wp-2' | COMPLETED | 14 Aug 09:54:38 | 0:03:44 | system |
|->1095 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 09:54:39 | 0:01:52 | system |
| |->1097 | VnodeDeploy | Deploying vnode 'wp-2.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:54:39 | 0:01:52 | system |
|->1096 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 09:56:31 | 0:01:51 | system |
| |->1103 | VnodeDeploy | Deploying vnode 'wp-2.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:56:32 | 0:01:50 | system |
1104 | ApplicationDelete | Deleting application 'wp-1' | COMPLETED | 14 Aug 10:18:34 | 0:00:15 | robin |
|->1105 | VnodeDelete | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:34 | 0:00:06 | robin |
|->1106 | VnodeDelete | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:34 | 0:00:11 | robin |
1107 | ApplicationDelete | Deleting application 'wp-2' | COMPLETED | 14 Aug 10:18:38 | 0:00:14 | robin |
|->1108 | VnodeDelete | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:38 | 0:00:06 | robin |
|->1109 | VnodeDelete | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:38 | 0:00:08 | robin |
1110 | ApplicationDelete | Deleting application 'wp-3' | COMPLETED | 14 Aug 10:18:43 | 0:00:15 | robin |
|->1111 | VnodeDelete | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:43 | 0:00:12 | robin |
|->1112 | VnodeDelete | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:43 | 0:00:13 | robin |
1113 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp. | COMPLETED | 14 Aug 10:20:02 | 0:00:50 | system |
1114 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED|FAILED | 14 Aug 10:20:52 | 0:01:40 | system | Pods do not need to be failed over as Ku
1115 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'} | COMPLETED | 14 Aug 10:22:17 | 0:00:00 | system |
1116 | HostProbe | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'} | COMPLETED | 14 Aug 10:22:47 | 0:00:00 | system |
1117 | HostProbe | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Ready. Origin: StateChange. | COMPLETED | 14 Aug 10:22:59 | 0:00:00 | system |
1118 | ApplicationCreate | Adding application 'wp-1' | COMPLETED | 14 Aug 10:40:21 | 0:01:05 | robin |
|->1119 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:40:24 | 0:00:41 | robin |
| |->1121 | VnodeAdd | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:40:24 | 0:00:41 | robin |
|->1120 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:41:05 | 0:00:21 | robin |
| |->1122 | VnodeAdd | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:41:05 | 0:00:21 | robin |
1123 | ApplicationCreate | Adding application 'wp-2-no-aff' | COMPLETED | 14 Aug 10:45:45 | 0:00:57 | robin |
|->1124 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:45:48 | 0:00:41 | robin |
| |->1126 | VnodeAdd | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:45:48 | 0:00:41 | robin |
|->1125 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:46:29 | 0:00:13 | robin |
| |->1127 | VnodeAdd | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:29 | 0:00:13 | robin |
1128 | ApplicationCreate | Adding application 'wp-3-no-aff' | COMPLETED | 14 Aug 10:46:33 | 0:00:39 | robin |
|->1129 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:46:35 | 0:00:28 | robin |
| |->1131 | VnodeAdd | Adding vnode 'wp-3-no-aff.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:35 | 0:00:28 | robin |
|->1130 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:47:03 | 0:00:09 | robin |
| |->1132 | VnodeAdd | Adding vnode 'wp-3-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:47:03 | 0:00:09 | robin |
1133 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown. | COMPLETED | 14 Aug 10:49:36 | 0:00:52 | system |
1134 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:50:28 | 0:00:01 | system |
1135 | ApplicationStart | Starting application 'wp-1' | COMPLETED | 14 Aug 10:50:29 | 0:03:22 | system |
|->1141 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 10:52:16 | 0:01:35 | system |
| |->1143 | VnodeDeploy | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:52:16 | 0:01:35 | system |
|->1140 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 10:50:30 | 0:01:46 | system |
| |->1142 | VnodeDeploy | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:30 | 0:01:46 | system |
1136 | VnodeDeploy | Deploying vnode 'wp-3-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:01:48 | robin |
1137 | VnodeDeploy | Deploying vnode 'wp-3-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:02:04 | robin |
1138 | VnodeDeploy | Deploying vnode 'wp-2-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:02:07 | robin |
1139 | VnodeDeploy | Deploying vnode 'wp-2-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:01:44 | robin |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Returns all jobs that have occurred during a cluster’s lifespan.
End Point: /api/v5/robin_server/jobs
Method: GET
URL Parameters:
sort=[id|-id]
: Utilizing this parameter results in the list of jobs returned being sorted by their id.
noarchived=true
: Utilizing this parameter results in archived jobs not being returned.
nopurged=true
: Utilizing this parameter results in purged jobs not being returned.
failed=true
: Utilizing this parameter results in only failed jobs being returned.
parent=true
: Utilizing this parameter results in only parent jobs being returned.
page_size=<size>
: Utilizing this parameter results in <size> number of jobs being returned.
page_num=<index>
: Utilizing this parameter results in jobs starting from <index> being returned.
objtype=[APPLICATION|K8S_APPLICATION|INSTANCE|DISK|NODE]
: Utilizing this parameter results in only jobs for the specified object type being returned.
objname=<obj_name>
: Utilizing this parameter results in only jobs for objects with the specified name being returned.
all=true
: Utilizing this parameter results in all jobs being returned. Note this option is only valid when an application name is specified.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
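As an illustration of how the endpoint, port, URL parameters and header described above fit together, here is a minimal Python sketch that builds (but does not send) the request. The function name `build_jobs_request`, the host name, and the use of HTTPS are assumptions made for the example; only the path, default port, parameter names and `Authorization` header come from the description above.

```python
import urllib.parse
import urllib.request


def build_jobs_request(host, token, port=29442, scheme="https", **params):
    """Build a GET request for the job list endpoint without sending it.

    `scheme` defaults to HTTPS as an assumption; adjust it to match the cluster.
    `params` are the URL parameters listed above, e.g. failed="true".
    """
    url = f"{scheme}://{host}:{port}/api/v5/robin_server/jobs"
    query = urllib.parse.urlencode(params)
    if query:
        url += "?" + query
    # The token is obtained from the login API and sent as-is.
    return urllib.request.Request(url, headers={"Authorization": token}, method="GET")


# Example: first page of ten failed jobs (host and token are placeholders).
request = build_jobs_request(
    "robin-master.example.com", "<auth_token>",
    failed="true", page_size=10, page_num=1,
)
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` requires network access to the RCM port and, depending on the cluster's certificate setup, a suitable SSL context.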
Example Response:
{
"page_size":10,
"items":{
"users":[
{
"email":null,
"tenantid":1,
"firstname":"Robin",
"username":"robin",
"id":3,
"lastname":"Systems"
}
],
"jobs":[
{
"jobid":1888,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[1889]",
"endtime":1597456503,
"children":[
{
"jobid":1889,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456498,
"parent_jobid":1888,
"error":0,
"message":"",
"taskrunner":1,
"starttime":1597456497,
"dependson_job_ids":"[]",
"level":"child",
"user_id":1,
"jtype":"CollectionOffline",
"timeout":86400,
"state":10,
"desc":"Taking collection 'file-collection-1597122699552' offline (Force False)"
}
],
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":1,
"starttime":1597456496,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"CollectionOnline",
"timeout":86400,
"state":10,
"desc":"Bringing collection 'file-collection-1597122699552' online"
},
{
"jobid":1887,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[1890]",
"endtime":1597456504,
"children":[
{
"jobid":1890,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456499,
"parent_jobid":1887,
"error":0,
"message":"",
"taskrunner":1,
"starttime":1597456497,
"dependson_job_ids":"[]",
"level":"child",
"user_id":3,
"jtype":"VnodeStop",
"timeout":86400,
"state":10,
"desc":"Stopping vnode test-ds-1.server.01 on cscale-82-140.robinsystems.com"
}
],
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":1,
"starttime":1597456496,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":3,
"jtype":"VnodeDeploy",
"timeout":86400,
"state":10,
"desc":"Deploying vnode 'test-ds-1.server.01'. Origin: Event (cscale-82-140.robinsystems.com)"
},
{
"jobid":1886,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456488,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456487,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"HostProbe",
"timeout":86400,
"state":10,
"desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Ready. Origin: StateChange."
},
{
"jobid":1885,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456476,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456475,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"HostProbe",
"timeout":86400,
"state":10,
"desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Notready. Origin: StateChange.. Services Down: {'iomgr-server'}"
},
{
"jobid":1884,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456470,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456470,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"HostProbe",
"timeout":86400,
"state":10,
"desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/WaitingForMonitor ==> ONLINE\/Notready. Origin: StartingHostWatch.. Services Down: {'iomgr-server'}"
},
{
"jobid":1883,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456520,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456469,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"HostProbe",
"timeout":86400,
"state":10,
"desc":"Probed cscale-82-139.robinsystems.com from UNREACHABLE\/Notready ==> UNREACHABLE\/Notready. Origin: StartingHostWatch."
},
{
"jobid":1882,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456467,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456467,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"DiskNotify",
"timeout":86400,
"state":10,
"desc":"Event on disk '0x60022480940ed076551cfaf75612e24e'"
},
{
"jobid":1881,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456467,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456467,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"DiskNotify",
"timeout":86400,
"state":10,
"desc":"Event on disk '0x60022480ffcf3deb224fb37d78fe7767'"
},
{
"jobid":1880,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456467,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456467,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"DiskNotify",
"timeout":86400,
"state":10,
"desc":"Event on disk '0x600224804c48fd7e16c608dea0919064'"
},
{
"jobid":1879,
"tenant_id":1,
"enabled":true,
"child_job_ids":"[]",
"endtime":1597456467,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":0,
"starttime":1597456467,
"dependson_job_ids":"[]",
"level":"parent",
"user_id":1,
"jtype":"DiskNotify",
"timeout":86400,
"state":10,
"desc":"Event on disk '0x600224803bcdafde95b1f5cd27ceb5fb'"
}
]
},
"total":1542,
"num_items":10,
"page_num":1
}
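A response of this shape can be flattened into one row per job and used to work out how many pages remain. The sketch below is illustrative only: `summarize_jobs` is a hypothetical helper, it assumes `starttime` and `endtime` are Unix epoch seconds (which the values in the example suggest), and it assumes `total` counts the jobs that `page_size` paginates over.

```python
import json
import math
from datetime import datetime, timezone


def summarize_jobs(response_text):
    """Return (rows, total_pages) for a job list response body."""
    body = json.loads(response_text)
    rows = []

    def walk(job, depth):
        # Assumption: timestamps are epoch seconds, so the difference is a duration.
        rows.append({
            "jobid": job["jobid"],
            "depth": depth,
            "jtype": job["jtype"],
            "duration_s": job["endtime"] - job["starttime"],
            "start_utc": datetime.fromtimestamp(
                job["starttime"], tz=timezone.utc
            ).isoformat(),
        })
        # Child jobs, when present, are nested under the "children" key.
        for child in job.get("children", []):
            walk(child, depth + 1)

    for job in body["items"]["jobs"]:
        walk(job, 0)

    total_pages = math.ceil(body["total"] / body["page_size"])
    return rows, total_pages
```

The example response does not show what `endtime` looks like for a job that is still running, so a production script would need to handle that case explicitly.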
17.2. Show information about a specific job
To get more detailed information about a specific job, including its state, duration, and any errors reported by it or by its child jobs, issue the following command:
# robin job info <id>
--json
| <id> | Job ID |
| --json | Display output in JSON format |
Example:
# robin job info 1123
ID | Type | Desc | State | Start | End | Duration | Dependson | Error | Message
-----------+-------------------+---------------------------------------------------------------------------+-----------+-----------------+----------+----------+-----------+-------+---------
1123 | ApplicationCreate | Adding application 'wp-2-no-aff' | COMPLETED | 14 Aug 10:45:45 | 10:46:42 | 0:00:57 | [] | 0 |
|->1124 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41 | [] | 0 |
| |->1126 | VnodeAdd | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41 | [] | 0 |
|->1125 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13 | [1124] | 0 |
| |->1127 | VnodeAdd | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13 | [] | 0 |
Returns details about a specific job and any of its respective child jobs.
End Point: /api/v3/robin_server/jobs/<job_id>
Method: GET
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Authorization Error)
Example Response:
{
"tenant_name":"Administrators",
"jobid":1888,
"tenant_id":1,
"enabled":true,
"json":{
"collection_id":1597122699552,
"state":"SuspectedOffline",
"set_failed":true,
"origin":2,
"hostname":"cscale-82-140.robinsystems.com"
},
"user_name":"system",
"endtime":1597456503,
"parent_jobid":0,
"error":0,
"message":"",
"taskrunner":1,
"starttime":1597456496,
"child_job_ids":"[1889]",
"cjobs":[
{
"tenant_name":"Administrators",
"jobid":1889,
"tenant_id":1,
"enabled":true,
"json":{
"collection_id":1597122699552
},
"user_name":"system",
"endtime":1597456498,
"parent_jobid":1888,
"error":0,
"message":"",
"taskrunner":1,
"starttime":1597456497,
"child_job_ids":"[]",
"cjobs":[
],
"dependson_job_ids":"[]",
"user_id":1,
"jtype":"CollectionOffline",
"timeout":86400,
"state":10,
"desc":"Taking collection 'file-collection-1597122699552' offline (Force False)",
"priority":300
}
],
"dependson_job_ids":"[]",
"user_id":1,
"jtype":"CollectionOnline",
"timeout":86400,
"state":10,
"desc":"Bringing collection 'file-collection-1597122699552' online",
"priority":300
}
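Because child jobs are nested under `cjobs`, locating the failing step means walking the tree. The following sketch shows one way to do that; `find_errors` is a hypothetical helper, and treating a non-zero `error` field as a failure is an assumption based on the example, where completed jobs report `"error":0`.

```python
def find_errors(job):
    """Yield (jobid, jtype, error, message) for every job in the tree
    whose `error` field is non-zero (assumed to indicate a failure)."""
    if job.get("error"):
        yield job["jobid"], job["jtype"], job["error"], job.get("message", "")
    # Recurse into child jobs, which the job info response nests under "cjobs".
    for child in job.get("cjobs", []):
        yield from find_errors(child)


# Example with a hand-written tree shaped like the response above.
tree = {
    "jobid": 1, "jtype": "ApplicationCreate", "error": 0, "message": "",
    "cjobs": [
        {"jobid": 2, "jtype": "RoleCreate", "error": 1,
         "message": "example failure", "cjobs": []},
    ],
}
for jobid, jtype, error, message in find_errors(tree):
    print(f"job {jobid} ({jtype}) failed with error {error}: {message}")
```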
17.3. Log Collection
During a cluster-wide failure or any unexpected scenario that affects multiple services, Robin needs logs from all system components in order to debug the issue properly. Depending on the scope of the issue, sometimes only a subset of the logs needs to be collected. This granularity is available, but it is highly recommended to always send the complete set of logs when filing a bug report with Robin. Age-based filtering is also available and helps reduce the storage footprint. Robin supports uploading logs to the following destinations:
| Robin Storage | Used to store collected logs in Robin backed storage |
| NFS | Used to store collected logs in NFS |
| AWS S3 | Used to store collected logs in Amazon S3 |
| Remote location | Used to store collected logs in a given remote location |
17.3.1. Storing logs using Robin Storage
Logs collected by Robin can be stored on a volume created on the local cluster, with the following command:
# robin log collect robin-storage <rpool>
--nodes <nodes>
--dest-path <dest_path>
--size <size>
--media <media>
--age <age>
| <rpool> | Name of the resource pool to use. |
| --nodes <nodes> | Comma separated list of nodes from which to collect. The default is to collect from all nodes. |
| --dest-path <dest_path> | Destination path where log files will be copied |
| --size <size> | Size of the storage volume for the log collection. The default is 250GB |
| --media <media> | Specify which type of drives to allocate storage from. Choices include: ‘HDD’, ‘SSD’. Default media type is ‘HDD’ |
| --age <age> | Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months) and y (years). For example, 10m represents 10 minutes. |
Example:
# robin log collect robin-storage default --wait
Job: 123 Name: LogCollect State: PROCESSED Error: 0
Job: 123 Name: LogCollect State: WAITING Error: 0
Job: 123 Name: LogCollect State: COMPLETED Error: 0
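The `--age` value accepted by the log collection commands combines a number with one of the unit suffixes listed above. As a rough illustration of that format, the sketch below validates an age string and converts it to seconds. `age_to_seconds` is a hypothetical helper, and the lengths used for a month (30 days) and a year (365 days) are assumptions; how Robin itself interprets those units is not stated here.

```python
import re

# Unit suffixes from the --age description. Month and year lengths are
# assumptions chosen for illustration only.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "Mo": 30 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}

# "Mo" is listed first so it is not mistaken for an unknown suffix after "M".
AGE_PATTERN = re.compile(r"^(\d+)(Mo|s|m|h|d|y)$")


def age_to_seconds(age):
    """Convert an age string such as '10m' or '2Mo' to seconds."""
    match = AGE_PATTERN.match(age)
    if match is None:
        raise ValueError(f"invalid age value: {age!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]


print(age_to_seconds("10m"))  # 10 minutes expressed in seconds
```

Note that the suffixes are case-sensitive in this sketch, since `m` (minutes) and `Mo` (months) differ only by capitalization and one extra letter.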
17.3.2. Storing logs using NFS
Logs collected by Robin can be stored on an NFS share, with the following command:
# robin log collect nfs <nfs_share>
--nodes <nodes>
--age <age>
| <nfs_share> | The hostname or IP, export path and destination path for an NFS share, in the form <hostname|IP>:<export_path>:<dest_path> |
| --nodes <nodes> | Comma separated list of nodes from which to collect. The default is to collect from all nodes. |
| --age <age> | Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months) and y (years). For example, 10m represents 10 minutes. |
Example:
# robin log collect nfs 10.9.82.162:/tmp:/demo_log_collect
Job: 126 Name: LogCollect State: PROCESSED Error: 0
Job: 126 Name: LogCollect State: WAITING Error: 0
Job: 126 Name: LogCollect State: COMPLETED Error: 0
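Since the NFS share is passed as a single colon-separated string, a script that generates the argument may want to check it first. This is a minimal sketch under stated assumptions: `parse_nfs_share` and `NfsShare` are hypothetical names, and the check simply expects exactly three non-empty fields, so it would reject hosts written as IPv6 literals.

```python
from typing import NamedTuple


class NfsShare(NamedTuple):
    host: str
    export_path: str
    dest_path: str


def parse_nfs_share(value):
    """Split '<hostname|IP>:<export_path>:<dest_path>' into its three fields."""
    parts = value.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <hostname|IP>:<export_path>:<dest_path>, got {value!r}"
        )
    return NfsShare(*parts)


share = parse_nfs_share("10.9.82.162:/tmp:/demo_log_collect")
print(share.host, share.export_path, share.dest_path)
```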
17.3.3. Storing logs using AWS S3
Logs collected by Robin can be stored in an Amazon S3 bucket, with the following command:
# robin log collect s3 <url> <aws_config>
--nodes <nodes>
--access_key <access_key>
--secret_key <secret_key>
--age <age>
| <url> | S3 URL in the format https://s3-<region-name>.amazonaws.com/<bucket-name>/<directory> |
| <aws_config> | JSON file containing Access Key, Secret Key and Region. Example format: {"aws_access_key_id": <key>, "aws_secret_access_key": <key>, "region": <region_name>} |
| --nodes <nodes> | Comma separated list of nodes from which to collect. The default is to collect from all nodes. |
| --access_key <access_key> | Access Key for the respective user with access to the specified S3 bucket. |
| --secret_key <secret_key> | Secret Key for the respective user with access to the specified S3 bucket. |
| --age <age> | Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months) and y (years). For example, 10m represents 10 minutes. |
Example:
# robin log collect s3 https://s3-us-west-2.amazonaws.com/log-collect/demo_log_collect /root/aws.json --wait
Job: 132 Name: LogCollect State: PROCESSED Error: 0
Job: 132 Name: LogCollect State: WAITING Error: 0
Job: 132 Name: LogCollect State: COMPLETED Error: 0
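The AWS configuration file passed to the command is a small JSON document with the three keys shown in the parameter description. One possible way to generate it is sketched below; `write_aws_config` is a hypothetical helper, the credential values are placeholders, and restricting the file to owner-only permissions is a precaution added here rather than a documented Robin requirement.

```python
import json
import os


def write_aws_config(path, access_key, secret_key, region):
    """Write an AWS config file using the three documented keys."""
    config = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region": region,
    }
    # A newly created file gets owner-only permissions because it holds
    # credentials. An existing file keeps whatever mode it already had.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(config, handle, indent=2)
        handle.write("\n")
```

For example, `write_aws_config("/root/aws.json", "<access_key>", "<secret_key>", "us-west-2")` would produce a file suitable for the `<aws_config>` argument used in the example above.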
17.3.4. Storing logs in a remote location
Logs collected by Robin can be copied to a remote location over SSH, with the following command:
# robin log collect ssh <dest>
--nodes <nodes>
--password <password>
--age <age>
| <dest> | Destination path where the log files will be copied to. The path should be in the form of ‘<user>@<hostname|IP>:<path>’ |
| --nodes <nodes> | Comma separated list of nodes from which to collect. The default is to collect from all nodes. |
| --password <password> | Provide a password on the command line instead of via a prompt |
| --age <age> | Collects logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months) and y (years). For example, 10m represents 10 minutes. |
Example:
# robin log collect ssh root@10.9.82.163:/demo_log_collect --password robin123
Job: 129 Name: LogCollect State: PROCESSED Error: 0
Job: 129 Name: LogCollect State: WAITING Error: 0
Job: 129 Name: LogCollect State: COMPLETED Error: 0
17.4. Retrieving Job Logs
Robin provides a utility that collects all the relevant logs from the necessary nodes for a particular job and its hierarchy of child jobs, and stores them in a single tarball that can be provided to Robin alongside a bug report. It is also useful for an administrator who needs to debug why a job failed unexpectedly. This functionality is convenient because it removes the need to log into every affected node and collect or inspect the relevant log files manually. Issue the following command to retrieve logs for a specific job:
# robin job get <id>
| <id> | ID of job to collect the logs for |
Example:
# robin job get 1
Retrieving log files...
Log files for Job ids: [1] are retrieved successfully at 1582189081.tar.gz
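Once the tarball has been retrieved, its contents can be listed before it is unpacked or attached to a bug report. The sketch below uses Python's standard `tarfile` module; `list_job_logs` is a hypothetical helper, and nothing is assumed about the directory layout inside the archive, which is not described here.

```python
import tarfile


def list_job_logs(tarball_path):
    """Return the sorted names of regular files inside a job log tarball."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return sorted(
            member.name for member in archive.getmembers() if member.isfile()
        )
```

For the example above, `list_job_logs("1582189081.tar.gz")` would return the paths of the log files collected for job 1.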