******************************
Troubleshooting
******************************

Robin Platform provides a number of native tools and commands that administrators can use to troubleshoot their Robin cluster and report issues. These tools vary in their use cases, but each provides insight into why the cluster is not functioning as intended or why an unexpected failure occurred. They should therefore be the go-to utilities when debugging potential issues, and their outputs should be sent alongside any bug reports filed with Robin. Each tool is described in its respective section below.

Alongside these tools for administrators, Robin Platform also provides more granular commands that individual users can use to track the progress of the operations they execute and to determine the reasons for any failures. These operations are referred to as ``jobs``: a job is any operation executed during a cluster's lifespan, and each job is identified by a unique ID.

The job log contains all information about a job, such as its job ID, job type, and description. Robin stores the job logs in its database. As an administrator, you can view the job logs and use them to troubleshoot your cluster. Robin recommends that you provide the complete job logs when reporting issues to Robin for debugging purposes.

The Robin job logs are stored in the following directories:

* ``/var/log/robin/server`` is present only on the Robin master nodes.
* ``/var/log/robin/agent`` is present on all Robin nodes.

You can also access the job logs from the host in the following directories:

* ``/home/robinds/var/log/robin/server`` is present only on the Robin master nodes.
* ``/home/robinds/var/log/robin/agent`` is present on all Robin nodes.

Listing all jobs
=================

.. tabs::

   .. tab:: CLI

      Robin stores all jobs that have occurred during a cluster's lifespan.
      To view these jobs, along with details such as their start time and state, issue the following command:

      .. code-block:: text

         # robin job list --verbose --ignoredeps --noarchived --nopurged --states --failed --nocolor --page_size --page_num --total --all --app --k8sapp --vnode --node --disk

      ========================== ========================================================================================================================================
      ``--verbose``              Show complete job information instead of truncating it for display purposes.
      ``--ignoredeps``           Do not show child jobs.
      ``--noarchived``           Do not show archived jobs.
      ``--nopurged``             Do not show purged jobs.
      ``--states``               Filter jobs based on states. Choose one or more from: active, failed, succeeded, archived, purged.
      ``--failed``               Show only jobs that have failed.
      ``--nocolor``              Show uncolored output.
      ``--page_size``            Number of jobs to display on each page.
      ``--page_num``             Page number to start displaying jobs from (starting index 1).
      ``--total``                Return the total number of qualified root jobs.
      ``--all``                  Display all jobs associated with a specific application. Note that this option must be used in conjunction with the ``--app`` option.
      ``--app``                  Filter jobs based on the specified application.
      ``--k8sapp``               Filter jobs based on the specified K8s/Helm registered application name.
      ``--vnode``                Filter jobs based on the specified Vnode name.
      ``--node``                 Filter jobs based on the specified physical node name.
      ``--disk``                 Filter jobs based on the specified disk WWN.
      ========================== ========================================================================================================================================

      **Example:**

      .. raw:: html
Output .. code-block:: text # robin job list ID | Type | Description | State | Start | End | User | Message --------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------+------------------+-----------------+---------+--------+------------------------------------------ 1013 | ApplicationStart | Starting application 'wp-10' | COMPLETED | 13 Aug 23:28:29 | 0:00:54 | system | |->1015 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 13 Aug 23:28:30 | 0:00:38 | system | | |->1017 | VnodeDeploy | Deploying vnode 'wp-10.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:28:30 | 0:00:38 | system | | | |->1018 | VnodeStop | Stopping vnode wp-10.mysql.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:28:30 | 0:00:15 | system | |->1016 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system | | |->1024 | VnodeDeploy | Deploying vnode 'wp-10.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system | | | |->1025 | VnodeStop | Stopping vnode wp-10.wordpress.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:29:08 | 0:00:07 | system | 1014 | ApplicationStart | ApplicationStart | COMPLETED|FAILED | 13 Aug 23:28:29 | 0:00:00 | system | Another job is running on application 'w 1019 | ApplicationStart | Starting application 'wp-20' | COMPLETED | 13 Aug 23:28:31 | 0:00:51 | system | |->1020 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 13 Aug 23:28:32 | 0:00:36 | system | | |->1022 | VnodeDeploy | Deploying vnode 'wp-20.mysql.01'. 
Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:28:32 | 0:00:36 | system | | | |->1023 | VnodeStop | Stopping vnode wp-20.mysql.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:28:32 | 0:00:13 | system | |->1021 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system | | |->1026 | VnodeDeploy | Deploying vnode 'wp-20.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 13 Aug 23:29:08 | 0:00:14 | system | | | |->1027 | VnodeStop | Stopping vnode wp-20.wordpress.01 on cscale-82-140.robinsystems.com | COMPLETED | 13 Aug 23:29:08 | 0:00:05 | system | 1028 | JobArchive | Archiving job/s on all hosts | COMPLETED | 14 Aug 00:00:00 | 0:00:02 | system | |->1029 | AgentJobArchive | Archiving job/s on host cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 00:00:01 | 0:00:00 | system | 1030 | HostProbe | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 07:54:37 | 0:00:01 | system | 1031 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 07:54:37 | 0:00:51 | system | 1032 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> UNREACHABLE/Notready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 08:11:11 | 0:00:50 | system | 1033 | HostProbe | Probed cscale-82-140.robinsystems.com from ONLINE/Ready ==> ONLINE/Ready. Origin: StartingHostWatch. | COMPLETED | 14 Aug 08:11:11 | 0:00:01 | system | 1034 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp. 
| COMPLETED | 14 Aug 09:24:17 | 0:00:50 | system | 1035 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED|FAILED | 14 Aug 09:25:07 | 0:01:40 | system | Pods do not need to be failed over as Ku 1036 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Ready. Origin: StateChange. | COMPLETED | 14 Aug 09:25:17 | 0:00:01 | system | 1037 | ApplicationDelete | Deleting application 'wp-10' | COMPLETED | 14 Aug 09:41:10 | 0:00:12 | robin | |->1038 | VnodeDelete | Deleting vnode 'wp-10.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:10 | 0:00:06 | robin | |->1039 | VnodeDelete | Deleting vnode 'wp-10.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:10 | 0:00:08 | robin | 1040 | ApplicationDelete | Deleting application 'wp-20' | COMPLETED | 14 Aug 09:41:16 | 0:00:13 | robin | |->1041 | VnodeDelete | Deleting vnode 'wp-20.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:16 | 0:00:10 | robin | |->1042 | VnodeDelete | Deleting vnode 'wp-20.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:16 | 0:00:09 | robin | 1043 | ApplicationDelete | Deleting application 'wp-30' | COMPLETED | 14 Aug 09:41:20 | 0:00:19 | robin | |->1044 | VnodeDelete | Deleting vnode 'wp-30.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:20 | 0:00:06 | robin | |->1045 | VnodeDelete | Deleting vnode 'wp-30.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:41:20 | 0:00:15 | robin | 1046 | ApplicationCreate | Adding application 'wp-1' | COMPLETED | 14 Aug 09:42:58 | 0:00:58 | robin | |->1047 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:43:00 | 0:00:42 | robin | | |->1049 | VnodeAdd | Adding vnode 'wp-1.mysql.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:43:00 | 0:00:42 | robin | |->1048 | RoleCreate | Provisioning containers for role 
'wordpress' | COMPLETED | 14 Aug 09:43:42 | 0:00:14 | robin | | |->1053 | VnodeAdd | Adding vnode 'wp-1.wordpress.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:43:42 | 0:00:14 | robin | 1050 | ApplicationCreate | Adding application 'wp-2' | COMPLETED | 14 Aug 09:43:39 | 0:00:46 | robin | |->1051 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:43:42 | 0:00:34 | robin | | |->1054 | VnodeAdd | Adding vnode 'wp-2.mysql.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:43:42 | 0:00:34 | robin | |->1052 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:44:16 | 0:00:09 | robin | | |->1055 | VnodeAdd | Adding vnode 'wp-2.wordpress.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:16 | 0:00:09 | robin | 1056 | ApplicationCreate | Adding application 'wp-3' | COMPLETED | 14 Aug 09:44:18 | 0:00:57 | robin | |->1057 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:44:20 | 0:00:41 | robin | | |->1059 | VnodeAdd | Adding vnode 'wp-3.mysql.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:20 | 0:00:41 | robin | |->1058 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:45:01 | 0:00:13 | robin | | |->1067 | VnodeAdd | Adding vnode 'wp-3.wordpress.01' on cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:45:02 | 0:00:12 | robin | 1060 | ApplicationDelete | Deleting application 'wp-1' | COMPLETED | 14 Aug 09:44:53 | 0:00:17 | robin | |->1061 | VnodeDelete | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:53 | 0:00:05 | robin | |->1062 | VnodeDelete | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:53 | 0:00:13 | robin | 1063 | ApplicationDelete | Deleting application 'wp-2' | COMPLETED | 14 Aug 09:44:57 | 0:00:21 | robin | |->1064 | VnodeDelete | Deleting vnode 'wp-2.wordpress.01' from 
cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:57 | 0:00:09 | robin | |->1065 | VnodeDelete | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:44:57 | 0:00:18 | robin | 1066 | ApplicationDelete | ApplicationDelete | COMPLETED|FAILED | 14 Aug 09:45:01 | 0:00:00 | robin | Another job is running on application 'w 1068 | ApplicationProbe | Probing application 'wp-3' | COMPLETED | 14 Aug 09:45:12 | 0:00:00 | robin | 1069 | ApplicationDelete | Deleting application 'wp-3' | COMPLETED | 14 Aug 09:45:16 | 0:00:12 | robin | |->1070 | VnodeDelete | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:45:16 | 0:00:05 | robin | |->1071 | VnodeDelete | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 09:45:16 | 0:00:09 | robin | 1072 | ApplicationCreate | Adding application 'wp-1' | COMPLETED | 14 Aug 09:47:03 | 0:00:45 | robin | |->1074 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:47:39 | 0:00:08 | robin | | |->1076 | VnodeAdd | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:47:39 | 0:00:08 | robin | |->1073 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:47:05 | 0:00:34 | robin | | |->1075 | VnodeAdd | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:47:05 | 0:00:34 | robin | 1077 | ApplicationCreate | Adding application 'wp-2' | COMPLETED | 14 Aug 09:47:43 | 0:00:44 | robin | |->1079 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:48:18 | 0:00:09 | robin | | |->1081 | VnodeAdd | Adding vnode 'wp-2.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:48:18 | 0:00:09 | robin | |->1078 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:47:45 | 0:00:33 | robin | | |->1080 | VnodeAdd | Adding vnode 'wp-2.mysql.01' on 
cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:47:45 | 0:00:33 | robin | 1082 | ApplicationCreate | Adding application 'wp-3' | COMPLETED | 14 Aug 09:49:14 | 0:03:12 | robin | |->1083 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 09:49:16 | 0:02:49 | robin | | |->1085 | VnodeAdd | Adding vnode 'wp-3.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:49:16 | 0:02:49 | robin | |->1084 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 09:52:05 | 0:00:20 | robin | | |->1086 | VnodeAdd | Adding vnode 'wp-3.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:52:05 | 0:00:20 | robin | 1087 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown. | COMPLETED | 14 Aug 09:53:43 | 0:00:52 | system | 1088 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 09:54:35 | 0:00:01 | system | 1089 | ApplicationStart | Starting application 'wp-3' | COMPLETED | 14 Aug 09:54:38 | 0:03:41 | system | |->1092 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 09:54:38 | 0:01:53 | system | | |->1094 | VnodeDeploy | Deploying vnode 'wp-3.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:54:38 | 0:01:53 | system | |->1093 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 09:56:31 | 0:01:48 | system | | |->1102 | VnodeDeploy | Deploying vnode 'wp-3.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:56:31 | 0:01:48 | system | 1090 | ApplicationStart | Starting application 'wp-1' | COMPLETED | 14 Aug 09:54:38 | 0:03:44 | system | |->1098 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 09:54:39 | 0:01:51 | system | | |->1100 | VnodeDeploy | Deploying vnode 'wp-1.mysql.01'. 
Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:54:39 | 0:01:51 | system | |->1099 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 09:56:30 | 0:01:52 | system | | |->1101 | VnodeDeploy | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:56:30 | 0:01:52 | system | 1091 | ApplicationStart | Starting application 'wp-2' | COMPLETED | 14 Aug 09:54:38 | 0:03:44 | system | |->1095 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 09:54:39 | 0:01:52 | system | | |->1097 | VnodeDeploy | Deploying vnode 'wp-2.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:54:39 | 0:01:52 | system | |->1096 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 09:56:31 | 0:01:51 | system | | |->1103 | VnodeDeploy | Deploying vnode 'wp-2.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 09:56:32 | 0:01:50 | system | 1104 | ApplicationDelete | Deleting application 'wp-1' | COMPLETED | 14 Aug 10:18:34 | 0:00:15 | robin | |->1105 | VnodeDelete | Deleting vnode 'wp-1.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:34 | 0:00:06 | robin | |->1106 | VnodeDelete | Deleting vnode 'wp-1.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:34 | 0:00:11 | robin | 1107 | ApplicationDelete | Deleting application 'wp-2' | COMPLETED | 14 Aug 10:18:38 | 0:00:14 | robin | |->1108 | VnodeDelete | Deleting vnode 'wp-2.wordpress.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:38 | 0:00:06 | robin | |->1109 | VnodeDelete | Deleting vnode 'wp-2.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:38 | 0:00:08 | robin | 1110 | ApplicationDelete | Deleting application 'wp-3' | COMPLETED | 14 Aug 10:18:43 | 0:00:15 | robin | |->1111 | VnodeDelete | Deleting vnode 'wp-3.wordpress.01' from cscale-82-140.robinsystems.com | 
COMPLETED | 14 Aug 10:18:43 | 0:00:12 | robin | |->1112 | VnodeDelete | Deleting vnode 'wp-3.mysql.01' from cscale-82-140.robinsystems.com | COMPLETED | 14 Aug 10:18:43 | 0:00:13 | robin | 1113 | HostProbe | Probed cscale-82-139.robinsystems.com from PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeUp. | COMPLETED | 14 Aug 10:20:02 | 0:00:50 | system | 1114 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED|FAILED | 14 Aug 10:20:52 | 0:01:40 | system | Pods do not need to be failed over as Ku 1115 | HostProbe | Probed cscale-82-139.robinsystems.com from UNREACHABLE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'} | COMPLETED | 14 Aug 10:22:17 | 0:00:00 | system | 1116 | HostProbe | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Notready. Origin: StateChange.. Services Down: {'iomgr-server'} | COMPLETED | 14 Aug 10:22:47 | 0:00:00 | system | 1117 | HostProbe | Probed cscale-82-139.robinsystems.com from ONLINE/Notready ==> ONLINE/Ready. Origin: StateChange. 
| COMPLETED | 14 Aug 10:22:59 | 0:00:00 | system | 1118 | ApplicationCreate | Adding application 'wp-1' | COMPLETED | 14 Aug 10:40:21 | 0:01:05 | robin | |->1119 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:40:24 | 0:00:41 | robin | | |->1121 | VnodeAdd | Adding vnode 'wp-1.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:40:24 | 0:00:41 | robin | |->1120 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:41:05 | 0:00:21 | robin | | |->1122 | VnodeAdd | Adding vnode 'wp-1.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:41:05 | 0:00:21 | robin | 1123 | ApplicationCreate | Adding application 'wp-2-no-aff' | COMPLETED | 14 Aug 10:45:45 | 0:00:57 | robin | |->1124 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:45:48 | 0:00:41 | robin | | |->1126 | VnodeAdd | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:45:48 | 0:00:41 | robin | |->1125 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:46:29 | 0:00:13 | robin | | |->1127 | VnodeAdd | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:29 | 0:00:13 | robin | 1128 | ApplicationCreate | Adding application 'wp-3-no-aff' | COMPLETED | 14 Aug 10:46:33 | 0:00:39 | robin | |->1129 | RoleCreate | Provisioning containers for role 'mysql' | COMPLETED | 14 Aug 10:46:35 | 0:00:28 | robin | | |->1131 | VnodeAdd | Adding vnode 'wp-3-no-aff.mysql.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:35 | 0:00:28 | robin | |->1130 | RoleCreate | Provisioning containers for role 'wordpress' | COMPLETED | 14 Aug 10:47:03 | 0:00:09 | robin | | |->1132 | VnodeAdd | Adding vnode 'wp-3-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:47:03 | 0:00:09 | robin | 1133 | HostProbe | Probed cscale-82-139.robinsystems.com from 
PROBE_PENDING/Notready ==> UNREACHABLE/Notready. Origin: NodeDown. | COMPLETED | 14 Aug 10:49:36 | 0:00:52 | system | 1134 | HostFailoverPods | Failing over pods on host cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:50:28 | 0:00:01 | system | 1135 | ApplicationStart | Starting application 'wp-1' | COMPLETED | 14 Aug 10:50:29 | 0:03:22 | system | |->1141 | RoleStart | Starting instances for role 'wordpress' | COMPLETED | 14 Aug 10:52:16 | 0:01:35 | system | | |->1143 | VnodeDeploy | Deploying vnode 'wp-1.wordpress.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:52:16 | 0:01:35 | system | |->1140 | RoleStart | Starting instances for role 'mysql' | COMPLETED | 14 Aug 10:50:30 | 0:01:46 | system | | |->1142 | VnodeDeploy | Deploying vnode 'wp-1.mysql.01'. Origin: REST (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:30 | 0:01:46 | system | 1136 | VnodeDeploy | Deploying vnode 'wp-3-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:01:48 | robin | 1137 | VnodeDeploy | Deploying vnode 'wp-3-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:02:04 | robin | 1138 | VnodeDeploy | Deploying vnode 'wp-2-no-aff.mysql.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:02:07 | robin | 1139 | VnodeDeploy | Deploying vnode 'wp-2-no-aff.wordpress.01'. Origin: Event (cscale-82-140.robinsystems.com) | COMPLETED | 14 Aug 10:50:29 | 0:01:44 | robin | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- .. raw:: html
   .. tab:: API

      Returns all jobs that have occurred during a cluster's lifespan.

      **End Point:** /api/v5/robin_server/jobs

      **Method:** GET

      **URL Parameters:**

      - ``sort=[id|-id]`` : Sorts the returned list of jobs by their ID.
      - ``noarchived=true`` : Excludes archived jobs from the response.
      - ``nopurged=true`` : Excludes purged jobs from the response.
      - ``failed=true`` : Returns only failed jobs.
      - ``parent=true`` : Returns only parent jobs.
      - ``page_size=`` : Sets the number of jobs returned per page.
      - ``page_num=`` : Returns jobs starting from the specified page number.
      - ``objtype=[APPLICATION|K8S_APPLICATION|INSTANCE|DISK|NODE]`` : Returns only jobs for the specified object type.
      - ``objname=`` : Returns only jobs for objects with the specified name.
      - ``all=true`` : Returns all jobs. Note that this option is only valid when an application name is specified.

      **Data Parameters:** None

      **Port:** RCM Port (default value is 29442)

      **Headers:**

      - ``Authorization:`` : Authorization token that identifies the user sending the request. The token can be acquired from the login API.

      **Success Response Code:** 200

      **Error Response Code:** 500 (Internal Server Error)

      **Example Response:**

      .. raw:: html
Output .. code-block:: text { "page_size":10, "items":{ "users":[ { "email":null, "tenantid":1, "firstname":"Robin", "username":"robin", "id":3, "lastname":"Systems" } ], "jobs":[ { "jobid":1888, "tenant_id":1, "enabled":true, "child_job_ids":"[1889]", "endtime":1597456503, "children":[ { "jobid":1889, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456498, "parent_jobid":1888, "error":0, "message":"", "taskrunner":1, "starttime":1597456497, "dependson_job_ids":"[]", "level":"child", "user_id":1, "jtype":"CollectionOffline", "timeout":86400, "state":10, "desc":"Taking collection 'file-collection-1597122699552' offline (Force False)" } ], "parent_jobid":0, "error":0, "message":"", "taskrunner":1, "starttime":1597456496, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"CollectionOnline", "timeout":86400, "state":10, "desc":"Bringing collection 'file-collection-1597122699552' online" }, { "jobid":1887, "tenant_id":1, "enabled":true, "child_job_ids":"[1890]", "endtime":1597456504, "children":[ { "jobid":1890, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456499, "parent_jobid":1887, "error":0, "message":"", "taskrunner":1, "starttime":1597456497, "dependson_job_ids":"[]", "level":"child", "user_id":3, "jtype":"VnodeStop", "timeout":86400, "state":10, "desc":"Stopping vnode test-ds-1.server.01 on cscale-82-140.robinsystems.com" } ], "parent_jobid":0, "error":0, "message":"", "taskrunner":1, "starttime":1597456496, "dependson_job_ids":"[]", "level":"parent", "user_id":3, "jtype":"VnodeDeploy", "timeout":86400, "state":10, "desc":"Deploying vnode 'test-ds-1.server.01'. 
Origin: Event (cscale-82-140.robinsystems.com)" }, { "jobid":1886, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456488, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456487, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"HostProbe", "timeout":86400, "state":10, "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Ready. Origin: StateChange." }, { "jobid":1885, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456476, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456475, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"HostProbe", "timeout":86400, "state":10, "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/Notready ==> ONLINE\/Notready. Origin: StateChange.. Services Down: {'iomgr-server'}" }, { "jobid":1884, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456470, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456470, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"HostProbe", "timeout":86400, "state":10, "desc":"Probed cscale-82-140.robinsystems.com from ONLINE\/WaitingForMonitor ==> ONLINE\/Notready. Origin: StartingHostWatch.. Services Down: {'iomgr-server'}" }, { "jobid":1883, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456520, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456469, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"HostProbe", "timeout":86400, "state":10, "desc":"Probed cscale-82-139.robinsystems.com from UNREACHABLE\/Notready ==> UNREACHABLE\/Notready. Origin: StartingHostWatch." 
}, { "jobid":1882, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456467, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456467, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"DiskNotify", "timeout":86400, "state":10, "desc":"Event on disk '0x60022480940ed076551cfaf75612e24e'" }, { "jobid":1881, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456467, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456467, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"DiskNotify", "timeout":86400, "state":10, "desc":"Event on disk '0x60022480ffcf3deb224fb37d78fe7767'" }, { "jobid":1880, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456467, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456467, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"DiskNotify", "timeout":86400, "state":10, "desc":"Event on disk '0x600224804c48fd7e16c608dea0919064'" }, { "jobid":1879, "tenant_id":1, "enabled":true, "child_job_ids":"[]", "endtime":1597456467, "parent_jobid":0, "error":0, "message":"", "taskrunner":0, "starttime":1597456467, "dependson_job_ids":"[]", "level":"parent", "user_id":1, "jtype":"DiskNotify", "timeout":86400, "state":10, "desc":"Event on disk '0x600224803bcdafde95b1f5cd27ceb5fb'" } ] }, "total":1542, "num_items":10, "page_num":1 } .. raw:: html
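The URL parameters listed for the job list endpoint can be combined in a single request. The following is a minimal Python sketch that only builds the request object and does not send it. The host name, the ``https`` scheme, and the token value are assumptions made for illustration; the endpoint path, default port, parameter names, and ``Authorization`` header come from the API description above.

```python
from urllib.parse import urlencode
from urllib.request import Request

JOBS_ENDPOINT = "/api/v5/robin_server/jobs"  # endpoint from the API description
RCM_PORT = 29442  # documented default RCM port


def build_job_list_request(host, token, scheme="https", **filters):
    """Build (but do not send) a GET request for the job list endpoint.

    `host`, `token`, and `scheme` are illustrative assumptions; `filters`
    are passed through unchanged as URL parameters (e.g. failed=True).
    """
    # Render Python booleans as the lowercase strings shown in the parameter list.
    params = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in filters.items()
    }
    url = f"{scheme}://{host}:{RCM_PORT}{JOBS_ENDPOINT}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Authorization": token}, method="GET")


request = build_job_list_request(
    "robin-master.example.com",  # hypothetical host name
    "<token>",  # placeholder for a token acquired from the login API
    failed=True,
    noarchived=True,
    page_size=10,
    page_num=1,
)
print(request.full_url)
```

Passing the resulting object to ``urllib.request.urlopen`` would send it; TLS verification and error handling depend on the cluster setup and are left out of this sketch.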
Show information about a specific job
======================================

.. tabs::

   .. tab:: CLI

      To get more detailed information about a specific job and its child jobs, including state, duration, and any related errors, issue the following command:

      .. code-block:: text

         # robin job info

      ====================== ===========================================================================================
      ``id``                 Job ID
      ====================== ===========================================================================================

      **Example:**

      .. code-block:: text

         # robin job info 1123
         ID         | Type              | Desc                                                                      | State     | Start           | End      | Duration | Dependson | Error | Message
         -----------+-------------------+---------------------------------------------------------------------------+-----------+-----------------+----------+----------+-----------+-------+---------
         1123       | ApplicationCreate | Adding application 'wp-2-no-aff'                                          | COMPLETED | 14 Aug 10:45:45 | 10:46:42 | 0:00:57  | []        | 0     |
          |->1124   | RoleCreate        | Provisioning containers for role 'mysql'                                  | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41  | []        | 0     |
          | |->1126 | VnodeAdd          | Adding vnode 'wp-2-no-aff.mysql.01' on cscale-82-139.robinsystems.com     | COMPLETED | 14 Aug 10:45:48 | 10:46:29 | 0:00:41  | []        | 0     |
          |->1125   | RoleCreate        | Provisioning containers for role 'wordpress'                              | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13  | [1124]    | 0     |
          | |->1127 | VnodeAdd          | Adding vnode 'wp-2-no-aff.wordpress.01' on cscale-82-139.robinsystems.com | COMPLETED | 14 Aug 10:46:29 | 10:46:42 | 0:00:13  | []        | 0     |

   .. tab:: API

      Returns details about a specific job and any of its child jobs.

      **End Point:** /api/v3/robin_server/jobs/

      **Method:** GET

      **URL Parameters:** None

      **Data Parameters:** None

      **Port:** RCM Port (default value is 29442)

      **Headers:**

      - ``Authorization:`` : Authorization token that identifies the user sending the request. The token can be acquired from the login API.

      **Success Response Code:** 200

      **Error Response Code:** 500 (Internal Server Error), 404 (Not Found Error), 401 (Authorization Error)

      **Example Response:**

      .. raw:: html
Output .. code-block:: text { "tenant_name":"Administrators", "jobid":1888, "tenant_id":1, "enabled":true, "json":{ "collection_id":1597122699552, "state":"SuspectedOffline", "set_failed":true, "origin":2, "hostname":"cscale-82-140.robinsystems.com" }, "user_name":"system", "endtime":1597456503, "parent_jobid":0, "error":0, "message":"", "taskrunner":1, "starttime":1597456496, "child_job_ids":"[1889]", "cjobs":[ { "tenant_name":"Administrators", "jobid":1889, "tenant_id":1, "enabled":true, "json":{ "collection_id":1597122699552 }, "user_name":"system", "endtime":1597456498, "parent_jobid":1888, "error":0, "message":"", "taskrunner":1, "starttime":1597456497, "child_job_ids":"[]", "cjobs":[ ], "dependson_job_ids":"[]", "user_id":1, "jtype":"CollectionOffline", "timeout":86400, "state":10, "desc":"Taking collection 'file-collection-1597122699552' offline (Force False)", "priority":300 } ], "dependson_job_ids":"[]", "user_id":1, "jtype":"CollectionOnline", "timeout":86400, "state":10, "desc":"Bringing collection 'file-collection-1597122699552' online", "priority":300 } .. raw:: html
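The job details response nests child jobs under the ``cjobs`` key, so a small recursive walk is enough to summarize a job tree. The following is a minimal Python sketch using a trimmed copy of the example response above. It assumes that ``starttime`` and ``endtime`` are Unix timestamps in seconds, which the example values suggest but the API description does not state; the helper name ``flatten_jobs`` is hypothetical.

```python
def flatten_jobs(job, depth=0):
    """Yield (depth, jobid, jtype, duration_seconds, error) for a job and its children.

    Assumes `starttime` and `endtime` are Unix timestamps in seconds.
    """
    duration = job["endtime"] - job["starttime"]
    yield depth, job["jobid"], job["jtype"], duration, job["error"]
    # Child jobs are nested under "cjobs" in the job details response.
    for child in job.get("cjobs", []):
        yield from flatten_jobs(child, depth + 1)


# Trimmed copy of the example response (only the fields used here).
response = {
    "jobid": 1888,
    "jtype": "CollectionOnline",
    "starttime": 1597456496,
    "endtime": 1597456503,
    "error": 0,
    "cjobs": [
        {
            "jobid": 1889,
            "jtype": "CollectionOffline",
            "starttime": 1597456497,
            "endtime": 1597456498,
            "error": 0,
            "cjobs": [],
        }
    ],
}

rows = list(flatten_jobs(response))
for depth, jobid, jtype, duration, error in rows:
    # Indent child jobs by depth, loosely mirroring the CLI tree view.
    print(f"{'  ' * depth}{jobid} {jtype} duration={duration}s error={error}")
```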
Archive Robin Job Logs
======================

You can archive the Robin job logs to prevent data loss, improve security, and free up space in the parent directory. Archiving moves all completed job logs to the ``archived`` sub-directory of the parent directory for long-term storage. The ``archived`` sub-directories are located under the ``/var/log/robin/server`` and ``/var/log/robin/agent`` directories. Robin archives the logs of jobs that completed successfully more than 24 hours ago. The logs of failed jobs remain in the parent directories for analysis purposes.

.. tabs::

   .. tab:: CLI

      To archive the Robin job logs, run the following command:

      .. code-block:: text

         # robin job archive --age --include-failed

      ========================== =========================================
      ``--age``                  Age of the job logs (in minutes) to archive.
      ``--include-failed``       Include the failed job logs.
      ========================== =========================================

      **Example:**

      .. code-block:: text

         # robin job archive --age 600 --wait
         Job: 255170 Name: JobArchive State: PROCESSED Error: 0
         Job: 255170 Name: JobArchive State: PREPARED Error: 0
         Job: 255170 Name: JobArchive State: WAITING Error: 0
         Job: 255170 Name: JobArchive State: COMPLETED Error: 0

Configure job archive attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each job archive attribute has a default value. If required, you can change these values and set your own for the cluster. The ``robin job archive`` task runs automatically as a CronJob. For more information, see the CronJob documentation.

View job archive attributes
---------------------------

You can view the job archive attributes available in your cluster to check their current values.

To view the job archive attributes, run the following command:

.. code-block:: text

   # robin config list | grep arch

**Example:**

.. code-block:: text

   # robin config list | grep arch
   server | job_archive_age  | 86400
   server | job_archive_cron | 0 0 * * *

Job archive attributes and their values
---------------------------------------

The following are the job archive attributes and their default values in a cluster.

===================== ============== ======================================================================================================================
Attribute             Default value  Description
===================== ============== ======================================================================================================================
``job_archive_age``   86400          The age (in seconds) of the completed job logs to be archived.
``job_archive_cron``  0 0 * * *      The time at which the job archive task runs automatically to archive the Robin job logs, in Cron schedule syntax.
===================== ============== ======================================================================================================================

Update job archive attributes
-----------------------------

You can update the job archive attributes and set your own values for the cluster.

To update the job archive attributes, run the following command:

.. code-block:: text

   # robin config update server

**Example:**

.. code-block:: text

   # robin config update server job_archive_age 81000
   The 'server' attribute 'job_archive_age' has been updated

Purge Robin Job logs
====================

You can purge the Robin job logs when you no longer want to keep old job logs.
You can purge these job logs by running the ``robin job purge`` command manually or automatically through the ``job_purge_cron`` schedule. By default, the job purge task deletes the following job logs from the database and the nodes' directories:

* Successful jobs older than two weeks.
* Failed jobs older than four weeks.
* Robin maintenance jobs older than one week.

For each purged job ID and its child job IDs, the job purge task takes the following actions:

* For the job logs present in the Robin master nodes:

  - Deletes the server job log directory ``/var/log/robin/server/``.
  - Deletes the server job log archive file ``/var/log/robin/server/archived/.tar.gz``.

* For the job logs present in the Robin master and agent nodes:

  - Deletes the agent job log directory ``/var/log/robin/agent/``.
  - Deletes the agent job log archive file ``/var/log/robin/agent/archived/.tar.gz``.

.. tabs::

   .. tab:: CLI

      To purge the Robin job logs, run the following command:

      .. code-block:: text

         # robin job purge --age --failed-job-age --maintenance-job-age --maintenance-job-types --before-id

      =========================== ====================================================================================================
      ``--age``                   Purge successful jobs that completed before the specified date and time (in ``%Y-%m-%dT%H:%M:%S`` format). **Default:** 2 weeks before now.
      ``--failed-job-age``        Purge failed jobs that completed before the specified date and time (in ``%Y-%m-%dT%H:%M:%S`` format). **Default:** 4 weeks before now.
      ``--maintenance-job-age``   Purge maintenance jobs that completed before the specified date and time (in ``%Y-%m-%dT%H:%M:%S`` format). **Default:** 1 week before now.
      ``--maintenance-job-types`` Comma-separated list of maintenance job types to purge. **Default:** ``JobArchive,JobPurge``.
      ``--before-id``             Purge logs for jobs with an ID lower than this value if no age is specified.
      =========================== ====================================================================================================

      **Example:**

      .. code-block:: text

         # robin job purge --age 2021-04-06T18:14:00 --failed-job-age 2021-04-06T18:14:00 --maintenance-job-age 2021-04-06T18:14:00 --wait
         Job: 309 Name: JobPurge State: VALIDATED Error: 0
         Job: 309 Name: JobPurge State: COMPLETED Error: 0

Configure job purge attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The job purge attributes have default values that you can change for your cluster.

View job purge attributes
-------------------------

You can view the job purge attributes to check their current values in your cluster.

To view the job purge attributes, run the following command:

.. code-block:: text

   # robin config list | grep job_purge

**Example:**

.. code-block:: text

   # robin config list | grep job_purge
   server | job_purge_age | 1209600
   server | job_purge_cron | 30 0 * * *
   server | job_purge_failed_age | 2419200
   server | job_purge_maintenance_age | 604800
   server | job_purge_maintenance_jtypes | JobArchive,JobPurge
   server | job_purge_max_count | 100000

Job purge attributes and their values
-------------------------------------

The following are the job purge attributes and their default values in a cluster.

================================ =================== ====================================================================================================
Attribute                        Default value       Description
================================ =================== ====================================================================================================
``job_purge_age``                1209600             The age (in seconds) of the completed job logs to be purged.
``job_purge_cron``               30 0 * * *          The schedule on which the job purge task runs automatically. For more information, see `Cron schedule syntax `_.
``job_purge_failed_age``         2419200             The age (in seconds) of the failed job logs to be purged.
``job_purge_maintenance_age``    604800              The age (in seconds) of the maintenance job logs to be purged.
``job_purge_maintenance_jtypes`` JobArchive,JobPurge The types of maintenance jobs to be purged.
``job_purge_max_count``          100000              The maximum number of job logs that can be purged at a time.
================================ =================== ====================================================================================================

Update job purge attributes
---------------------------

You can update the job purge attributes and set your own values for the cluster.

To update the job purge attributes, run the following command:

.. code-block:: text

   # robin config update server

**Example:**

.. code-block:: text

   # robin config update server job_purge_age 13396198
   The 'server' attribute 'job_purge_age' has been updated

.. Note:: Robin recommends running the ``job_purge_cron`` task daily.

Clean stale Robin job log directory
===================================

You can clean stale job logs from the Robin job log directory and the archived job log directory. Robin treats its database as the most reliable source for job logs: the cleanup task deletes any job logs in these directories that are no longer available in Robin's database. Reconcile the directories with the database at least once a month so that logs for jobs you deleted manually from the database are not retained on disk.

.. tabs::

   .. tab:: CLI

      To clean the stale job logs, run the following command:

      .. code-block:: text

         # robin job cleanup

      **Example:**

      .. code-block:: text

         # robin job cleanup --wait
         Job: 358447 Name: JobCleanupStaleLogs State: WAITING Error: 0
         Job: 358447 Name: JobCleanupStaleLogs State: COMPLETED Error: 0

Configure job cleanup attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can change the schedule on which the job cleanup task runs automatically from its default value.

View job cleanup attribute
--------------------------

You can view the job cleanup attribute to check its current value in your cluster.

To view the job cleanup attribute, run the following command:

.. code-block:: text

   # robin config list | grep job_cleanup

**Example:**

.. code-block:: text

   # robin config list | grep job_cleanup
   server | job_cleanup_cron | 0 1 1 * *

Job cleanup attribute and its value
-----------------------------------

The following is the job cleanup attribute and its default value in a cluster.

===================== ============== ====================================================================================================
Attribute             Default value  Description
===================== ============== ====================================================================================================
``job_cleanup_cron``  0 1 1 * *      The schedule on which the job cleanup task runs automatically to clean the Robin job logs from the stale Robin job log directory and the archived job log directory. For more information, see `Cron schedule syntax `_.
===================== ============== ====================================================================================================

Update job cleanup attribute
----------------------------

You can update the job cleanup attribute and set your own value for the cluster.

To update the job cleanup attribute, run the following command:

.. code-block:: text

   # robin config update server job_cleanup_cron

**Example:**

.. code-block:: text

   # robin config update server job_cleanup_cron "0 1 2 * *"
   The 'server' attribute 'job_cleanup_cron' has been updated

Log Collection
================

During a cluster-wide failure or any unexpected scenario that affects multiple services, Robin needs logs from all system components to debug the issue properly. Sometimes, given the scope of the issue, only a subset of the logs needs to be collected. This granularity is available, but it is highly recommended to always send the complete set of logs when filing a bug report with Robin. Age-based filtering helps reduce the storage footprint of the collected logs.

Robin supports uploading logs to the following destinations:

========================= =========================================================
``robin-storage``         Used to store collected logs in Robin backed storage
``nfs``                   Used to store collected logs in NFS
``s3``                    Used to store collected logs in Amazon S3
``ssh``                   Used to store collected logs in a given remote location
========================= =========================================================

Storing logs using Robin Storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. tabs::

   .. tab:: CLI

      Logs collected by Robin can be stored on a volume created on the local cluster with the following command:

      .. code-block:: text

         # robin log collect robin-storage --nodes --dest-path --size --media --age

      ============================= =========================================================
      ``rpool``                     Name of the resource pool to use.
      ``--nodes``                   Comma-separated list of nodes from which to collect. The default is to collect from all nodes.
      ``--dest-path``               Destination path where log files will be copied.
      ``--size``                    Size of the storage volume for the log collection. The default is 250GB.
      ``--media``                   Type of drives to allocate storage from. Choices include: 'HDD', 'SSD'. The default media type is 'HDD'.
      ``--age``                     Collect logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years). For example, 10m represents 10 minutes.
      ============================= =========================================================

      **Example:**

      .. code-block:: text

         # robin log collect robin-storage default --wait
         Job: 123 Name: LogCollect State: PROCESSED Error: 0
         Job: 123 Name: LogCollect State: WAITING Error: 0
         Job: 123 Name: LogCollect State: COMPLETED Error: 0

Storing logs using NFS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. tabs::

   .. tab:: CLI

      Logs collected by Robin can be stored on an NFS share with the following command:

      .. code-block:: text

         # robin log collect nfs --nodes --age

      ============================= =========================================================
      ``nfs_share``                 The hostname or IP address, export path, and destination path of the NFS share, separated by colons (for example, ``10.9.82.162:/tmp:/demo_log_collect``).
      ``--nodes``                   Comma-separated list of nodes from which to collect. The default is to collect from all nodes.
      ``--age``                     Collect logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years). For example, 10m represents 10 minutes.
      ============================= =========================================================

      **Example:**

      .. code-block:: text

         # robin log collect nfs 10.9.82.162:/tmp:/demo_log_collect
         Job: 126 Name: LogCollect State: PROCESSED Error: 0
         Job: 126 Name: LogCollect State: WAITING Error: 0
         Job: 126 Name: LogCollect State: COMPLETED Error: 0

Storing logs using AWS S3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. tabs::

   .. tab:: CLI

      Logs collected by Robin can be stored in AWS S3 with the following command:

      .. code-block:: text

         # robin log collect s3 --nodes --access_key --secret_key --age

      ================================= =========================================================
      ``url``                           S3 URL of the destination bucket and path (for example, ``https://s3-us-west-2.amazonaws.com/log-collect/demo_log_collect``).
      ``aws_config``                    JSON file containing the access key, secret key, and region. Example format: ``{"aws_access_key_id": , "aws_secret_access_key": , "region": }``
      ``--nodes``                       Comma-separated list of nodes from which to collect. The default is to collect from all nodes.
      ``--access_key``                  Access key for the user with access to the specified S3 bucket.
      ``--secret_key``                  Secret key for the user with access to the specified S3 bucket.
      ``--age``                         Collect logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years). For example, 10m represents 10 minutes.
      ================================= =========================================================

      **Example:**

      .. code-block:: text

         # robin log collect s3 https://s3-us-west-2.amazonaws.com/log-collect/demo_log_collect /root/aws.json --wait
         Job: 132 Name: LogCollect State: PROCESSED Error: 0
         Job: 132 Name: LogCollect State: WAITING Error: 0
         Job: 132 Name: LogCollect State: COMPLETED Error: 0

Storing logs in a remote location
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. tabs::

   .. tab:: CLI

      Logs collected by Robin can be stored in a remote location with the following command:

      .. code-block:: text

         # robin log collect ssh --nodes --password --age

      ================================= =========================================================
      ``dest``                          Destination to which the log files will be copied, given as a user, host, and path (for example, ``root@10.9.82.163:/demo_log_collect``).
      ``--nodes``                       Comma-separated list of nodes from which to collect. The default is to collect from all nodes.
      ``--password``                    Provide a password on the command line instead of via a prompt.
      ``--age``                         Collect logs based on age. Valid units are s (seconds), m (minutes), h (hours), d (days), Mo (months), and y (years). For example, 10m represents 10 minutes.
      ================================= =========================================================

      **Example:**

      .. code-block:: text

         # robin log collect ssh root@10.9.82.163:/demo_log_collect --password robin123
         Job: 129 Name: LogCollect State: PROCESSED Error: 0
         Job: 129 Name: LogCollect State: WAITING Error: 0
         Job: 129 Name: LogCollect State: COMPLETED Error: 0

Retrieving Job Logs
====================

.. tabs::

   .. tab:: CLI

      Robin provides a utility that collects all the relevant logs from the necessary nodes for a particular job and its child jobs.
      It stores these logs in a single tarball that can be provided to Robin alongside a bug report. The tarball is also useful for an administrator debugging why a job failed unexpectedly. This utility is convenient because it removes the need to log into every affected node to collect and inspect the relevant log files. Issue the following command to retrieve logs for a specific job:

      .. code-block:: text

         # robin job get

      ====================== ===========================================================================================
      ``id``                 ID of the job to collect the logs for
      ====================== ===========================================================================================

      **Example:**

      .. code-block:: text

         # robin job get 1
         Retrieving log files...
         Log files for Job ids: [1] are retrieved successfully at 1582189081.tar.gz

Cluster Auditing
================

Every operation that a user performs on an identifiable object within a Robin cluster is logged for auditing purposes. This allows administrators to track the exact series of operations performed by a user as well as to monitor general activity on the cluster. It not only enables more accurate backtracking for troubleshooting but also improves the thoroughness of security audits. Detailed below are the methods by which a user can retrieve the audit log.

Retrieving audit logs from the Robin Database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. tabs::

   .. tab:: CLI

      The audit log records which user executed an operation, the tenant and node from which they executed it, the type of object and operation involved, and the result of the operation. To access it, issue the following command:

      .. code-block:: text

         # robin user-audit list --exec-user --exec-tenant --owner-user --owner-tenant --id --object-type --page_size --page_num --operation --result --full

      ========================== ========================================================================================================================================
      ``--exec-user``            Filter by username of the user who initiated the operation. Note this option cannot be used in conjunction with the ``--owner-user`` option
      ``--exec-tenant``          Filter by tenant name of the user who initiated the operation. Note this option cannot be used in conjunction with ``--owner-tenant``
      ``--owner-user``           Filter by username of the user who owns the object. Note this option cannot be used in conjunction with ``--exec-user``
      ``--owner-tenant``         Filter by tenant name of the user who owns the object. Note this option cannot be used in conjunction with ``--exec-tenant``
      ``--id``                   Filter for a specific record ID
      ``--object-type``          Filter by object type
      ``--operation``            Filter by operation
      ``--page_size``            Number of audit records that should be displayed for each page
      ``--page_num``             Page number to start displaying audit records from (starting index 1)
      ``--result``               Filter by operation result
      ``--full``                 Display additional information about the audit records
      ========================== ========================================================================================================================================

      **Example 1 (List first page of audit records):**

      .. code-block:: text

         # robin user-audit list
         Id  | Timestamp                | IP Addr     | Exec User | Exec Tenant    | Owner User | Owner Tenant | Object Type     | Operation | Result
         ----+--------------------------+-------------+-----------+----------------+------------+--------------+-----------------+-----------+---------
         643 | August 10, 2021 14:17:47 | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         642 | July 13, 2021 11:24:13   | 10.9.121.40 | robin     | Administrators |            |              | USER            | login     | success
         641 | July 13, 2021 11:24:12   | 172.20.0.1  | robin     | Administrators |            |              | METRICS         | enable    | success
         640 | July 13, 2021 11:24:10   | 172.20.0.1  | robin     | Administrators |            |              | CONFIG          | update    | success
         639 | July 13, 2021 11:24:06   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
         638 | July 13, 2021 11:24:04   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
         637 | July 13, 2021 11:24:04   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | probe     | success
         636 | July 13, 2021 11:23:58   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         635 | July 13, 2021 11:23:57   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         634 | July 13, 2021 11:23:49   | 172.20.0.1  | robin     | Administrators |            |              | FILE_COLLECTION | online    | success
         633 | July 13, 2021 11:23:44   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         632 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
         631 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
         630 | July 13, 2021 11:20:07   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
         629 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
         628 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
         627 | July 13, 2021 11:20:01   | 172.20.0.1  | robin     | Administrators |            |              | NODE            | config    | success
         626 | July 13, 2021 11:19:59   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         625 | July 13, 2021 11:19:01   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         624 | July 13, 2021 11:18:57   | 172.20.0.1  | robin     | Administrators |            |              | USER            | login     | success
         --------------------------------------------
         537 items, page 1 of 27.
         --------------------------------------------

      **Example 2 (List audit records filtered by object type):**

      .. code-block:: text

         # robin user-audit list --object-type APPLICATION
         Id | Timestamp                 | IP Addr    | Exec User | Exec Tenant    | Owner User | Owner Tenant   | Object Type | Operation | Result
         ---+---------------------------+------------+-----------+----------------+------------+----------------+-------------+-----------+---------
         46 | October 26, 2020 12:51:46 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         45 | October 26, 2020 12:51:25 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         44 | October 26, 2020 12:51:18 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         43 | October 26, 2020 12:51:06 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         42 | October 26, 2020 12:50:59 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         41 | October 26, 2020 12:49:44 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         40 | October 26, 2020 12:49:26 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         39 | October 26, 2020 12:49:17 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         38 | October 26, 2020 12:49:03 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         37 | October 26, 2020 12:46:17 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         36 | October 26, 2020 12:45:35 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         --------------------------------------------
         11 items, page 1 of 1.
         --------------------------------------------

      **Example 3 (Show details for a single audit record):**

      .. code-block:: text

         # robin user-audit list --id 46 --full
         Id | Timestamp                 | IP Addr    | Exec User | Exec Tenant    | Owner User | Owner Tenant   | Object Type | Operation | Result
         ---+---------------------------+------------+-----------+----------------+------------+----------------+-------------+-----------+---------
         46 | October 26, 2020 12:51:46 | 172.17.0.1 | robin     | Administrators | robin      | Administrators | APPLICATION | create    | success
         object_attributes: {'tenant_id': 1, 'object_id': 11, 'jobid': 74, 'object_name': 'app-11', 'user_id': 3}
         details:
         --------------------------------------------
         1 items, page 1 of 1.
         --------------------------------------------

Robin audit logs
^^^^^^^^^^^^^^^^

Robin can also write audit records to a log file. The audit logs capture all requests that change state within the Robin cluster, such as create, update, and delete requests, and let you view the user log details in the audit log file. The audit logs are stored in the ``robin-user-audit.log`` file, located at ``/home/robinds/var/log/robin/robin-user-audit.log``.

.. Note:: By default, the audit log feature is disabled.

Enable audit logs
------------------

You must enable the audit log feature to view the user log details.

.. Note:: As part of enabling the audit log feature, you must run the ``service robin-server restart`` command to restart the Robin server for the changes to take effect.

After you enable the audit log feature, all log messages are written in either ``JSON`` or ``TEXT`` format. As an administrator, you can view the audit logs using any text editor.
You can also collect these logs using any log forwarding tool for processing.

To enable the audit log feature, perform the following steps:

1. Run the following command to enable the audit log feature:

   .. code-block:: text

      # robin config update user_audit log_enable True

2. Run the following command to restart the Robin server for the changes to take effect:

   .. code-block:: text

      # service robin-server restart

**Example**

.. code-block:: text

   # robin config update user_audit log_enable True
   The 'user_audit' attribute 'log_enable' has been updated

   # service robin-server restart
   Redirecting to /bin/systemctl restart robin-server.service

Disable audit logs
------------------

You can also disable the audit log feature to stop recording audit log messages in the ``robin-user-audit.log`` file.

.. Note:: As part of disabling the audit log feature, you must run the ``service robin-server restart`` command to restart the Robin server for the changes to take effect.

To disable the audit log feature, perform the following steps:

1. Run the following command to disable the audit log feature:

   .. code-block:: text

      # robin config update user_audit log_enable False

2. Run the following command to restart the Robin server for the changes to take effect:

   .. code-block:: text

      # service robin-server restart

**Example**

.. code-block:: text

   # robin config update user_audit log_enable False
   The 'user_audit' attribute 'log_enable' has been updated

   # service robin-server restart
   Redirecting to /bin/systemctl restart robin-server.service

Configure user audit attributes
-------------------------------

The user audit attributes have default values that you can change.

**View user audit attributes**

To view the list of the user audit attributes, run the following command:

.. code-block:: text

   # robin config list user_audit

**User audit attributes and their values**

.. list-table::
   :widths: 15 15 80
   :header-rows: 1

   * - Attribute
     - Default value
     - Valid value
   * - enabled
     - True
     - ``True``: enables the user audit feature

       ``False``: disables the user audit feature
   * - log_enable
     - False
     - ``True``: enables the audit log feature

       ``False``: disables the audit log feature
   * - log_file_size
     - 10
     - The maximum size in megabytes of the audit log file
   * - log_format
     - JSON
     - ``JSON``: writes output in ``JSON`` format

       ``TEXT``: writes output in ``TEXT`` format
   * - log_level
     - INFO
     - The severity level of the messages written to the audit log. The following are the valid values:

       ``INFO``: informational messages

       ``DEBUG``: debug-level messages that contain information for debugging a program

       ``WARNING``: warning messages

       ``ERROR``: error messages

       ``CRITICAL``: critical messages
   * - log_retention
     - 4
     - The maximum number of audit log files to retain. Any additional log file is rolled over.

**Update user audit attributes**

To update the user audit attributes, run the following command:

.. code-block:: text

   # robin config update user_audit

**Example**

Update the ``log_format`` attribute of the user audit.

.. code-block:: text

   # robin config update user_audit log_format TEXT
   The 'user_audit' attribute 'log_format' has been updated

View user audit logs
--------------------

To view all user log details, run the following command:

.. code-block:: text

   # cat /var/log/robin/robin-user-audit.log

**Example**

.. tabs::

   .. tab:: CLI

      All user log details in text format.

      .. code-block:: text

         # cat /var/log/robin/robin-user-audit.log
         1623 | 2021-08-12T15:26:06.581513+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1624 | 2021-08-12T15:26:12.655515+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1625 | 2021-08-12T15:26:12.783629+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1626 | 2021-08-12T15:26:13.118734+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1627 | 2021-08-12T15:26:18.584252+7:00 | 192.0.2.2 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1628 | 2021-08-12T15:26:21.752403+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1629 | 2021-08-12T15:26:28.934639+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1630 | 2021-08-12T15:26:36.089382+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1631 | 2021-08-12T15:26:43.233911+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1632 | 2021-08-12T15:26:50.370029+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1633 | 2021-08-12T15:26:57.528168+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1634 | 2021-08-12T15:27:04.749161+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1635 | 2021-08-12T15:27:11.934771+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1636 | 2021-08-12T15:27:19.127729+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1637 | 2021-08-12T15:27:26.291575+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1638 | 2021-08-12T15:27:33.702357+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --
         1639 | 2021-08-12T15:27:41.017244+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --

   .. tab:: JSON

      All user log details in JSON format.

      .. code-block:: text

         # cat /var/log/robin/robin-user-audit.log
         { "id": 197, "timestamp": "2021-08-12T13:56:17.230515+7:00", "ip_addr": "192.0.2.2", "exec_user_id": 3, "exec_username": "robin", "exec_tenant_id": 1, "exec_tenant": "Administrators", "owner_user_id": null, "owner_username": null, "owner_tenant_id": null, "owner_tenant": null, "object_type": "NAMESPACE", "operation": "create", "result": "success", "object_attributes": { "object_name": "oc8687pk4i", "username": "robin", "tenant": "Administrators", "import_namespace": false }, "details": {} }
         { "id": 198, "timestamp": "2021-08-12T13:56:17.748933+7:00", "ip_addr": "192.0.2.1", "exec_user_id": 3, "exec_username": "robin", "exec_tenant_id": 1, "exec_tenant": "Administrators", "owner_user_id": null, "owner_username": null, "owner_tenant_id": null, "owner_tenant": null, "object_type": "USER", "operation": "login", "result": "success", "object_attributes": {}, "details": {} }
         { "id": 199, "timestamp": "2021-08-12T13:56:33.766674+7:00", "ip_addr": "192.0.2.2", "exec_user_id": 3, "exec_username": "robin", "exec_tenant_id": 1, "exec_tenant": "Administrators", "owner_user_id": null, "owner_username": null, "owner_tenant_id": null, "owner_tenant": null, "object_type": "NAMESPACE", "operation": "delete", "result": "success", "object_attributes": {}, "details": {} }
         { "id": 200, "timestamp": "2021-08-12T13:56:34.290960+7:00", "ip_addr": "192.0.2.1", "exec_user_id": 3, "exec_username": "robin", "exec_tenant_id": 1, "exec_tenant": "Administrators", "owner_user_id": null, "owner_username": null, "owner_tenant_id": null, "owner_tenant": null, "object_type": "USER", "operation": "login", "result": "success", "object_attributes": {}, "details": {} }

To view the most recent user log entry, run the following command:

.. code-block:: text

   # tail -n 1 /var/log/robin/robin-user-audit.log

**Example**

.. tabs::

   .. tab:: CLI

      The last user log detail in text format.

      .. code-block:: text

         # tail -n 1 /var/log/robin/robin-user-audit.log
         1645 | 2021-08-12T15:28:19.298469+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --

   .. tab:: JSON

      The last user log detail in JSON format.

      .. code-block:: text

         # tail -n 1 /var/log/robin/robin-user-audit.log
         { "id": 1646, "timestamp": "2021-08-12T15:31:44.069446+7:00", "ip_addr": "192.0.2.2", "exec_user_id": 3, "exec_username": "robin", "exec_tenant_id": 1, "exec_tenant": "Administrators", "owner_user_id": null, "owner_username": null, "owner_tenant_id": null, "owner_tenant": null, "object_type": "CONFIG", "operation": "update", "result": "success", "object_attributes": { "section": "user_audit", "attribute": "log_format" }, "details": { "msg": "The 'user_audit' attribute 'log_format' has been updated" } }

- **Points to consider for the Robin audit logs**

  - The active master can generate the log file.
  - The log file is automatically updated by the Robin control plane processes whenever an event occurs.
  - The logs are automatically rotated to ensure that these logs do not consume the whole log partition.
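Because the audit log is a plain file, it can be post-processed with a short script. The following is a minimal Python sketch, not a Robin utility, that parses entries in either format and filters them by field value. It assumes one record per line, that the ``TEXT`` columns follow the same order as the ``robin user-audit list`` output followed by the object attributes and details, and that no field contains a ``|`` character. The names ``TEXT_FIELDS``, ``parse_audit_line``, and ``filter_records`` are hypothetical.

.. code-block:: python

   import json

   # Assumption: column order inferred from the `robin user-audit list`
   # header and the JSON keys shown above; this is not an official schema.
   TEXT_FIELDS = [
       "id", "timestamp", "ip_addr", "exec_user", "exec_tenant",
       "owner_user", "owner_tenant", "object_type", "operation", "result",
       "object_attributes", "details",
   ]

   def parse_audit_line(line):
       """Parse one audit log line (JSON or TEXT format) into a dict."""
       line = line.strip()
       if not line:
           return None
       if line.startswith("{"):
           return json.loads(line)
       # TEXT format: fields separated by '|', padded with spaces.
       values = [value.strip() for value in line.split("|")]
       record = dict(zip(TEXT_FIELDS, values))
       record["id"] = int(record["id"])
       return record

   def filter_records(lines, **criteria):
       """Yield parsed records whose fields equal every given criterion."""
       for line in lines:
           record = parse_audit_line(line)
           if record and all(record.get(k) == v for k, v in criteria.items()):
               yield record

   # Two entries copied (the JSON one trimmed) from the examples above.
   sample = [
       "1623 | 2021-08-12T15:26:06.581513+7:00 | 192.0.2.1 | robin | Administrators | -- | -- | USER | login | success | -- | --",
       '{"id": 199, "timestamp": "2021-08-12T13:56:33.766674+7:00", "ip_addr": "192.0.2.2", "object_type": "NAMESPACE", "operation": "delete", "result": "success"}',
   ]

   for rec in filter_records(sample, result="success"):
       print(rec["id"], rec["object_type"], rec["operation"])

To run the same filter against a real log, pass an open file object for ``/var/log/robin/robin-user-audit.log`` instead of ``sample``. Note that the ``TEXT`` and ``JSON`` formats name some fields differently (for example, ``exec_user`` in this sketch versus ``exec_username`` in the JSON records), so filters on those fields need the matching key.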