18. Sherlock - Troubleshooting Tool¶
Sherlock is a troubleshooting and self-diagnostic command-line interface (CLI) tool in Robin. It is designed to help Robin administrators identify and analyze problems with Robin clusters. Using Sherlock, you as an administrator can diagnose cluster-wide problems or view information about specific applications, nodes, containers, volumes, devices, and so on. It provides details of problems at every level, from the application down to the spindle.
Note
By default, Sherlock displays only unhealthy objects.
The Sherlock tool generates troubleshooting details by querying a range of Robin APIs and making direct database calls to produce cluster health reports.
Note
You can access the Sherlock tool only from the active Robin master Pod. On CNS clusters, you can access it from the primary robinds Pod.
18.1. Sherlock Use Cases¶
You can use the Sherlock tool when you notice a problem with your Robin cluster to view details of the problem, plan maintenance activities, and so on.
Analyze problem details
When there is an issue with the Robin cluster, Sherlock detects it automatically and displays all the impacted objects when you run the tool.
For example, if a disk goes down on the cluster, Sherlock displays all the objects impacted by the disk (device) issue, such as volumes, Pods, applications, and users. Sherlock detects problems with other objects and displays their details in the same way.
This information helps you detect and fix the issue faster.
Plan maintenance activities
Sherlock also helps you to plan maintenance activities.
For example, if you want to bring down a node for maintenance, you can run the Sherlock tool and check the objects that would be impacted (Pods, volumes, applications, and users) when planning the maintenance. A sketch of such a check follows.
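For instance, a minimal pre-maintenance check on a single node might look like the following. This is a sketch: the node name is a placeholder, and the --node and -V options are documented later in this chapter.
# sherlock --node <node name> -V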
18.2. Details You Can View Using Sherlock¶
Sherlock is useful because it allows you to map resources both top-down and bottom-up through the resource hierarchy; a sample query is sketched after the mapping examples below.
The Sherlock tool provides details of the following:
Application to Pod mappings
Source of computing resources for Pods
Volumes attached to Pods
Source of storage resources for volumes
Number of snapshots and replicas
Source of replication storage resources
Disks attached to hosts
Disk capacity and consumption
Volumes that are hosted and the Pods that own them
Status of critical Robin cluster services
For example:
Node → Disk
Node → Pod → Application
Disk → Volumes → Pods → Application
Volumes → Pods → Application
Volumes → File-object → Bundle → App
Pods → Application
Critical node services → Node → Pods → Application
Application → Users/Tenant
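For example, to walk the Disk → Volumes → Pods → Application mapping for one disk, you could query the device directly. This is a sketch with placeholder names; the --dev NODENAME:DEVPATH form is described under Device Options later in this chapter.
# sherlock --dev <nodename>:<devpath> -V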
18.3. Sherlock Report¶
A report generated by Sherlock using the sherlock command has the following sections:
SHOWING APPLICATIONS THAT NEED ATTENTION:
This section of the report displays the unhealthy applications running on your cluster (with -H, healthy applications as well), where the Pods comprising each application are located, and the volumes and devices on which their data is saved.
SHOWING PODS THAT NEED ATTENTION:
This section of the report displays details of the unhealthy Pods (with -H, healthy Pods as well) running on your cluster and the volumes and devices that they use for storing data.
SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION:
This section of the report displays details of the unhealthy volumes (with -H, healthy volumes as well) on your cluster and the devices that these volumes use for storing data.
SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
This section of the report displays the unhealthy nodes (with -H, healthy nodes as well), the services running on the nodes, the Pods on the nodes, and the associated volumes and devices.
SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
This section of the report displays the status of unhealthy devices (with -H, healthy devices as well) and the impacted objects.
18.4. Access Sherlock¶
After you log in to your Robin cluster, you can access the Sherlock troubleshooting tool. The Sherlock tool is part of Robin CNS and Robin CNP, so you do not need to install anything manually to start using it.
The tool displays details of the following components that might need your attention in the form of a report:
Applications
Pods
Volumes
Nodes
Devices
Unavailable file collections
Unavailable bundles
Note
If all components are fine, the tool displays messages indicating that all objects are healthy or available.
Prerequisites
You must access the Sherlock tool only from the active master node. On CNS clusters, you must run Sherlock commands from the primary robinds Pod.
To access Sherlock, run the following command:
# sherlock
Note
You can run the sherlock --help command after accessing Sherlock to view command options.
Example showing all healthy and available
# sherlock
SHOWING APPLICATIONS THAT NEED ATTENTION:
All apps are healthy
SHOWING PODS THAT NEED ATTENTION:
All pods are healthy
SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy
SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy
SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy
SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available
SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available
Only unhealthy objects are shown. To see everything re-run with -H|--healthy option
To see more details rerun with -V|--verbose option
sherlock produced results in 155 milliseconds (Sat Sep 18 06:14:59 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections, 2 bundles, 1 users and 1 tenants were analyzed
Example showing unhealthy app and Pods
# sherlock
SHOWING APPLICATIONS THAT NEED ATTENTION:
|-- robinte STATE: PLANNED Robin Systems 2/2 pods unhealthy KIND: ROBIN
SHOWING USERS WHO ARE AFFECTED:
|-- Robin Systems (Firstname: Robin LastName: Systems Email: None)
| |-- APPS 1: robinte
SHOWING PODS THAT NEED ATTENTION:
o-- POD/VNODE ID 121: robinte.R1.01 INSTALLING/ONLINE 1 CPU, 50 MB MEM NODE: UP, RIO: UP
|-- POD/VNODE ID 122: robinte.R2.01 INSTALLING/ONLINE 1 CPU, 50 MB MEM NODE: UP, RIO: UP
SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy
SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy
SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy
SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available
SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available
18.5. View Health of All Objects¶
You can view the health of all objects using the -H option. The -H option displays healthy objects along with unhealthy objects in the report. Sherlock displays the health of all objects (Pods, volumes, applications, nodes, devices, file collections, and bundles).
To view health of all objects, run the following command:
# sherlock -H
Note
You can use the -V option to view the details of all healthy and unhealthy objects. You can also use the -H (healthy) and -V (verbose) options together with other commands.
Example with -H
# sherlock -H No matching apps found No matching pods found SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER |-- VOLID 1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b, usage: 448 MB / 20 GB, 1 snapshots, resync progress: SYNCED, using 1 devices |-- VOLID 132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839 , usage: 352 MB / 5 GB, 1 snapshots, resync progress: SYNCED, using 1 devices |-- VOLID 131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1 , usage: 576 MB / 11 GB, 1 snapshots, resync progress: SYNCED, using 1 devices All volumes are healthy SHOWING HEALTH OF 3/3 NODES RUNNING IN THE CLUSTER |-- eqx01-flash16 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY |-- eqx04-flash05 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY |-- eqx01-flash15 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY SHOWING HEALTH OF 26/26 DEVICES IN THE CLUSTER |-- /dev/sdi@eqx01-flash16 | 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1) | |-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8) | |-- /dev/sde@eqx01-flash15 | 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0) | |-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90) | |-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2) | |-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33) | |-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7) | |-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT) | |-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998) | |-- /dev/sdb@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x500a07510ec79d1f PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F) | |-- /dev/sdh@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW) | |-- /dev/sdf@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS) | |-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3) | |-- 
/dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS) | |-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2) | |-- /dev/sdc@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x500a07510ee9a052 PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052) | SHOWING 1 FILE COLLECTIONS IN THE CLUSTER |-- file-collection-1631971248912 Online 0 errors 0 warnings SHOWING 1 BUNDLES IN THE CLUSTER |-- wordpress ONLINE 0 errors 0 warnings To see more details rerun with -V|--verbose option sherlock produced results in 200 milliseconds (Sat Sep 18 11:51:21 PM 2021). |-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections, 1 bundles, 3 users and 3 tenants were analyzed
Example with -H and -V
# sherlock -H -V No matching apps found No matching pods found SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER |-- VOLID 1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b, usage: 448 MB / 20 GB, 1 snapshots, resync progress: SYNCED, using 1 devices | |-- DEVID 1: /dev/sdb on eqx01-flash15 using 448 MB/894.3 GB capacity, 14/20 slices, 14 segs, segspernap=1 RDVM: UP, DEV: READY | | (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998) | | | |-- SNAPSHOTS: 1 CREATED DEV OWN CLONES STATE SIZE | | |-- SNAPID 1: 1969/12/31 16:00:00 14 14 0 READY 448 MB | | | | |-- VOLID 132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839 , usage: 352 MB / 5 GB, 1 snapshots, resync progress: SYNCED, using 1 devices | |-- DEVID 2: /dev/sde on eqx01-flash15 using 352 MB/1.8 TB capacity, 11/5 slices, 11 segs, segspernap=3 RDVM: UP, DEV: READY | | (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0) | | | |-- SNAPSHOTS: 1 CREATED DEV OWN CLONES STATE SIZE | | |-- SNAPID 1: 1969/12/31 16:00:00 11 11 0 READY 352 MB | | | | |-- VOLID 131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1 , usage: 576 MB / 11 GB, 1 snapshots, resync progress: SYNCED, using 1 devices | |-- DEVID 11: /dev/sdi on eqx01-flash16 using 576 MB/1.8 TB capacity, 18/11 slices, 18 segs, segspernap=2 RDVM: UP, DEV: READY | | (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1) | | | |-- SNAPSHOTS: 1 CREATED DEV OWN CLONES STATE SIZE | | |-- SNAPID 1: 1969/12/31 16:00:00 18 18 0 READY 576 MB | | | | All volumes are healthy |-- eqx01-flash16 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY ============================================================================================================================================================================================= 0 PODS ARE RUNNING ON THIS NODE 9 DEVICES ARE ATTACHED TO THIS NODE |-- DEVID 11: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 11/119194 slices, 18 segs | |-- VOLID 131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1 576 MB nslices=11 nsnaps=1 nsegs=18 nsegs_per_snap=2 | |-- DEVID 12: /dev/sde READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 9: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 8: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 10: /dev/sdb READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 14: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 13: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 0: /dev/sda INIT 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/0 slices, 0 segs |-- eqx04-flash05 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY ============================================================================================================================================================================================= 0 PODS ARE RUNNING ON THIS NODE 8 DEVICES ARE ATTACHED TO THIS NODE |-- DEVID 16: /dev/sdb READY 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/57194 slices, 0 segs |-- DEVID 0: /dev/sda INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 0: /dev/sdd INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 15: /dev/sdc READY 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/57194 
slices, 0 segs |-- DEVID 0: /dev/sde INIT 59.6 GB free=59.6 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 0: /dev/dm-1 INIT 17.4 GB free=17.4 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 0: /dev/dm-2 INIT 35.7 GB free=35.7 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 0: /dev/dm-0 INIT 6.0 GB free=6.0 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- eqx01-flash15 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY ============================================================================================================================================================================================= 0 PODS ARE RUNNING ON THIS NODE 9 DEVICES ARE ATTACHED TO THIS NODE |-- DEVID 0: /dev/sda INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 2: /dev/sde READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 5/119194 slices, 11 segs | |-- VOLID 132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839 352 MB nslices=5 nsnaps=1 nsegs=11 nsegs_per_snap=3 | |-- DEVID 3: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 7: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 5: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 6: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 4: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 1: /dev/sdb READY 894.3 GB free=893.8 GB (100%) 1/100 vols, 20/57194 slices, 14 segs | |-- VOLID 1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b 448 MB nslices=20 nsnaps=1 nsegs=14 nsegs_per_snap=1 | DEVICE /dev/sdi on eqx01-flash16 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0fe71f | PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1 |-- VOL: 131 pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1 576 MB nslices=11 nsegs=18 (2 ) nsnaps=1 DEVICE /dev/sde on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0feea2 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8 DEVICE /dev/sde on eqx01-flash15 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0db2c7 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0 |-- VOL: 132 pvc-94229d46-e381-4e3c-99a1-ddfe389d7839 352 MB nslices=5 nsegs=11 (3 ) nsnaps=1 DEVICE /dev/sdf on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0db9be | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90 DEVICE /dev/sdi on 
eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0dbae3 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2 DEVICE /dev/sdg on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0ddd62 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33 DEVICE /dev/sdh on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0df3ba | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7 DEVICE /dev/sdd on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c101de8 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x500a075109604998 | PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998 |-- VOL: 1 file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b 448 MB nslices=20 nsegs=14 (1 ) nsnaps=1 DEVICE /dev/sdb on eqx04-flash05 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x500a07510ec79d1f | PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F DEVICE /dev/sdh on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0d9e30 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW DEVICE /dev/sdf on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0dc21f | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS DEVICE /dev/sdb on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY 
============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0dd039 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3 DEVICE /dev/sdg on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0dee42 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS DEVICE /dev/sdd on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x5000c5008c0df26c | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2 DEVICE /dev/sdc on eqx04-flash05 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY ============================================================================================================================================================================================= |==> WWN: 0x500a07510ee9a052 | PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052 SHOWING 1 FILE COLLECTIONS IN THE CLUSTER SHOWING 1 BUNDLES IN THE CLUSTER sherlock produced results in 141 milliseconds (Sat Sep 18 11:45:50 PM 2021). |-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections, 1 bundles, 3 users and 3 tenants were analyzed
18.6. View Sherlock Command Options¶
To view sherlock command options, run the following command:
# sherlock --help
Resource Inspection Options
Using the resource inspection command options, you can provide a comma-separated list of resource (object) names. This enables you to view details of multiple objects at a time; see the sketch after this list.
-a | --app NAME,...
Displays application information.
-n | --node NAME,...
Displays node information.
-p | --pod NAME,...
Displays Pod information.
-v | --vol NAME,...
Displays volume information.
-d | --dev NAME,...
Displays device information.
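For example, to inspect two nodes in a single run, you might use a comma-separated list like the following (the node names are illustrative, borrowed from the sample reports in this chapter):
# sherlock --node eqx01-flash15,eqx01-flash16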
Advisory Rebalancing Options
You can use the rebalancing command options to find out which disks are overutilized or underutilized. Based on the report's advice, you can make adjustments; see the sketch after this list.
-D | --devs-needing-rebalance
Displays information about devices that need rebalancing.
-L | --vols-needing-rebalance
Displays information about volumes that need rebalancing.
-Y | --dev-rebalance-advice DEV
Provides advice on rebalancing the specified device.
-X | --vol-rebalance-advice VOL
Provides advice on rebalancing the specified volume.
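A typical flow, sketched below, is to first list the devices that need rebalancing and then request advice for one of them. The <DEV> value is a placeholder; its exact format is not shown in this chapter.
# sherlock -D
# sherlock --dev-rebalance-advice <DEV>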
Behavior Controlling Options
-U | --strict
Marks resources that are not fully online as unhealthy. This option reports partially healthy resources (objects) as unhealthy.
-C | --cache
Builds and uses a cache to speed up queries. Sherlock caches resources once and uses the same cache for subsequent queries. Use this option when you run Sherlock repeatedly with different options.
-H | --healthy
Also shows healthy resources. Displays healthy resources (objects) along with unhealthy objects.
-V | --verbose
Displays a detailed report.
-M | --html
Prints output in HTML format. Use this option along with --outfile and provide a path to save an HTML file. You can open the HTML file in a browser.
-O | --outfile
Prints output to the specified file. Provide a file path to save the report, for example: /tmp/sherlock-output.
-K | --no-skip
Does not skip unimportant resources to minimize output.
-J | --scan-joblogs
Scans job logs for errors.
-S | --mon SECS
Monitors the resource metrics at every SECS-second interval. Use this option along with --app, --pod, or --vol.
--start TIME
Starts scanning jobs at this date/time (default: 72 hours ago).
--end TIME
Ends scanning jobs at this date/time (default: now).
--server PORT
Runs in server mode so you can operate Sherlock from a web browser.
--prom
Runs in server mode to serve metrics in Prometheus format.
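As a sketch of combining these options: the first command below saves a verbose report of all objects as an HTML file (the output path is illustrative), and the second scans job logs over an explicit time window (the exact TIME format is not specified in this chapter).
# sherlock -H -V -M -O /tmp/sherlock-report.html
# sherlock -J --start <TIME> --end <TIME>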
Device Options
--dev all
Displays information about all devices.
--dev full
Displays information about devices that are nearly full.
--dev NODENAME:,...
Displays information about the devices on the specified node.
--dev WWN,...
Displays information about devices with matching WWNs.
--dev NODENAME:DEVPATH,...
Displays information about a specific device.
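For example, a sketch of listing every device and then narrowing to the devices on one node (the node name is a placeholder; note the trailing colon in the node-scoped form):
# sherlock --dev all
# sherlock --dev <nodename>: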
18.7. Access Sherlock On Web Server¶
You can access the Sherlock tool from a web browser by running it in server mode. You can provide any available port number between 1 and 65535.
To set a port number to access the Sherlock tool for your cluster, run the following command:
# sherlock --server <port number>
Example
# sherlock --server 45536
running the read_config now
Running in server mode. Point your web browser to the following address:
https://eqx01-flash15:45536
18.8. Check Application Health¶
Using Sherlock, you can check the health of deployed applications.
The Sherlock command output provides the following details about an application:
Volumes or devices on which the application data is stored
Pods in which the application is running
Node on which the Pod is running
Running application details
Failed jobs for the specific application
Prerequisite
You must know the name of the application whose health you want to check. You can run the sherlock -H command to find the application name.
To check the application health, run the following command:
# sherlock --app <application-name>
You can optionally use the -V option to view the details.
Example
sherlock --app mysql-test -V -H APPNAME: mysql-test STATE: ONLINE Robin Systems 1/1 vnodes healthy ============================================================================================================================================================================================================================================== APP HAS 1 VNODES: VNODEID 2: mysql-test.mysql.01 on centos-60-181 INST: ONLINE/INST: STARTED, NODE: ONLINE, RIO: UP |-- VOLID 4: mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 1 GB | |-- DEVID : /dev/sdd segs=centos-60-181 slices=14 rawspace=1 448 MB |-- VOLID 5: mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 10 GB | |-- DEVID : /dev/sdd segs=centos-60-181 slices=10 rawspace=10 320 MB | APP IS RUNNING ON THE FOLLOWING 1 NODES: |-- centos-60-181 RIO: UP | |-- mysql-test.mysql.01 ONLINE/STARTED | APP IS STORING DATA ON THE FOLLOWING 1 DEVICES: |-- DEVID 6: /dev/sdd on centos-60-181 2 vols | |-- VOLID 4: mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 448 MB nslices=1 nsegs=14 nsnaps=3 segspersnap=5 | |-- VOLID 5: mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 320 MB nslices=10 nsegs=10 nsnaps=3 segspersnap=1 | THERE ARE 23 FAILED JOBS TO INSPECT BETWEEN Fri May 3 01:22:23 AM 2019 - Fri May 10 01:22:23 AM 2019 |-- mysql-test.mysql.01 | |-- VnodeDelete jobid=98 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019 | | predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to execute | |-- VnodeDelete jobid=88 state=10 error=1 start=Thu May 9 00:20:37 2019 end=Thu May 9 00:20:43 2019 | | postdestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to execute | |-- mysql-test | |-- ApplicationDelete jobid=97 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019 | | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to | | execute' | |-- ApplicationDelete jobid=87 state=10 error=1 start=Thu May 9 00:20:37 2019 end=Thu May 9 00:20:43 2019 | | Job failed. One or more child jobs reported errors. Error: 'postdestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to | | execute' | |-- ApplicationDelete jobid=92 state=10 error=1 start=Thu May 9 00:22:17 2019 end=Thu May 9 00:22:18 2019 | | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to | | execute' | |-- ApplicationDelete jobid=95 state=10 error=1 start=Thu May 9 00:22:59 2019 end=Thu May 9 00:23:00 2019 | | Job failed. One or more child jobs reported errors. 
Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to | | execute' | |-- mysql-test1 | |-- ApplicationCreate jobid=129 state=10 error=1 start=Thu May 9 03:50:54 2019 end=Thu May 9 03:50:54 2019 | | Invalid Zone Id and/or Bundle Id: 1/2 | |-- ApplicationCreate jobid=128 state=10 error=1 start=Thu May 9 03:50:08 2019 end=Thu May 9 03:50:08 2019 | | Invalid Zone Id and/or Bundle Id: 1/2 | sherlock produced results in 90 milliseconds (Fri May 10 01:22:23 AM 2019). |-- 3 nodes, 12 disks, 3 vols, 7 snapshots, 1 apps, 1 vnodes, 2 users and 1 tenants were analyzed
18.9. Check Node Health¶
When you check the health of a node, you can find details about all the objects (applications, Pods, volumes, devices, file collections, and bundles) under the node.
To check the health of a node, complete the following steps:
1. To get the list of nodes and select the desired node for a health check, run the following command:
# sherlock -H -K
2. To check the health of a node and view its details, run the following command:
# sherlock --node <node name>
You can optionally use the -V option to view the details.
Example
# sherlock -H -K No matching apps found No matching pods found SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER |-- VOLID 1: file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44, usage: 448 MB / 20 GB, 1 snapshots, resync progress: SYNCED, using 1 devices |-- VOLID 163: pvc-66646581-0210-46e2-b945-9ea880be38d7 , usage: 352 MB / 5 GB, 1 snapshots, resync progress: SYNCED, using 1 devices |-- VOLID 162: pvc-eb63979d-720e-41c9-808f-145306dc1259 , usage: 576 MB / 11 GB, 1 snapshots, resync progress: SYNCED, using 1 devices All volumes are healthy SHOWING HEALTH OF 3/3 NODES RUNNING IN THE CLUSTER |-- eqx04-flash05 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY |-- eqx01-flash16 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY |-- eqx01-flash15 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY SHOWING HEALTH OF 26/26 DEVICES IN THE CLUSTER |-- /dev/sdh@eqx01-flash16 | 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW) | |-- /dev/sde@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0) | |-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90) | |-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2) | |-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33) | |-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7) | |-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT) | |-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998) | |-- /dev/sdf@eqx01-flash16 | 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS) | |-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3) | |-- /dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS) | |-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2) | |-- /dev/sdi@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1) | |-- /dev/sde@eqx01-flash16 
| 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8) | |-- /dev/sdb@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x500a07510ec79d1f PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F) | |-- /dev/sdc@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY | (WWN: 0x500a07510ee9a052 PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052) | SHOWING 1 FILE COLLECTIONS IN THE CLUSTER |-- file-collection-1632045271349 Online 0 errors 0 warnings SHOWING 0 BUNDLES IN THE CLUSTER All bundles are available To see more details rerun with -V|--verbose option sherlock produced results in 158 milliseconds (Sun Sep 19 06:02:28 PM 2021). |-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections, 0 bundles, 5 users and 5 tenants were analyzed **For this example, we have selected this node:eqx01-flash16** # sherlock --node eqx01-flash16 -H -V |-- eqx01-flash16 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY ============================================================================================================================================================================================= 0 PODS ARE RUNNING ON THIS NODE 9 DEVICES ARE ATTACHED TO THIS NODE |-- DEVID 0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 9: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 11/119194 slices, 18 segs | |-- VOLID 162: pvc-eb63979d-720e-41c9-808f-145306dc1259 576 MB nslices=11 nsnaps=1 nsegs=18 nsegs_per_snap=2 | |-- DEVID 13: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 5/119194 slices, 11 segs | |-- VOLID 163: pvc-66646581-0210-46e2-b945-9ea880be38d7 352 MB nslices=5 nsnaps=1 nsegs=11 nsegs_per_snap=3 | |-- DEVID 12: /dev/sdb READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 14: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 8: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 0: /dev/sda INIT 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/0 slices, 0 segs |-- DEVID 10: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs |-- DEVID 11: /dev/sde READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs THERE ARE 1 FAILED JOBS TO INSPECT BETWEEN Sun Sep 12 05:58:15 PM 2021 - Sun Sep 19 05:58:15 PM 2021 |-- eqx01-flash16.robinsystems.com | |-- HostAddResourcePool jobid=30 state=10 error=1 start=Sun Sep 19 03:23:54 2021 end=Wed Dec 31 16:00:00 1969 | | Host 'eqx01-flash16.robinsystems.com' already has a resource pool 'default' | sherlock produced results in 166 milliseconds (Sun Sep 19 05:58:15 PM 2021). |-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections, 0 bundles, 5 users and 5 tenants were analyzed
18.10. Check Pod Health¶
To check the health of a Pod, complete the following steps:
1. To get the list of Pods, run the following command and choose the desired Pod from the list:
# sherlock -V -H
2. To check the health of the Pod, run the following command:
# sherlock --pod <pod name>
You can optionally use the -V option to view the details.
Example
# sherlock -V -H SHOWING HEALTH OF 3/3 APPLICATIONS RUNNING IN THE CLUSTER APPNAME: centos1 STATE: ONLINE tuser1 1/1 pods healthy KIND: ROBIN ============================================================================================================================================================================================= APP HAS 1 PODS: POD ID 187: centos1.server.01 on eqx04-flash05 INST: ONLINE/INST: STARTED, NODE: ONLINE, RIO: UP |-- VOLID 238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313 1 GB | |-- DEVID : /dev/sdb (eqx04-flash05) segs=2 slices=1 rawspace=64 MB RDVM: UP |-- VOLID 237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851 1 GB | |-- DEVID : /dev/sdc (eqx04-flash05) segs=0 slices=1 rawspace=0 RDVM: UP | APP IS RUNNING ON THE FOLLOWING 1 NODES: |-- eqx04-flash05 RIO: UP | |-- centos1.server.01 ONLINE/STARTED | APP IS STORING DATA ON THE FOLLOWING 2 DEVICES: |-- DEVID 15: /dev/sdb on eqx04-flash05 1 vols | |-- VOLID 238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313 64 MB nslices=1 nsegs=2 nsnaps=1 segspersnap=2 | |-- DEVID 16: /dev/sdc on eqx04-flash05 1 vols | |-- VOLID 237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851 0 nslices=1 nsegs=0 nsnaps=1 segspersnap=0 # sherlock --pod centos1.server.01 -H -V SHOWING HEALTH OF 1 PODS IN THE CLUSTER: o-- POD/VNODE ID 187: centos1.server.01 STARTED/ONLINE 1 CPU, 200 MB MEM NODE: UP, RIO: UP | |-- VOLID 238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313 64 MB/1 GB nsnaps=1 | | |-- DEVID 15: /dev/sdb on eqx04-flash05 nsegs=2 nslices=1 64 MB | |-- VOLID 237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851 0/1 GB nsnaps=1 | | |-- DEVID 16: /dev/sdc on eqx04-flash05 nsegs=0 nslices=1 0 | THERE ARE 73 FAILED JOBS TO INSPECT BETWEEN Sun Sep 12 10:42:37 PM 2021 - Sun Sep 19 10:42:37 PM 2021 |-- | |-- HostProbe jobid=5 state=10 error=1 start=Sun Sep 19 02:52:57 2021 end=Sun Sep 19 02:52:57 2021 | | HTTPSConnectionPool(host='172.19.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces?fieldSelector=metadata.name%3Drobin- | | admin&limit=0&timeoutSeconds=56 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff69c481d50>: Failed to establish a | | new connection: [Errno 111] Connection refused')) | |-- HostProbe jobid=7 state=10 error=1 start=Sun Sep 19 02:53:21 2021 end=Sun Sep 19 02:53:21 2021 | | HTTPSConnectionPool(host='172.19.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces?fieldSelector=metadata.name%3Drobin- | | admin&limit=0&timeoutSeconds=56 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff69c3c7210>: Failed to establish a | | new connection: [Errno 111] Connection refused')) | |-- HostProbe jobid=6 state=10 error=1 start=Sun Sep 19 02:53:12 2021 end=Sun Sep 19 02:53:12 2021 | | HTTPSConnectionPool(host='172.19.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces?fieldSelector=metadata.name%3Drobin- | | admin&limit=0&timeoutSeconds=56 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff69c3fb090>: Failed to establish a | | new connection: [Errno 111] Connection refused')) | |-- HostAddResourcePool jobid=30 state=10 error=1 start=Sun Sep 19 03:23:54 2021 end=Wed Dec 31 16:00:00 1969 | | Host 'eqx01-flash16.robinsystems.com' already has a resource pool 'default' | |-- DockerRegistryRemove jobid=1870 state=10 error=1 start=Sun Sep 19 15:16:44 2021 end=Wed Dec 31 16:00:00 1969 | | 
Invalid registry ID '2', does not exist. | |-- TenantUserRemove jobid=2062 state=10 error=1 start=Sun Sep 19 17:52:01 2021 end=Wed Dec 31 16:00:00 1969 | | Delete the following objects or assign them to another tenant user before removing user 'tuser1': Bundles: robintest jobmgr-3.1 | |-- HostRemoveStorageRole jobid=2129 state=10 error=1 start=Sun Sep 19 18:53:57 2021 end=Wed Dec 31 16:00:00 1969 | | Disk 0x500a075109604998 is still part of 1 device sets | |-- HostRemoveRoles jobid=2127 state=10 error=1 start=Sun Sep 19 18:53:57 2021 end=Sun Sep 19 18:53:57 2021 | | Job failed. One or more child jobs reported errors. Error: Disk 0x500a075109604998 is still part of 1 device sets | |-- UserRemove jobid=2061 state=10 error=1 start=Sun Sep 19 17:52:01 2021 end=Sun Sep 19 17:52:02 2021 | | Job failed. One or more child jobs reported errors. Error: Delete the following objects or assign them to another tenant user before removing user 'tuser1': | | Bundles: robintest jobmgr-3.1 | |-- SetTag jobid=2256 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- HostRemoveRoles jobid=2123 state=10 error=1 start=Sun Sep 19 18:49:05 2021 end=Sun Sep 19 18:49:05 2021 | | Job failed. One or more child jobs reported errors. Error: Disk 0x500a075109604998 is still part of 1 device sets | |-- HostRemoveStorageRole jobid=2125 state=10 error=1 start=Sun Sep 19 18:49:05 2021 end=Wed Dec 31 16:00:00 1969 | | Disk 0x500a075109604998 is still part of 1 device sets | |-- HostAddResourcePool jobid=2131 state=10 error=1 start=Sun Sep 19 18:54:04 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot assign resource pool to host 'eqx01-flash15.robinsystems.com' which has a compute or storage role. 
| |-- SetTag jobid=2232 state=10 error=1 start=Sun Sep 19 20:10:56 2021 end=Sun Sep 19 20:10:56 2021 | | Tags support single value per key and there is already present a tag 'nightly':'non-default', please remove it first | |-- SetTag jobid=2254 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- SetTag jobid=2258 state=10 error=1 start=Sun Sep 19 20:44:14 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- SetTag jobid=2257 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- SetTag jobid=2255 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- SetTag jobid=2260 state=10 error=1 start=Sun Sep 19 20:44:14 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- SetTag jobid=2259 state=10 error=1 start=Sun Sep 19 20:44:14 2021 end=Sun Sep 19 20:44:14 2021 | | Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first | |-- SetTag jobid=2261 state=10 error=1 start=Sun Sep 19 21:05:10 2021 end=Sun Sep 19 21:05:11 2021 | | Tags support single value per key and there is already present a tag 'nightly':'DiskAll', please remove it first | |-- UserAdd jobid=2329 state=10 error=1 start=Sun Sep 19 22:35:21 2021 end=Wed Dec 31 16:00:00 1969 | | A user by the name of 'tuser1' already exists | |-- TenantUserRemove jobid=2325 state=10 error=1 start=Sun Sep 19 22:24:23 2021 end=Wed Dec 31 16:00:00 1969 | | Delete the following objects or assign them to another tenant user before removing user 'tuser1': Namespace with deployed apps: t016-u000018 Bundles: | | centos 7 Applications: centos1, centos12, centos123 | |-- UserRemove jobid=2327 state=10 error=1 start=Sun Sep 19 22:32:35 2021 end=Sun Sep 19 22:32:35 2021 | | Job failed. One or more child jobs reported errors. Error: Delete the following objects or assign them to another tenant user before removing user 'tuser1': | | Namespace with deployed apps: t016-u000018 Bundles: centos 7 Applications: centos1, centos12, centos123 | |-- BundleRemove jobid=2323 state=10 error=1 start=Sun Sep 19 22:21:44 2021 end=Wed Dec 31 16:00:00 1969 | | Applications active for bundle, centos, with bundleid = 89 | |-- UserAdd jobid=2326 state=10 error=1 start=Sun Sep 19 22:27:06 2021 end=Wed Dec 31 16:00:00 1969 | | A user by the name of 'tuser1' already exists | |-- UserRemove jobid=2324 state=10 error=1 start=Sun Sep 19 22:24:23 2021 end=Sun Sep 19 22:24:23 2021 | | Job failed. One or more child jobs reported errors. 
Error: Delete the following objects or assign them to another tenant user before removing user 'tuser1': | | Namespace with deployed apps: t016-u000018 Bundles: centos 7 Applications: centos1, centos12, centos123 | |-- TenantUserRemove jobid=2328 state=10 error=1 start=Sun Sep 19 22:32:35 2021 end=Wed Dec 31 16:00:00 1969 | | Delete the following objects or assign them to another tenant user before removing user 'tuser1': Namespace with deployed apps: t016-u000018 Bundles: | | centos 7 Applications: centos1, centos12, centos123 | |-- robinte-backup-rw | |-- StorageRepoAdd jobid=1943 state=10 error=1 start=Sun Sep 19 16:08:10 2021 end=Wed Dec 31 16:00:00 1969 | | Storage repo with name 'robinte-backup-rw' already exists. | |-- MySqlOn | |-- ApplicationScalein jobid=330 state=10 error=1 start=Sun Sep 19 05:19:42 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- MariaOn | |-- ApplicationScalein jobid=249 state=10 error=1 start=Sun Sep 19 04:51:29 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- Conflue | |-- ApplicationScalein jobid=747 state=10 error=1 start=Sun Sep 19 06:47:59 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- ApplicationScalein jobid=748 state=10 error=1 start=Sun Sep 19 06:48:00 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- ApplicationScalein jobid=736 state=10 error=1 start=Sun Sep 19 06:46:15 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- ApplicationScalein jobid=743 state=10 error=1 start=Sun Sep 19 06:47:53 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- ApplicationScalein jobid=744 state=10 error=1 start=Sun Sep 19 06:47:55 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- ApplicationScalein jobid=745 state=10 error=1 start=Sun Sep 19 06:47:56 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- ApplicationScalein jobid=746 state=10 error=1 start=Sun Sep 19 06:47:57 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition. | |-- NginxHo | |-- ApplicationSnapshot jobid=477 state=10 error=1 start=Sun Sep 19 06:04:36 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot snapshot application 'NginxHo' in 'CREATE_FAILED' state | |-- ApplicationSnapshot jobid=478 state=10 error=1 start=Sun Sep 19 06:04:54 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot snapshot application 'NginxHo' in 'CREATE_FAILED' state | |-- ApplicationScale jobid=479 state=10 error=1 start=Sun Sep 19 06:05:01 2021 end=Wed Dec 31 16:00:00 1969 | | Cannot scale application 'NginxHo' in 'CREATE_FAILED' state | |-- ApplicationCreate jobid=474 state=10 error=1 start=Sun Sep 19 05:49:13 2021 end=Sun Sep 19 06:04:23 2021 | | Job failed. One or more child jobs reported errors. Error: Job failed. One or more child jobs reported errors. 
| |     Error: Timeout expired while waiting for pod
| |     desired phase 'Ready' current phase 'Pending'
| |-- ApplicationScalein jobid=481 state=10 error=1 start=Sun Sep 19 06:08:23 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scale-in application 'NginxHo' in 'CREATE_FAILED' state
| |-- CentosO
| |-- ApplicationScalein jobid=168 state=10 error=1 start=Sun Sep 19 04:22:52 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- RedisCl
| |-- ApplicationScalein jobid=457 state=10 error=1 start=Sun Sep 19 05:46:43 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '6' nodes. multinode_min is set to '6' in the bundle definition.
| |-- ApplicationScalein jobid=456 state=10 error=1 start=Sun Sep 19 05:46:42 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '6' nodes. multinode_min is set to '6' in the bundle definition.
| |-- ApplicationScalein jobid=458 state=10 error=1 start=Sun Sep 19 05:46:44 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '6' nodes. multinode_min is set to '6' in the bundle definition.
| |-- ApplicationScalein jobid=459 state=10 error=1 start=Sun Sep 19 05:46:46 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '6' nodes. multinode_min is set to '6' in the bundle definition.
| |-- ApplicationScalein jobid=460 state=10 error=1 start=Sun Sep 19 05:46:47 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '6' nodes. multinode_min is set to '6' in the bundle definition.
| |-- ApplicationScalein jobid=461 state=10 error=1 start=Sun Sep 19 05:46:48 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '6' nodes. multinode_min is set to '6' in the bundle definition.
| |-- NginxHo.nginx.01
| |-- VnodeDeploy jobid=480 state=10 error=1 start=Sun Sep 19 06:07:41 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot perform Relocate Operation on Vnode 'NginxHo.nginx.01' of an application in failed state
| |-- VnodeDeploy jobid=482 state=10 error=1 start=Sun Sep 19 06:08:28 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot perform Repair Operation on Vnode 'NginxHo.nginx.01' of an application in failed state
| |-- VnodeAdd jobid=476 state=10 error=1 start=Sun Sep 19 05:49:18 2021 end=Sun Sep 19 06:04:23 2021
| |     Timeout expired while waiting for pod desired phase 'Ready' current phase 'Pending'
| |-- Postgre
| |-- ApplicationScalein jobid=411 state=10 error=1 start=Sun Sep 19 05:37:06 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- nginx
| |-- RoleCreate jobid=475 state=10 error=1 start=Sun Sep 19 05:49:18 2021 end=Sun Sep 19 06:04:23 2021
| |     Job failed. One or more child jobs reported errors. Error: Timeout expired while waiting for pod desired phase 'Ready' current phase 'Pending'
| |-- mysql
| |-- ApplicationSnapshot jobid=1190 state=10 error=1 start=Sun Sep 19 10:15:01 2021 end=Sun Sep 19 10:15:59 2021
| |     Snapshotting application mysql failed
| |-- ApplicationDataSnapshot jobid=1191 state=10 error=1 start=Sun Sep 19 10:15:30 2021 end=Sun Sep 19 10:15:31 2021
| |     Failed to create snapshot: 529 Volume mysql.mysql.01.data.1.a0374e98-442b-46c2-8784-032461845f7f (103) snapshots is using 384.0MB which is above space limit
| |     256.0MB. No new snapshot is allowed.
| |-- ApplicationCreate jobid=1446 state=10 error=1 start=Sun Sep 19 13:11:57 2021 end=Wed Dec 31 16:00:00 1969
| |     Non static IP allocations cannot be done from non-range(network) IP pools -> 'demo-ovs-pool-nw'
| |-- ApplicationCreate jobid=1487 state=10 error=1 start=Sun Sep 19 13:22:31 2021 end=Wed Dec 31 16:00:00 1969
| |     Static IP 10.10.0.10 is not a part of the IP pool demo-ovs-pool-nw associated with this application
| |-- ApplicationCreate jobid=1536 state=10 error=1 start=Sun Sep 19 13:37:28 2021 end=Wed Dec 31 16:00:00 1969
| |     Static IP 10.9.63.151 is not within the current Robin IP Pool ranges. Cannot allocate static IP
| |-- ApplicationCreate jobid=1593 state=10 error=1 start=Sun Sep 19 13:53:41 2021 end=Wed Dec 31 16:00:00 1969
| |     Static IP 10.9.105.64 is not a part of the IP pool demo-ovs-pool associated with this application
| |-- Elastic
| |-- ApplicationScalein jobid=903 state=10 error=1 start=Sun Sep 19 07:11:51 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- ApplicationScalein jobid=904 state=10 error=1 start=Sun Sep 19 07:11:52 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- ApplicationScalein jobid=905 state=10 error=1 start=Sun Sep 19 07:11:53 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- WordPre
| |-- ApplicationScalein jobid=1023 state=10 error=1 start=Sun Sep 19 07:31:58 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- ApplicationScalein jobid=1022 state=10 error=1 start=Sun Sep 19 07:31:57 2021 end=Wed Dec 31 16:00:00 1969
| |     Cannot scaledown to less than '1' nodes. multinode_min is set to '1' in the bundle definition.
| |-- centos
| |-- ApplicationCreate jobid=1742 state=10 error=1 start=Sun Sep 19 14:09:40 2021 end=Sun Sep 19 14:15:18 2021
| |     Job failed. One or more child jobs reported errors. Error: Job failed. One or more child jobs reported errors. Error: time="2021-09-19T14:13:04-07:00"
| |     level=fatal msg="pulling image: rpc error: code = Unknown desc = Exception calling application: ErrorUnknown:StatusCode.UNKNOWN:Error response from daemon:
| |     Get https://artifactory.robinsystems.com/v2/robinsys/centos/manifests/7: unauthorized: BAD_CREDENTIAL"
| |-- ApplicationCreate jobid=2330 state=10 error=1 start=Sun Sep 19 22:40:53 2021 end=Sun Sep 19 22:40:54 2021
| |     Tenant Application count (3) equals or exceeds max_apps_per_tenant limit (2)
| |-- server
| |-- RoleCreate jobid=1743 state=10 error=1 start=Sun Sep 19 14:09:42 2021 end=Sun Sep 19 14:15:18 2021
| |     Job failed. One or more child jobs reported errors. Error: time="2021-09-19T14:13:04-07:00" level=fatal msg="pulling image: rpc error: code = Unknown desc =
| |     Exception calling application: ErrorUnknown:StatusCode.UNKNOWN:Error response from daemon: Get
| |     https://artifactory.robinsystems.com/v2/robinsys/centos/manifests/7: unauthorized: BAD_CREDENTIAL"
| |-- centos.server.01
| |-- VnodeAdd jobid=1744 state=10 error=1 start=Sun Sep 19 14:09:42 2021 end=Sun Sep 19 14:15:18 2021
| |     time="2021-09-19T14:13:04-07:00" level=fatal msg="pulling image: rpc error: code = Unknown desc = Exception calling application:
| |     ErrorUnknown:StatusCode.UNKNOWN:Error response from daemon: Get https://artifactory.robinsystems.com/v2/robinsys/centos/manifests/7: unauthorized:
| |     BAD_CREDENTIAL"
| |-- robi-bkup-rw
| |-- StorageRepoAdd jobid=2142 state=10 error=1 start=Sun Sep 19 18:58:24 2021 end=Wed Dec 31 16:00:00 1969
| |     Storage repo with supplied parameters already exists: robinte-backup-rw
| |-- alpine-limit
| |-- ImageAdd jobid=2276 state=10 error=1 start=Sun Sep 19 21:40:19 2021 end=Wed Dec 31 16:00:00 1969
| |     Tenant ImageAdd count (2) equals or exceeds max_images_per_tenant limit (2)
| |-- podlimit
| |-- ApplicationCreate jobid=2322 state=10 error=1 start=Sun Sep 19 22:02:11 2021 end=Sun Sep 19 22:02:11 2021
| |     Tenant Application count (3) equals or exceeds max_pods_per_tenant limit (3)
|

sherlock produced results in 169 milliseconds (Sun Sep 19 10:42:37 PM 2021).
|-- 3 nodes, 26 disks, 9 vols, 9 snapshots, 3 apps, 3 pods, 1 file-collections, 1 bundles, 15 users and 16 tenants were analyzed
18.11. Check Volume Health¶
Sherlock provides an option to check the health of volumes attached to virtual nodes (vNodes) or Pods. It is recommended to run this check regularly.
To check the health of a volume attached to a vNode or Pod, complete the following steps:
Run the following command to list the volumes and identify the volume whose health you want to check:
# robin volume list
Run the following command to check the health of the volume:
# sherlock --vol <volume name> -H -V
Example
# robin volume list
ID | Name                                                               | Media | Type | BlockSz | Size        | Psize     | SnapLimit  | Prot | Repl | Compress | QGroupID | CTime      | Mount
---+--------------------------------------------------------------------+-------+------+---------+-------------+-----------+------------+------+------+----------+----------+------------+-----------------------------------------------------------------------------------------------
1  | file-collection-1557385703714.98f09770-bedd-496f-a7d1-1fd3d1ec840e | 72    | 0    | 4096    | 10737418240 | 268435456 | 0          | 0    | 1    | 1        | 1        | 1557385704 | [{'readonly': 0, 'state': 14, 'zoneid': 1556896675, 'nodeid': 1, 'mntpath': '/dev/vblock0'}]
4  | mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2    | 72    | 0    | 4096    | 1073741824  | 469762048 | 214748364  | 0    | 1    | 0        | 4        | 1557390977 | [{'readonly': 0, 'state': 14, 'zoneid': 1556896675, 'nodeid': 2, 'mntpath': '/dev/vblock0'}]
5  | mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 | 72    | 0    | 4096    | 10737418240 | 335544320 | 2147483648 | 0    | 1    | 0        | 5        | 1557390977 | [{'readonly': 0, 'state': 14, 'zoneid': 1556896675, 'nodeid': 2, 'mntpath': '/dev/vblock1'}]

# sherlock --vol file-collection-1557385703714.98f09770-bedd-496f-a7d1-1fd3d1ec840e -H -V
SHOWING HEALTH OF 1/3 VOLUMES IN THE CLUSTER

|-- VOLID 1: file-collection-1557385703714.98f09770-bedd-496f-a7d1-1fd3d1ec840e 256 MB / 10 GB 1 snapshots, using 1 devices
| |-- DEVID 9: /dev/sdd on centos-60-182 using 256 MB/127 GB capacity, 8/10 slices, 8 segs, segspernap=1 RDVM: UP, DEV: READY
| |     (WWN: 0x60022480d380ceef6ddbcdfa327834ca PATH: /dev/disk/by-id/scsi-360022480d380ceef6ddbcdfa327834ca)
| |
| |-- SNAPSHOTS: 1    CREATED              DEV  OWN  CLONES  STATE  SIZE
| | |-- SNAPID 1:     1969/12/31 16:00:00  8    8    0       READY  256 MB
| |
|

All volumes are healthy

sherlock produced results in 70 milliseconds (Fri May 10 01:46:43 AM 2019).
|-- 3 nodes, 12 disks, 3 vols, 7 snapshots, 1 apps, 1 vnodes, 2 users and 1 tenants were analyzed
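You can also script this check across every volume instead of selecting one at a time. The following shell sketch is illustrative only: it assumes, as in the sample output above, that the volume name is the second pipe-delimited column of robin volume list, and it may need adjusting if your release prints the table differently.

#!/bin/bash
# Illustrative sketch: check the health of every volume in the cluster.
# Assumes the volume name is the second pipe-delimited column of
# `robin volume list` (skipping the header and separator lines), as shown above.
robin volume list | awk -F'|' 'NR > 2 { gsub(/ /, "", $2); if ($2 != "") print $2 }' |
while read -r vol; do
    echo "=== Checking volume: ${vol} ==="
    sherlock --vol "${vol}" -H -V
done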
18.12. Check Devices Health¶
To check the health of a device, complete the following steps:
Run the following command to list the devices, and select the desired device name (WWN) from the output:
# sherlock -H
Run the following command to check the health of the device:
# sherlock --dev <device name> -H -V
Example
# sherlock -H
No matching apps found
No matching pods found

SHOWING HEALTH OF 1/1 VOLUMES IN THE CLUSTER
|-- VOLID 1: file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44, usage: 448 MB / 20 GB, 1 snapshots, resync progress: SYNCED, using 1 devices

All volumes are healthy

SHOWING HEALTH OF 2/2 NODES RUNNING IN THE CLUSTER
|-- eqx01-flash15 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
|-- eqx01-flash16 ONLINE 0 errors, 0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY

SHOWING HEALTH OF 18/18 DEVICES IN THE CLUSTER
|-- /dev/sdh@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW)
|
|-- /dev/sde@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
|
|-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90)
|
|-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2)
|
|-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33)
|
|-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7)
|
|-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT)
|
|-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
|
|-- /dev/sdf@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS)
|
|-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3)
|
|-- /dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS)
|
|-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2)
|
|-- /dev/sdi@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
|
|-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|     (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8)

For this example, we used this device name: 0x500a075109604998

# sherlock --dev 0x500a075109604998 -H -V
DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a075109604998 | PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998
|-- VOL: 1 file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44 448 MB nslices=20 nsegs=14 (1 ) nsnaps=1

sherlock produced results in 130 milliseconds (Sun Sep 19 03:34:51 AM 2021).
|-- 2 nodes, 18 disks, 1 vols, 1 snapshots, 0 apps, 0 pods, 1 file-collections, 0 bundles, 1 users and 1 tenants were analyzed
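To sweep every device rather than checking one WWN at a time, you can harvest the WWNs from the listing and loop over them. The following is a minimal sketch, assuming the (WWN: 0x... PATH: ...) line format shown in the listing above:

#!/bin/bash
# Illustrative sketch: run a health check against every device WWN
# that appears in the `sherlock -H` listing.
sherlock -H | grep -o 'WWN: 0x[0-9a-fA-F]*' | awk '{print $2}' | sort -u |
while read -r wwn; do
    echo "=== Checking device: ${wwn} ==="
    sherlock --dev "${wwn}" -H -V
done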
18.13. Check Devices Nearing Maximum Capacity¶
Using Sherlock, you can check device space usage statistics and identify whether any attached device is nearing its maximum capacity. If a device is nearing its maximum capacity, Robin recommends adding more devices to maintain performance.
To check for devices nearing maximum capacity, run the following command:
# sherlock --dev full -H -V
If any devices are nearing their maximum capacity, they are displayed in the output.
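Because capacity problems build up gradually, it can be useful to capture this report on a schedule and review it over time. A minimal sketch follows; the log path is an arbitrary assumption, and the script is meant to be run periodically, for example from cron:

#!/bin/bash
# Illustrative sketch: append a timestamped device-capacity report to a log.
# The log path below is an assumption; choose one that suits your site.
LOG=/var/log/sherlock-capacity.log
{
    echo "--- capacity check at $(date) ---"
    sherlock --dev full -H -V
} >> "${LOG}"

For example, a crontab entry such as 0 * * * * /usr/local/bin/sherlock-capacity.sh (a hypothetical path) would run the check hourly.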
18.14. Find Devices That Need Rebalancing¶
To check for devices that might need rebalancing, run the following command:
# sherlock --devs-needing-rebalance
# sherlock --devs-needing-rebalance
SHOWING APPLICATIONS THAT NEED ATTENTION:
All apps are healthy

SHOWING PODS THAT NEED ATTENTION:
All pods are healthy

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy

SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available

SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available

Moving 4 vols, 20 slices and 256 segments:
eqx04-flash05  /dev/sdb   894.3 GB/894.3 GB (free=100.0 %)  slices=  0/57194   segs=  0/57194   vols= 0/100  [  1.26 ]
eqx04-flash05  /dev/sdc   894.3 GB/894.3 GB (free=100.0 %)  slices=  0/57194   segs=  0/57194   vols= 0/100  [  1.26 ]
eqx04-flash05  /dev/sda   894.3 GB/894.3 GB (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx04-flash05  /dev/sdd   894.3 GB/894.3 GB (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx04-flash05  /dev/sde   59.6 GB/59.6 GB   (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx04-flash05  /dev/dm-1  17.4 GB/17.4 GB   (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx04-flash05  /dev/dm-2  35.7 GB/35.7 GB   (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx04-flash05  /dev/dm-0  6.0 GB/6.0 GB     (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash16  /dev/sdb   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash16  /dev/sdg   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash16  /dev/sdd   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash16  /dev/sdi   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash16  /dev/sde   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash16  /dev/sdf   1.8 TB/1.8 TB     (free=100.0 %)  slices=  5/119194  segs= 11/119194  vols= 1/100  [  1.25 ]
eqx01-flash16  /dev/sdh   1.8 TB/1.8 TB     (free=100.0 %)  slices= 11/119194  segs= 18/119194  vols= 1/100  [  1.25 ]
eqx01-flash16  /dev/sdc   14.9 GB/14.9 GB   (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx01-flash16  /dev/sda   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash15  /dev/sde   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash15  /dev/sdf   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash15  /dev/sdi   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash15  /dev/sdg   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash15  /dev/sdh   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash15  /dev/sdd   1.8 TB/1.8 TB     (free=100.0 %)  slices=  0/119194  segs=  0/119194  vols= 0/100  [  1.26 ]
eqx01-flash15  /dev/sdb   893.8 GB/894.3 GB (free=100.0 %)  slices= 20/57194   segs= 14/57194   vols= 1/100  [  1.24 ]
eqx01-flash15  /dev/sda   894.3 GB/894.3 GB (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]
eqx01-flash15  /dev/sdc   14.9 GB/14.9 GB   (free=100.0 %)  slices=  0/0       segs=  0/0       vols= 0/100  [ -1.00 ]

Only unhealthy objects are shown.
To see everything re-run with -H|--healthy option
To see more details rerun with -V|--verbose option

sherlock produced results in 131 milliseconds (Sun Sep 19 05:19:44 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections, 1 bundles, 2 users and 2 tenants were analyzed
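Since the report prints a "Moving N vols, ..." summary when Sherlock has rebalance work to propose, a small wrapper can turn this check into a pass/fail signal for an operator. The sketch below assumes that this line appears only when a rebalance is suggested:

#!/bin/bash
# Illustrative sketch: exit non-zero when sherlock proposes moving
# volumes, slices, or segments between devices.
# Assumption: the "Moving ..." line appears only when a rebalance is suggested.
if sherlock --devs-needing-rebalance | grep -q '^Moving '; then
    echo "WARNING: sherlock suggests device rebalancing; review the full report." >&2
    exit 1
fi
echo "No device rebalancing suggested."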