18. Sherlock - Troubleshooting Tool

Sherlock is a troubleshooting and self-diagnostic command-line tool (CLI) in Robin. It is designed to help Robin administrators identify and analyze problems with Robin clusters. Using Sherlock, you can diagnose cluster-wide problems or view information about specific applications, nodes, containers, volumes, devices, and so on. It provides details of problems at every level, from applications down to the spindle.

Note

By default, Sherlock displays only unhealthy objects.

The Sherlock tool generates cluster health reports for troubleshooting by querying a range of Robin APIs and making direct database calls.

Note

You can access the Sherlock tool only from the active master node. On CNS clusters, you can access it from the primary robinds Pod.

18.1. Sherlock Use Cases

You can use the Sherlock tool when you notice a problem with your Robin cluster to learn the details of the problem, plan maintenance activities, and so on.

Analyze problem details

When there is an issue with the Robin cluster, Sherlock detects it automatically and displays all the impacted objects when you run the tool.

For example, if a disk goes down on the cluster, Sherlock displays all the objects impacted by the disk (device) issue, such as volumes, Pods, applications, users, and so on. Similarly, Sherlock detects problems with other objects and displays their details.

This information helps you detect and fix the issue faster.

Plan maintenance activities

Sherlock also helps you to plan maintenance activities.

For example, if you want to bring down a node for maintenance, you can run the Sherlock tool and check the impacted objects (Pods, volumes, applications, and users) before planning the maintenance.
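As a sketch of such a pre-maintenance check, you can inspect one node and review everything hosted on it (the node name here is illustrative, taken from the examples later in this chapter):

# sherlock --node eqx01-flash16 -V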

18.2. Details you can View Using Sherlock

Sherlock is useful as it allows you to map resources both top-down and bottom-up through the resource hierarchy.

The Sherlock tool provides details of the following:

  • Application to Pod mappings

    • Source of computing resources for Pods

    • Volumes attached to Pods

  • Source of storage resources for volumes

    • Number of snapshots, replicas

    • Source of replication storage resources

  • Disks attached to hosts

  • Disk capacity and consumption

  • Volumes that are hosted and the Pods that own them

  • Status of critical Robin cluster services

    For example:

    Node → Disk

    Node → Pod → Application

    Disk → Volumes → Pods → Application

    Volumes → Pods → Application

    Volumes → File-object → Bundle → App

    Pods → Application

    Critical node services → Node → Pods → Application

    Application → Users/Tenant
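As a sketch of one such bottom-up traversal, you can start from a device and let Sherlock show the volumes, Pods, and applications above it (the node and device names are illustrative, using the --dev form described later in this chapter):

# sherlock --dev eqx01-flash16:/dev/sdb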

18.3. Sherlock Report

A report generated by Sherlock using the sherlock command has the following sections:

SHOWING APPLICATIONS THAT NEED ATTENTION:

This section of the report displays the unhealthy applications running on your cluster (healthy applications too with -H), where the Pods comprising each application are located, and the volumes and devices on which their data is saved.

SHOWING PODS THAT NEED ATTENTION:

This section of the report displays details of the unhealthy Pods (healthy Pods too with -H) running on your cluster and the volumes and devices they use for storing data.

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION:

This section of the report displays details of the unhealthy volumes (healthy volumes too with -H) on your cluster and the devices these volumes use for storing data.

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:

This section of the report displays the unhealthy nodes (healthy nodes too with -H), the services running on them, and the Pods, volumes, and devices on those nodes.

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:

This section of the report displays the status of unhealthy devices (healthy devices too with -H) and the impacted objects.

18.4. Access Sherlock

After you log in to your Robin cluster, you can access the Sherlock troubleshooting tool. The Sherlock tool is part of Robin CNS and Robin CNP; you do not need to install anything manually to start using it.

The tool displays, in the form of a report, details of the following components that might need your attention:

  • Applications

  • Pods

  • Volumes

  • Nodes

  • Devices

  • Unavailable file collections

  • Unavailable bundles

Note

If all components are fine, the tool displays "all healthy" or "all available" messages.

Prerequisites

  • You must access the Sherlock tool only from the active master node.

  • On CNS clusters, you must run the Sherlock commands from the primary robinds Pod.

To access Sherlock, run the following command:

# sherlock

Note

You can run sherlock --help to view the command options.

Example showing all objects healthy and available

# sherlock

    SHOWING APPLICATIONS THAT NEED ATTENTION:
    All apps are healthy

    SHOWING PODS THAT NEED ATTENTION:
    All pods are healthy

    SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
    All volumes are healthy

    SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
    All nodes are healthy

    SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
    All devices are healthy

    SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
    All file collection are available

    SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
    All bundles are available

    Only unhealthy objects are shown. To see everything re-run with -H|--healthy option
    To see more details rerun with -V|--verbose option
    sherlock produced results in 155 milliseconds (Sat Sep 18 06:14:59 PM 2021).
    |-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
        2 bundles, 1 users and 1 tenants were analyzed

Example showing an unhealthy app and Pods

# sherlock

SHOWING APPLICATIONS THAT NEED ATTENTION:
|-- robinte STATE: PLANNED      Robin Systems     2/2 pods unhealthy KIND: ROBIN

SHOWING USERS WHO ARE AFFECTED:
|-- Robin Systems (Firstname: Robin LastName: Systems Email: None)
|   |-- APPS 1: robinte

SHOWING PODS THAT NEED ATTENTION:
o-- POD/VNODE ID  121: robinte.R1.01 INSTALLING/ONLINE   1 CPU, 50 MB MEM NODE: UP, RIO: UP
|-- POD/VNODE ID  122: robinte.R2.01 INSTALLING/ONLINE   1 CPU, 50 MB MEM NODE: UP, RIO: UP

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy

SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available

SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available

18.5. View Health of All Objects

You can view the health of all objects using the -H option. The -H option displays healthy objects along with unhealthy objects in the report, for all object types (Pods, volumes, applications, nodes, devices, file collections, and bundles).

To view the health of all objects, run the following command:

# sherlock -H

Note

You can use the -V option to view the details of all healthy and unhealthy objects. You can also use the -H (healthy) and -V (verbose) options together with other commands.

Example with -H

# sherlock -H


      No matching apps found
      No matching pods found

      SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER
      |-- VOLID     1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b, usage:  448 MB /  20 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
      |-- VOLID   132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839                          , usage:  352 MB /   5 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
      |-- VOLID   131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1                          , usage:  576 MB /  11 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
      All volumes are healthy

      SHOWING HEALTH OF 3/3 NODES RUNNING IN THE CLUSTER
      |-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
      |-- eqx04-flash05 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
      |-- eqx01-flash15 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY

      SHOWING HEALTH OF 26/26 DEVICES IN THE CLUSTER
      |-- /dev/sdi@eqx01-flash16 | 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
      |
      |-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8)
      |
      |-- /dev/sde@eqx01-flash15 | 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
      |
      |-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90)
      |
      |-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2)
      |
      |-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33)
      |
      |-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7)
      |
      |-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT)
      |
      |-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
      |
      |-- /dev/sdb@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x500a07510ec79d1f PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F)
      |
      |-- /dev/sdh@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW)
      |
      |-- /dev/sdf@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS)
      |
      |-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3)
      |
      |-- /dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS)
      |
      |-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2)
      |
      |-- /dev/sdc@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
      |   (WWN: 0x500a07510ee9a052 PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052)
      |

      SHOWING 1 FILE COLLECTIONS IN THE CLUSTER
      |-- file-collection-1631971248912 Online     0 errors  0 warnings

      SHOWING 1 BUNDLES IN THE CLUSTER
      |-- wordpress ONLINE     0 errors  0 warnings

      To see more details rerun with -V|--verbose option
      sherlock produced results in 200 milliseconds (Sat Sep 18 11:51:21 PM 2021).
      |-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
          1 bundles, 3 users and 3 tenants were analyzed

Example with -H and -V

# sherlock -H -V

No matching apps found
No matching pods found

SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER
|-- VOLID     1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b, usage:  448 MB /  20 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|   |-- DEVID 1: /dev/sdb on eqx01-flash15 using 448 MB/894.3 GB capacity, 14/20 slices, 14 segs, segspernap=1 RDVM: UP, DEV: READY
|   |             (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
|   |
|   |-- SNAPSHOTS: 1     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 1969/12/31 16:00:00    14   14      0 READY        448 MB
|   |   |
|
|-- VOLID   132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839                          , usage:  352 MB /   5 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|   |-- DEVID 2: /dev/sde on eqx01-flash15 using 352 MB/1.8 TB capacity, 11/5 slices, 11 segs, segspernap=3 RDVM: UP, DEV: READY
|   |             (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
|   |
|   |-- SNAPSHOTS: 1     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 1969/12/31 16:00:00    11   11      0 READY        352 MB
|   |   |
|
|-- VOLID   131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1                          , usage:  576 MB /  11 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|   |-- DEVID 11: /dev/sdi on eqx01-flash16 using 576 MB/1.8 TB capacity, 18/11 slices, 18 segs, segspernap=2 RDVM: UP, DEV: READY
|   |             (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
|   |
|   |-- SNAPSHOTS: 1     CREATED               DEV  OWN CLONES STATE          SIZE
|   |   |-- SNAPID    1: 1969/12/31 16:00:00    18   18      0 READY        576 MB
|   |   |
|
All volumes are healthy
|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

9 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID   11: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 11/119194 slices, 18 segs
|   |-- VOLID  131: pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1                             576 MB nslices=11  nsnaps=1  nsegs=18   nsegs_per_snap=2
|
|-- DEVID   12: /dev/sde READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    9: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    8: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   10: /dev/sdb READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   14: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   13: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    0: /dev/sda INIT 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/0 slices, 0 segs

|-- eqx04-flash05 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

8 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID   16: /dev/sdb READY 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/57194 slices, 0 segs
|-- DEVID    0: /dev/sda INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/sdd INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID   15: /dev/sdc READY 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/57194 slices, 0 segs
|-- DEVID    0: /dev/sde INIT 59.6 GB free=59.6 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/dm-1 INIT 17.4 GB free=17.4 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/dm-2 INIT 35.7 GB free=35.7 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/dm-0 INIT 6.0 GB free=6.0 GB (100%) 0/100 vols, 0/0 slices, 0 segs

|-- eqx01-flash15 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

9 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID    0: /dev/sda INIT 894.3 GB free=894.3 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    2: /dev/sde READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 5/119194 slices, 11 segs
|   |-- VOLID  132: pvc-94229d46-e381-4e3c-99a1-ddfe389d7839                             352 MB nslices=5   nsnaps=1  nsegs=11   nsegs_per_snap=3
|
|-- DEVID    3: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    7: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    5: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    6: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    4: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    1: /dev/sdb READY 894.3 GB free=893.8 GB (100%) 1/100 vols, 20/57194 slices, 14 segs
|   |-- VOLID    1: file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b   448 MB nslices=20  nsnaps=1  nsegs=14   nsegs_per_snap=1
|


DEVICE /dev/sdi on eqx01-flash16 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0fe71f | PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1
|-- VOL: 131 pvc-a140c841-0e2a-4d91-be7c-c7c75b5756b1    576 MB nslices=11  nsegs=18   (2  ) nsnaps=1

DEVICE /dev/sde on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0feea2 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8

DEVICE /dev/sde on eqx01-flash15 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0db2c7 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0
|-- VOL: 132 pvc-94229d46-e381-4e3c-99a1-ddfe389d7839    352 MB nslices=5   nsegs=11   (3  ) nsnaps=1

DEVICE /dev/sdf on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0db9be | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90

DEVICE /dev/sdi on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dbae3 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2

DEVICE /dev/sdg on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0ddd62 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33

DEVICE /dev/sdh on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0df3ba | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7

DEVICE /dev/sdd on eqx01-flash15 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c101de8 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT

DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a075109604998 | PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998
|-- VOL: 1 file-collection-1631971248912.0798c2d5-332f-4c6f-96e6-8283a431851b    448 MB nslices=20  nsegs=14   (1  ) nsnaps=1

DEVICE /dev/sdb on eqx04-flash05 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a07510ec79d1f | PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F

DEVICE /dev/sdh on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0d9e30 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW

DEVICE /dev/sdf on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dc21f | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS

DEVICE /dev/sdb on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dd039 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3

DEVICE /dev/sdg on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0dee42 | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS

DEVICE /dev/sdd on eqx01-flash16 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x5000c5008c0df26c | PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2

DEVICE /dev/sdc on eqx04-flash05 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a07510ee9a052 | PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052

SHOWING 1 FILE COLLECTIONS IN THE CLUSTER

SHOWING 1 BUNDLES IN THE CLUSTER

sherlock produced results in 141 milliseconds (Sat Sep 18 11:45:50 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    1 bundles, 3 users and 3 tenants were analyzed

18.6. View Sherlock Command Options

To view sherlock command options, run the following command:

# sherlock --help

Resource Inspection Options

Using the Resource Inspection command options, you can provide comma-separated resource (object) names. This enables you to view details of multiple objects at a time (see the example after this list).

-a | --app    NAME,...

Displays application information.

-n | --node   NAME,...

Displays node information.

-p | --pod    NAME,...

Displays Pod information.

-v | --vol    NAME,...

Displays volume information.

-d | --dev    NAME,...

Displays device information.
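For example, a sketch that inspects two applications at once using comma-separated names (the application names are illustrative, reused from the examples in this chapter):

# sherlock --app mysql-test,centos1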

Advisory Rebalancing Options

You can use the rebalancing command options to find out which disks are overutilized or underutilized. Based on the report's advice, you can make adjustments (see the sketch after this list).

-D | --devs-needing-rebalance

Displays information about devices that need rebalancing.

-L | --vols-needing-rebalance

Displays information about volumes that need rebalancing.

-Y | --dev-rebalance-advice DEV

Provides advice on device rebalancing.

-X | --vol-rebalance-advice VOL

Provides advice on volume rebalancing.
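For example, a minimal sketch that lists the devices and volumes needing rebalancing; you can then pass a specific device to -Y or a specific volume to -X for targeted advice:

# sherlock -D
# sherlock -L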

Behavior Controlling Options

-U | --strict

Marks resources that are not fully online as unhealthy. With this option, resources (objects) that are only partially healthy are displayed as unhealthy.

-C | --cache

Builds and uses a cache to speed up queries. Sherlock caches resources once and uses the same cache for subsequent queries. Use this option when you run Sherlock repeatedly with different options.

-H | --healthy

Also shows healthy resources. Displays healthy resources (objects) along with unhealthy ones.

-V | --verbose

Displays a detailed report.

-M | --html

Prints output in HTML format. Use this option along with --outfile and provide a path to save an HTML file, which you can open in a browser (see the sketch after this list).

-O | --outfile

Prints output to the specified file. Provide a file path, for example: /tmp/sherlock-output.

-K | --no-skip

Does not skip the unimportant resources that are normally omitted to minimize output.

-J | --scan-joblogs

Scans job logs for errors.

-S | --mon SECS

Monitors the resource metrics at every <SECS> interval. Use this option along with --app, --pod, or --vol.

--start TIME

Starts scanning jobs at this date/time (default: 72 hours ago).

--end TIME

Ends scanning jobs at this date/time (default: now).

--server PORT

Runs in server mode so that you can operate Sherlock from a web browser.

--prom

Runs in server mode to serve metrics in Prometheus format.
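For example, two sketches using these options: the first saves a complete health report as an HTML file that you can open in a browser, and the second monitors the metrics of one volume every 10 seconds (the output path and volume name are illustrative; the volume name is reused from the examples in this chapter):

# sherlock -H -V -M -O /tmp/sherlock-report.html
# sherlock --mon 10 --vol pvc-94229d46-e381-4e3c-99a1-ddfe389d7839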

Device Options

--dev all

Displays information about all devices.

--dev full

Displays information about devices that are nearly full.

--dev NODENAME:,...

Displays information about the devices on <NODENAME>.

--dev WWN,...

Displays information for devices with matching WWNs.

--dev NODENAME:DEVPATH,...

Displays information for a specific device.
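For example, a sketch that first lists every device on a single node and then narrows down to one device, following the forms above (the node name and device path are illustrative):

# sherlock --dev eqx01-flash16:
# sherlock --dev eqx01-flash16:/dev/sdb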

18.7. Access Sherlock On Web Server

You can access the Sherlock tool in a web browser. Provide an available port number between 1 and 65535.

To set a port number to access the Sherlock tool for your cluster, run the following command:

# sherlock --server <port number>

Example

# sherlock --server 45536
running the read_config now

Running in server mode. Point your web browser to the following address:

https://eqx01-flash15:45536

18.8. Check Application Health

Using Sherlock, you can check the health of deployed applications.

The Sherlock command output provides the following details about an application:

  • Volumes or devices on which the application data is stored

  • Pods in which the application is running

  • Node on which the Pod is running

  • Running application details

  • Failed jobs on the specific application

Prerequisite

You must know the name of the application whose health you want to check. You can run the sherlock -H command to find the application name.

To check the application health, run the following command:

# sherlock --app <application-name>

You can optionally use the -V option to view the details.

Example

sherlock --app mysql-test -V -H

APPNAME: mysql-test STATE: ONLINE Robin Systems 1/1 vnodes healthy
==============================================================================================================================================================================================================================================
APP HAS 1 VNODES:
VNODEID 2: mysql-test.mysql.01 on centos-60-181 INST: ONLINE/INST: STARTED, NODE: ONLINE, RIO: UP
|-- VOLID 4: mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 1 GB
| |-- DEVID : /dev/sdd segs=centos-60-181 slices=14 rawspace=1 448 MB
|-- VOLID 5: mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 10 GB
| |-- DEVID : /dev/sdd segs=centos-60-181 slices=10 rawspace=10 320 MB
|
APP IS RUNNING ON THE FOLLOWING 1 NODES:
|-- centos-60-181 RIO: UP
| |-- mysql-test.mysql.01 ONLINE/STARTED
|
APP IS STORING DATA ON THE FOLLOWING 1 DEVICES:
|-- DEVID 6: /dev/sdd on centos-60-181 2 vols
| |-- VOLID 4: mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 448 MB nslices=1 nsegs=14 nsnaps=3 segspersnap=5
| |-- VOLID 5: mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 320 MB nslices=10 nsegs=10 nsnaps=3 segspersnap=1
|

THERE ARE 23 FAILED JOBS TO INSPECT BETWEEN Fri May 3 01:22:23 AM 2019 - Fri May 10 01:22:23 AM 2019
|-- mysql-test.mysql.01
| |-- VnodeDelete jobid=98 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019
| | predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to execute
| |-- VnodeDelete jobid=88 state=10 error=1 start=Thu May 9 00:20:37 2019 end=Thu May 9 00:20:43 2019
| | postdestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to execute
|
|-- mysql-test
| |-- ApplicationDelete jobid=97 state=10 error=1 start=Thu May 9 00:23:31 2019 end=Thu May 9 00:23:32 2019
| | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
| |-- ApplicationDelete jobid=87 state=10 error=1 start=Thu May 9 00:20:37 2019 end=Thu May 9 00:20:43 2019
| | Job failed. One or more child jobs reported errors. Error: 'postdestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
| |-- ApplicationDelete jobid=92 state=10 error=1 start=Thu May 9 00:22:17 2019 end=Thu May 9 00:22:18 2019
| | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
| |-- ApplicationDelete jobid=95 state=10 error=1 start=Thu May 9 00:22:59 2019 end=Thu May 9 00:23:00 2019
| | Job failed. One or more child jobs reported errors. Error: 'predestroy vnodehook cmd bash /var/lib/robin/.file_object_cache/64f1ef8529796f8199a63eaf2e65365f/scripts/vnode_sample <REDACTED ARGS> failed to
| | execute'
|
|-- mysql-test1
| |-- ApplicationCreate jobid=129 state=10 error=1 start=Thu May 9 03:50:54 2019 end=Thu May 9 03:50:54 2019
| | Invalid Zone Id and/or Bundle Id: 1/2
| |-- ApplicationCreate jobid=128 state=10 error=1 start=Thu May 9 03:50:08 2019 end=Thu May 9 03:50:08 2019
| | Invalid Zone Id and/or Bundle Id: 1/2
|
sherlock produced results in 90 milliseconds (Fri May 10 01:22:23 AM 2019).
|-- 3 nodes, 12 disks, 3 vols, 7 snapshots, 1 apps, 1 vnodes, 2 users and 1 tenants were analyzed

18.9. Check Node Health

When you check the health of a node, you can find details about all the objects (applications, Pods, volumes, devices, file collections, and bundles) under the node.

To check the health of a node, complete the following steps:

1. To get the list of nodes and select the node you want to check, run the following command:

   # sherlock -H -K

2. To view the health of a node and its details, run the following command:

   # sherlock --node <node name>

You can optionally use the -V option to view the details.

Example

# sherlock -H -K
No matching apps found
No matching pods found

SHOWING HEALTH OF 3/3 VOLUMES IN THE CLUSTER
|-- VOLID     1: file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44, usage:  448 MB /  20 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|-- VOLID   163: pvc-66646581-0210-46e2-b945-9ea880be38d7                          , usage:  352 MB /   5 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
|-- VOLID   162: pvc-eb63979d-720e-41c9-808f-145306dc1259                          , usage:  576 MB /  11 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
All volumes are healthy

SHOWING HEALTH OF 3/3 NODES RUNNING IN THE CLUSTER
|-- eqx04-flash05 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
|-- eqx01-flash15 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY

SHOWING HEALTH OF 26/26 DEVICES IN THE CLUSTER
|-- /dev/sdh@eqx01-flash16 | 1/100 vols | 11/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW)
|
|-- /dev/sde@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
|
|-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90)
|
|-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2)
|
|-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33)
|
|-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7)
|
|-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT)
|
|-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
|
|-- /dev/sdf@eqx01-flash16 | 1/100 vols | 5/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS)
|
|-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3)
|
|-- /dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS)
|
|-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2)
|
|-- /dev/sdi@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
|
|-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8)
|
|-- /dev/sdb@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a07510ec79d1f PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14330EC79D1F)
|
|-- /dev/sdc@eqx04-flash05 | 0/100 vols | 0/57194 slices | 894.3 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a07510ee9a052 PATH: /dev/disk/by-id/ata-Crucial_CT960M500SSD1_14280EE9A052)
|

SHOWING 1 FILE COLLECTIONS IN THE CLUSTER
|-- file-collection-1632045271349 Online     0 errors  0 warnings

SHOWING 0 BUNDLES IN THE CLUSTER
All bundles are available

To see more details rerun with -V|--verbose option
sherlock produced results in 158 milliseconds (Sun Sep 19 06:02:28 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    0 bundles, 5 users and 5 tenants were analyzed

For this example, we have selected the node eqx01-flash16:

# sherlock --node eqx01-flash16 -H -V

|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
=============================================================================================================================================================================================

0 PODS ARE RUNNING ON THIS NODE

9 DEVICES ARE ATTACHED TO THIS NODE
|-- DEVID    0: /dev/sdc INIT 14.9 GB free=14.9 GB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID    9: /dev/sdh READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 11/119194 slices, 18 segs
|   |-- VOLID  162: pvc-eb63979d-720e-41c9-808f-145306dc1259                             576 MB nslices=11  nsnaps=1  nsegs=18   nsegs_per_snap=2
|
|-- DEVID   13: /dev/sdf READY 1.8 TB free=1.8 TB (100%) 1/100 vols, 5/119194 slices, 11 segs
|   |-- VOLID  163: pvc-66646581-0210-46e2-b945-9ea880be38d7                             352 MB nslices=5   nsnaps=1  nsegs=11   nsegs_per_snap=3
|
|-- DEVID   12: /dev/sdb READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   14: /dev/sdg READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    8: /dev/sdd READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID    0: /dev/sda INIT 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/0 slices, 0 segs
|-- DEVID   10: /dev/sdi READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs
|-- DEVID   11: /dev/sde READY 1.8 TB free=1.8 TB (100%) 0/100 vols, 0/119194 slices, 0 segs


THERE ARE 1 FAILED JOBS TO INSPECT BETWEEN Sun Sep 12 05:58:15 PM 2021 - Sun Sep 19 05:58:15 PM 2021
    |-- eqx01-flash16.robinsystems.com
    |   |-- HostAddResourcePool jobid=30 state=10 error=1 start=Sun Sep 19 03:23:54 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Host 'eqx01-flash16.robinsystems.com' already has a resource pool 'default'
    |

sherlock produced results in 166 milliseconds (Sun Sep 19 05:58:15 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    0 bundles, 5 users and 5 tenants were analyzed

18.10. Check Pod Health

To check the health of a Pod, complete the following steps:

1. To get the list of Pods and choose the Pod you want to check, run the following command:

   # sherlock -V -H

2. To check the health of the Pod, run the following command:

   # sherlock --pod <pod name>

   You can optionally use the -V option to view the details.

Example

# sherlock -V -H


SHOWING HEALTH OF 3/3 APPLICATIONS RUNNING IN THE CLUSTER
APPNAME:  centos1   STATE: ONLINE       tuser1     1/1 pods healthy   KIND: ROBIN
=============================================================================================================================================================================================

APP HAS 1 PODS:
    POD ID 187: centos1.server.01 on eqx04-flash05 INST: ONLINE/INST: STARTED, NODE: ONLINE, RIO: UP
    |-- VOLID  238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313 1 GB
    |   |-- DEVID : /dev/sdb (eqx04-flash05) segs=2 slices=1 rawspace=64 MB RDVM: UP
    |-- VOLID  237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851 1 GB
    |   |-- DEVID : /dev/sdc (eqx04-flash05) segs=0 slices=1 rawspace=0 RDVM: UP
    |

APP IS RUNNING ON THE FOLLOWING 1 NODES:
    |-- eqx04-flash05 RIO: UP
    |   |-- centos1.server.01 ONLINE/STARTED
    |

APP IS STORING DATA ON THE FOLLOWING 2 DEVICES:
    |-- DEVID 15: /dev/sdb on eqx04-flash05 1 vols
    |   |-- VOLID  238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313    64 MB nslices=1    nsegs=2    nsnaps=1  segspersnap=2
    |
    |-- DEVID 16: /dev/sdc on eqx04-flash05 1 vols
    |   |-- VOLID  237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851       0 nslices=1    nsegs=0    nsnaps=1  segspersnap=0



# sherlock --pod centos1.server.01 -H -V

SHOWING HEALTH OF 1 PODS IN THE CLUSTER:
o-- POD/VNODE ID  187: centos1.server.01 STARTED/ONLINE   1 CPU, 200 MB MEM NODE: UP, RIO: UP
|   |-- VOLID 238: centos1.server.01.data.1.3a588402-0288-4921-a611-8c8b27e94313 64 MB/1 GB nsnaps=1
|   |   |-- DEVID 15: /dev/sdb on eqx04-flash05 nsegs=2   nslices=1   64 MB
|   |-- VOLID 237: centos1.server.01.block.1.0dd5e060-0e28-499c-a3f8-198e33b10851    0/1 GB nsnaps=1
|   |   |-- DEVID 16: /dev/sdc on eqx04-flash05 nsegs=0   nslices=1   0
|

THERE ARE 73 FAILED JOBS TO INSPECT BETWEEN Sun Sep 12 10:42:37 PM 2021 - Sun Sep 19 10:42:37 PM 2021
    |--
    |   |-- HostProbe jobid=5 state=10 error=1 start=Sun Sep 19 02:52:57 2021 end=Sun Sep 19 02:52:57 2021
    |   |       HTTPSConnectionPool(host='172.19.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces?fieldSelector=metadata.name%3Drobin-
    |   |       admin&limit=0&timeoutSeconds=56 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff69c481d50>: Failed to establish a
    |   |       new connection: [Errno 111] Connection refused'))
    |   |-- HostProbe jobid=7 state=10 error=1 start=Sun Sep 19 02:53:21 2021 end=Sun Sep 19 02:53:21 2021
    |   |       HTTPSConnectionPool(host='172.19.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces?fieldSelector=metadata.name%3Drobin-
    |   |       admin&limit=0&timeoutSeconds=56 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff69c3c7210>: Failed to establish a
    |   |       new connection: [Errno 111] Connection refused'))
    |   |-- HostProbe jobid=6 state=10 error=1 start=Sun Sep 19 02:53:12 2021 end=Sun Sep 19 02:53:12 2021
    |   |       HTTPSConnectionPool(host='172.19.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces?fieldSelector=metadata.name%3Drobin-
    |   |       admin&limit=0&timeoutSeconds=56 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff69c3fb090>: Failed to establish a
    |   |       new connection: [Errno 111] Connection refused'))
    |   |-- HostAddResourcePool jobid=30 state=10 error=1 start=Sun Sep 19 03:23:54 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Host 'eqx01-flash16.robinsystems.com' already has a resource pool 'default'
    |   |-- DockerRegistryRemove jobid=1870 state=10 error=1 start=Sun Sep 19 15:16:44 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Invalid registry ID '2', does not exist.
    |   |-- TenantUserRemove jobid=2062 state=10 error=1 start=Sun Sep 19 17:52:01 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Delete the following objects or assign them to another tenant user before removing user 'tuser1':   Bundles: robintest jobmgr-3.1
    |   |-- HostRemoveStorageRole jobid=2129 state=10 error=1 start=Sun Sep 19 18:53:57 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Disk 0x500a075109604998 is still part of 1 device sets
    |   |-- HostRemoveRoles jobid=2127 state=10 error=1 start=Sun Sep 19 18:53:57 2021 end=Sun Sep 19 18:53:57 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Disk 0x500a075109604998 is still part of 1 device sets
    |   |-- UserRemove jobid=2061 state=10 error=1 start=Sun Sep 19 17:52:01 2021 end=Sun Sep 19 17:52:02 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Delete the following objects or assign them to another tenant user before removing user 'tuser1':
    |   |       Bundles: robintest jobmgr-3.1
    |   |-- SetTag jobid=2256 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- HostRemoveRoles jobid=2123 state=10 error=1 start=Sun Sep 19 18:49:05 2021 end=Sun Sep 19 18:49:05 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Disk 0x500a075109604998 is still part of 1 device sets
    |   |-- HostRemoveStorageRole jobid=2125 state=10 error=1 start=Sun Sep 19 18:49:05 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Disk 0x500a075109604998 is still part of 1 device sets
    |   |-- HostAddResourcePool jobid=2131 state=10 error=1 start=Sun Sep 19 18:54:04 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot assign resource pool to host 'eqx01-flash15.robinsystems.com' which has a compute or storage role.
    |   |-- SetTag jobid=2232 state=10 error=1 start=Sun Sep 19 20:10:56 2021 end=Sun Sep 19 20:10:56 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'non-default', please remove it first
    |   |-- SetTag jobid=2254 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- SetTag jobid=2258 state=10 error=1 start=Sun Sep 19 20:44:14 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- SetTag jobid=2257 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- SetTag jobid=2255 state=10 error=1 start=Sun Sep 19 20:44:13 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- SetTag jobid=2260 state=10 error=1 start=Sun Sep 19 20:44:14 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- SetTag jobid=2259 state=10 error=1 start=Sun Sep 19 20:44:14 2021 end=Sun Sep 19 20:44:14 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'eqx01-flash15-Disk', please remove it first
    |   |-- SetTag jobid=2261 state=10 error=1 start=Sun Sep 19 21:05:10 2021 end=Sun Sep 19 21:05:11 2021
    |   |       Tags support single value per key and there is already present a tag 'nightly':'DiskAll', please remove it first
    |   |-- UserAdd jobid=2329 state=10 error=1 start=Sun Sep 19 22:35:21 2021 end=Wed Dec 31 16:00:00 1969
    |   |       A user by the name of 'tuser1' already exists
    |   |-- TenantUserRemove jobid=2325 state=10 error=1 start=Sun Sep 19 22:24:23 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Delete the following objects or assign them to another tenant user before removing user 'tuser1':   Namespace with deployed apps: t016-u000018   Bundles:
    |   |       centos 7   Applications: centos1, centos12, centos123
    |   |-- UserRemove jobid=2327 state=10 error=1 start=Sun Sep 19 22:32:35 2021 end=Sun Sep 19 22:32:35 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Delete the following objects or assign them to another tenant user before removing user 'tuser1':
    |   |       Namespace with deployed apps: t016-u000018   Bundles: centos 7   Applications: centos1, centos12, centos123
    |   |-- BundleRemove jobid=2323 state=10 error=1 start=Sun Sep 19 22:21:44 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Applications active for bundle, centos, with bundleid = 89
    |   |-- UserAdd jobid=2326 state=10 error=1 start=Sun Sep 19 22:27:06 2021 end=Wed Dec 31 16:00:00 1969
    |   |       A user by the name of 'tuser1' already exists
    |   |-- UserRemove jobid=2324 state=10 error=1 start=Sun Sep 19 22:24:23 2021 end=Sun Sep 19 22:24:23 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Delete the following objects or assign them to another tenant user before removing user 'tuser1':
    |   |       Namespace with deployed apps: t016-u000018   Bundles: centos 7   Applications: centos1, centos12, centos123
    |   |-- TenantUserRemove jobid=2328 state=10 error=1 start=Sun Sep 19 22:32:35 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Delete the following objects or assign them to another tenant user before removing user 'tuser1':   Namespace with deployed apps: t016-u000018   Bundles:
    |   |       centos 7   Applications: centos1, centos12, centos123
    |
    |-- robinte-backup-rw
    |   |-- StorageRepoAdd jobid=1943 state=10 error=1 start=Sun Sep 19 16:08:10 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Storage repo with name 'robinte-backup-rw' already exists.
    |
    |-- MySqlOn
    |   |-- ApplicationScalein jobid=330 state=10 error=1 start=Sun Sep 19 05:19:42 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- MariaOn
    |   |-- ApplicationScalein jobid=249 state=10 error=1 start=Sun Sep 19 04:51:29 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- Conflue
    |   |-- ApplicationScalein jobid=747 state=10 error=1 start=Sun Sep 19 06:47:59 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=748 state=10 error=1 start=Sun Sep 19 06:48:00 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=736 state=10 error=1 start=Sun Sep 19 06:46:15 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=743 state=10 error=1 start=Sun Sep 19 06:47:53 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=744 state=10 error=1 start=Sun Sep 19 06:47:55 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=745 state=10 error=1 start=Sun Sep 19 06:47:56 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=746 state=10 error=1 start=Sun Sep 19 06:47:57 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- NginxHo
    |   |-- ApplicationSnapshot jobid=477 state=10 error=1 start=Sun Sep 19 06:04:36 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot snapshot application 'NginxHo' in 'CREATE_FAILED' state
    |   |-- ApplicationSnapshot jobid=478 state=10 error=1 start=Sun Sep 19 06:04:54 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot snapshot application 'NginxHo' in 'CREATE_FAILED' state
    |   |-- ApplicationScale jobid=479 state=10 error=1 start=Sun Sep 19 06:05:01 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scale application 'NginxHo' in 'CREATE_FAILED' state
    |   |-- ApplicationCreate jobid=474 state=10 error=1 start=Sun Sep 19 05:49:13 2021 end=Sun Sep 19 06:04:23 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Job failed. One or more child jobs reported errors. Error: Timeout expired while waiting for pod
    |   |       desired phase 'Ready' current phase 'Pending'
    |   |-- ApplicationScalein jobid=481 state=10 error=1 start=Sun Sep 19 06:08:23 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scale-in application 'NginxHo' in 'CREATE_FAILED' state
    |
    |-- CentosO
    |   |-- ApplicationScalein jobid=168 state=10 error=1 start=Sun Sep 19 04:22:52 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- RedisCl
    |   |-- ApplicationScalein jobid=457 state=10 error=1 start=Sun Sep 19 05:46:43 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '6' nodes. multinode_min  is set to '6' in the bundle definition.
    |   |-- ApplicationScalein jobid=456 state=10 error=1 start=Sun Sep 19 05:46:42 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '6' nodes. multinode_min  is set to '6' in the bundle definition.
    |   |-- ApplicationScalein jobid=458 state=10 error=1 start=Sun Sep 19 05:46:44 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '6' nodes. multinode_min  is set to '6' in the bundle definition.
    |   |-- ApplicationScalein jobid=459 state=10 error=1 start=Sun Sep 19 05:46:46 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '6' nodes. multinode_min  is set to '6' in the bundle definition.
    |   |-- ApplicationScalein jobid=460 state=10 error=1 start=Sun Sep 19 05:46:47 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '6' nodes. multinode_min  is set to '6' in the bundle definition.
    |   |-- ApplicationScalein jobid=461 state=10 error=1 start=Sun Sep 19 05:46:48 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '6' nodes. multinode_min  is set to '6' in the bundle definition.
    |
    |-- NginxHo.nginx.01
    |   |-- VnodeDeploy jobid=480 state=10 error=1 start=Sun Sep 19 06:07:41 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot perform Relocate Operation on Vnode 'NginxHo.nginx.01' of an application in failed state
    |   |-- VnodeDeploy jobid=482 state=10 error=1 start=Sun Sep 19 06:08:28 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot perform Repair Operation on Vnode 'NginxHo.nginx.01' of an application in failed state
    |   |-- VnodeAdd jobid=476 state=10 error=1 start=Sun Sep 19 05:49:18 2021 end=Sun Sep 19 06:04:23 2021
    |   |       Timeout expired while waiting for pod desired phase 'Ready' current phase 'Pending'
    |
    |-- Postgre
    |   |-- ApplicationScalein jobid=411 state=10 error=1 start=Sun Sep 19 05:37:06 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- nginx
    |   |-- RoleCreate jobid=475 state=10 error=1 start=Sun Sep 19 05:49:18 2021 end=Sun Sep 19 06:04:23 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Timeout expired while waiting for pod desired phase 'Ready' current phase 'Pending'
    |
    |-- mysql
    |   |-- ApplicationSnapshot jobid=1190 state=10 error=1 start=Sun Sep 19 10:15:01 2021 end=Sun Sep 19 10:15:59 2021
    |   |       Snapshotting application mysql failed
    |   |-- ApplicationDataSnapshot jobid=1191 state=10 error=1 start=Sun Sep 19 10:15:30 2021 end=Sun Sep 19 10:15:31 2021
    |   |       Failed to create snapshot: 529 Volume mysql.mysql.01.data.1.a0374e98-442b-46c2-8784-032461845f7f (103) snapshots is using 384.0MB which is above space limit
    |   |       256.0MB. No new snapshot is allowed.
    |   |-- ApplicationCreate jobid=1446 state=10 error=1 start=Sun Sep 19 13:11:57 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Non static IP allocations cannot be done from non-range(network) IP pools -> 'demo-ovs-pool-nw'
    |   |-- ApplicationCreate jobid=1487 state=10 error=1 start=Sun Sep 19 13:22:31 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Static IP 10.10.0.10 is not a part of the IP pool demo-ovs-pool-nw associated with this application
    |   |-- ApplicationCreate jobid=1536 state=10 error=1 start=Sun Sep 19 13:37:28 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Static IP 10.9.63.151 is not within the current Robin IP Pool ranges. Cannot allocate static IP
    |   |-- ApplicationCreate jobid=1593 state=10 error=1 start=Sun Sep 19 13:53:41 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Static IP 10.9.105.64 is not a part of the IP pool demo-ovs-pool associated with this application
    |
    |-- Elastic
    |   |-- ApplicationScalein jobid=903 state=10 error=1 start=Sun Sep 19 07:11:51 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=904 state=10 error=1 start=Sun Sep 19 07:11:52 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=905 state=10 error=1 start=Sun Sep 19 07:11:53 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- WordPre
    |   |-- ApplicationScalein jobid=1023 state=10 error=1 start=Sun Sep 19 07:31:58 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |   |-- ApplicationScalein jobid=1022 state=10 error=1 start=Sun Sep 19 07:31:57 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Cannot scaledown to less than '1' nodes. multinode_min  is set to '1' in the bundle definition.
    |
    |-- centos
    |   |-- ApplicationCreate jobid=1742 state=10 error=1 start=Sun Sep 19 14:09:40 2021 end=Sun Sep 19 14:15:18 2021
    |   |       Job failed. One or more child jobs reported errors. Error: Job failed. One or more child jobs reported errors. Error: time="2021-09-19T14:13:04-07:00"
    |   |       level=fatal msg="pulling image: rpc error: code = Unknown desc = Exception calling application: ErrorUnknown:StatusCode.UNKNOWN:Error response from daemon:
    |   |       Get https://artifactory.robinsystems.com/v2/robinsys/centos/manifests/7: unauthorized: BAD_CREDENTIAL"
    |   |-- ApplicationCreate jobid=2330 state=10 error=1 start=Sun Sep 19 22:40:53 2021 end=Sun Sep 19 22:40:54 2021
    |   |       Tenant Application count (3) equals or exceeds max_apps_per_tenant limit (2)
    |
    |-- server
    |   |-- RoleCreate jobid=1743 state=10 error=1 start=Sun Sep 19 14:09:42 2021 end=Sun Sep 19 14:15:18 2021
    |   |       Job failed. One or more child jobs reported errors. Error: time="2021-09-19T14:13:04-07:00" level=fatal msg="pulling image: rpc error: code = Unknown desc =
    |   |       Exception calling application: ErrorUnknown:StatusCode.UNKNOWN:Error response from daemon: Get
    |   |       https://artifactory.robinsystems.com/v2/robinsys/centos/manifests/7: unauthorized: BAD_CREDENTIAL"
    |
    |-- centos.server.01
    |   |-- VnodeAdd jobid=1744 state=10 error=1 start=Sun Sep 19 14:09:42 2021 end=Sun Sep 19 14:15:18 2021
    |   |       time="2021-09-19T14:13:04-07:00" level=fatal msg="pulling image: rpc error: code = Unknown desc = Exception calling application:
    |   |       ErrorUnknown:StatusCode.UNKNOWN:Error response from daemon: Get https://artifactory.robinsystems.com/v2/robinsys/centos/manifests/7: unauthorized:
    |   |       BAD_CREDENTIAL"
    |
    |-- robi-bkup-rw
    |   |-- StorageRepoAdd jobid=2142 state=10 error=1 start=Sun Sep 19 18:58:24 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Storage repo with supplied parameters already exists: robinte-backup-rw
    |
    |-- alpine-limit
    |   |-- ImageAdd jobid=2276 state=10 error=1 start=Sun Sep 19 21:40:19 2021 end=Wed Dec 31 16:00:00 1969
    |   |       Tenant ImageAdd count (2) equals or exceeds max_images_per_tenant limit (2)
    |
    |-- podlimit
    |   |-- ApplicationCreate jobid=2322 state=10 error=1 start=Sun Sep 19 22:02:11 2021 end=Sun Sep 19 22:02:11 2021
    |   |       Tenant Application count (3) equals or exceeds max_pods_per_tenant limit (3)
    |

sherlock produced results in 169 milliseconds (Sun Sep 19 10:42:37 PM 2021).
|-- 3 nodes, 26 disks, 9 vols, 9 snapshots, 3 apps, 3 pods, 1 file-collections,
    1 bundles, 15 users and 16 tenants were analyzed

18.11. Check Volume Health

Sherlock provides an option to check the health of the volumes attached to virtual nodes (vNodes) or Pods. It is recommended to run this check frequently; a scripted version of the check is sketched after the example below.

To check the health of a volume attached to a vNode or Pod, complete the following steps:

  1. Run the following command to get the list of volumes. From the list, select the volume whose health you want to check.

    # robin volume list
    
  2. Run the following command to check the health of the volume.

    # sherlock --vol <volume name> -H -V
    

Example

# robin volume list

    ID | Name | Media | Type | BlockSz | Size | Psize | SnapLimit | Prot | Repl | Compress | QGroupID | CTime | Mount
    ---+--------------------------------------------------------------------+-------+------+---------+-------------+-----------+------------+------+------+----------+----------+------------+----------------------------------------------------------------------------------------------
    1 | file-collection-1557385703714.98f09770-bedd-496f-a7d1-1fd3d1ec840e | 72 | 0 | 4096 | 10737418240 | 268435456 | 0 | 0 | 1 | 1 | 1 | 1557385704 | [{'readonly': 0, 'state': 14, 'zoneid': 1556896675, 'nodeid': 1, 'mntpath': '/dev/vblock0'}]
    4 | mysql-test.mysql.01.data.1.3daab239-4327-4f00-873d-ffda3c9575f2 | 72 | 0 | 4096 | 1073741824 | 469762048 | 214748364 | 0 | 1 | 0 | 4 | 1557390977 | [{'readonly': 0, 'state': 14, 'zoneid': 1556896675, 'nodeid': 2, 'mntpath': '/dev/vblock0'}]
    5 | mysql-test.mysql.01.root_fs.1.58523556-a318-483c-9f8c-d2cd98ad6a32 | 72 | 0 | 4096 | 10737418240 | 335544320 | 2147483648 | 0 | 1 | 0 | 5 | 1557390977 | [{'readonly': 0, 'state': 14, 'zoneid': 1556896675, 'nodeid': 2, 'mntpath': '/dev/vblock1'


# sherlock --vol file-collection-1557385703714.98f09770-bedd-496f-a7d1-1fd3d1ec840e -H -V

    SHOWING HEALTH OF 1/3 VOLUMES IN THE CLUSTER
    |-- VOLID 1: file-collection-1557385703714.98f09770-bedd-496f-a7d1-1fd3d1ec840e 256 MB / 10 GB 1 snapshots, using 1 devices
    | |-- DEVID 9: /dev/sdd on centos-60-182 using 256 MB/127 GB capacity, 8/10 slices, 8 segs, segspernap=1 RDVM: UP, DEV: READY
    | | (WWN: 0x60022480d380ceef6ddbcdfa327834ca PATH: /dev/disk/by-id/scsi-360022480d380ceef6ddbcdfa327834ca)
    | |
    | |-- SNAPSHOTS: 1 CREATED DEV OWN CLONES STATE SIZE
    | | |-- SNAPID 1: 1969/12/31 16:00:00 8 8 0 READY 256 MB
    | | |
    |
    All volumes are healthy
    sherlock produced results in 70 milliseconds (Fri May 10 01:46:43 AM 2019).
    |-- 3 nodes, 12 disks, 3 vols, 7 snapshots, 1 apps, 1 vnodes, 2 users and 1 tenants were analyzed
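
Because this check should run frequently, you can script it across all volumes. The following is a minimal sketch, assuming the robin and sherlock binaries are available on the active master node and that robin volume list prints the pipe-separated table shown above:

    #!/bin/bash
    # Sketch: check the health of every volume in the cluster.
    # Assumes the pipe-separated "robin volume list" output format shown
    # above: field 2 is the volume name, and the first two lines are the
    # header and separator rows.
    robin volume list | awk -F'|' 'NR > 2 { gsub(/ /, "", $2); if ($2 != "") print $2 }' |
    while read -r vol; do
        sherlock --vol "$vol" -H -V
    done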

18.12. Check Device Health

To check the health of a device, complete the following steps:

  1. Run the following command to get the list of devices. From the list, select the desired device name (its WWN); a sketch for extracting just the WWNs follows the example below.

    # sherlock -H
    
  2. Run the following command to check the health of the device.

    # sherlock --dev <device name> -H -V
    

Example

# sherlock -H

No matching apps found
No matching pods found

SHOWING HEALTH OF 1/1 VOLUMES IN THE CLUSTER
|-- VOLID     1: file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44, usage:  448 MB /  20 GB,  1 snapshots, resync progress: SYNCED, using 1 devices
All volumes are healthy

SHOWING HEALTH OF 2/2 NODES RUNNING IN THE CLUSTER
|-- eqx01-flash15 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY
|-- eqx01-flash16 ONLINE     0 errors,  0 warnings NODE: UP, AGENT: UP, IOMGR: UP, K8S Service(s): READY

SHOWING HEALTH OF 18/18 DEVICES IN THE CLUSTER
|-- /dev/sdh@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0d9e30 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EZW)
|
|-- /dev/sde@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0db2c7 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605EN0)
|
|-- /dev/sdf@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0db9be PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605E90)
|
|-- /dev/sdi@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dbae3 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DY2)
|
|-- /dev/sdg@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0ddd62 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605D33)
|
|-- /dev/sdh@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0df3ba PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605CZ7)
|
|-- /dev/sdd@eqx01-flash15 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c101de8 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605TKT)
|
|-- /dev/sdb@eqx01-flash15 | 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x500a075109604998 PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998)
|
|-- /dev/sdf@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dc21f PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DSS)
|
|-- /dev/sdb@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dd039 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DQ3)
|
|-- /dev/sdg@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0dee42 PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DCS)
|
|-- /dev/sdd@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0df26c PATH: /dev/disk/by-id/ata-ST2000NX0253_S4605DB2)
|
|-- /dev/sdi@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0fe71f PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059W1)
|
|-- /dev/sde@eqx01-flash16 | 0/100 vols | 0/119194 slices | 1.8 TB free of 1.8 TB, NODE: ONLINE, RDVM: UP, DEV: READY
|   (WWN: 0x5000c5008c0feea2 PATH: /dev/disk/by-id/ata-ST2000NX0253_S46059R8)


For this example, we used the device name 0x500a075109604998 (the device's WWN).

# sherlock --dev 0x500a075109604998 -H -V

DEVICE /dev/sdb on eqx01-flash15 1/100 vols | 20/57194 slices | 893.8 GB free of 894.3 GB NODE: ONLINE, RDVM: UP, DEV: READY
=============================================================================================================================================================================================
|==> WWN: 0x500a075109604998 | PATH: /dev/disk/by-id/ata-Micron_M500_MTFDDAK960MAV_140109604998
|-- VOL: 1 file-collection-1632045271349.5ff1f19f-937f-4ec1-a595-9d9df9d11d44    448 MB nslices=20  nsegs=14   (1  ) nsnaps=1

sherlock produced results in 130 milliseconds (Sun Sep 19 03:34:51 AM 2021).
|-- 2 nodes, 18 disks, 1 vols, 1 snapshots, 0 apps, 0 pods, 1 file-collections,
    0 bundles, 1 users and 1 tenants were analyzed
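
If you only need the device identifiers for further checks, you can extract the WWNs from the health report. A small sketch, assuming the "(WWN: ... PATH: ...)" line format shown in the example above:

    # Sketch: print the WWN of every device in the health report.
    # Assumes WWNs are printed as lowercase hex, as in the example above.
    sherlock -H | grep -o 'WWN: 0x[0-9a-f]*' | awk '{ print $2 }'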

18.13. Check Devices Nearing Maximum Capacity

Using Sherlock, you can check device space usage statistics and identify whether any attached device is nearing its maximum capacity. If a device is nearing its maximum capacity, Robin recommends adding more devices to improve performance.

To check for devices nearing maximum capacity, run the following command:

# sherlock --dev full -H -V

If any devices are nearing their maximum capacity, the output displays them.
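
To run this check regularly, you could schedule it with cron. A hypothetical crontab entry (the /usr/bin/sherlock path and the log file location are assumptions; adjust them for your installation):

    # Run the capacity check daily at 06:00 on the active master node and
    # append the report to a log file for later review (paths are assumptions).
    0 6 * * * /usr/bin/sherlock --dev full -H -V >> /var/log/sherlock-dev-full.log 2>&1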

18.14. Find Devices That Need Rebalancing

To check for devices that might need rebalancing, run the following command (a scripted wrapper is sketched after the example output below):

# sherlock --devs-needing-rebalance

Example

# sherlock --devs-needing-rebalance

SHOWING APPLICATIONS THAT NEED ATTENTION:
All apps are healthy

SHOWING PODS THAT NEED ATTENTION:
All pods are healthy

SHOWING UNHEALTHY VOLUMES THAT NEED ATTENTION
All volumes are healthy

SHOWING UNHEALTHY NODES THAT NEED ATTENTION:
All nodes are healthy

SHOWING UNHEALTHY DEVICES THAT NEED ATTENTION:
All devices are healthy

SHOWING UNAVAILABLE FILE COLLECTIONS THAT NEED ATTENTION:
All file collection are available

SHOWING UNAVAILABLE BUNDLES THAT NEED ATTENTION:
All bundles are available

Moving 4 vols, 20 slices and 256 segments:

eqx04-flash05        /dev/sdb  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/57194   segs=      0/57194   vols= 0/100 [ 1.26 ]
eqx04-flash05        /dev/sdc  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/57194   segs=      0/57194   vols= 0/100 [ 1.26 ]
eqx04-flash05        /dev/sda  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05        /dev/sdd  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05        /dev/sde   59.6 GB/59.6 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05       /dev/dm-1   17.4 GB/17.4 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05       /dev/dm-2   35.7 GB/35.7 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx04-flash05       /dev/dm-0    6.0 GB/6.0 GB    (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash16        /dev/sdb    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdg    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdd    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdi    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sde    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash16        /dev/sdf    1.8 TB/1.8 TB    (free=100.0 %) slices=      5/119194  segs=     11/119194  vols= 1/100 [ 1.25 ]
eqx01-flash16        /dev/sdh    1.8 TB/1.8 TB    (free=100.0 %) slices=     11/119194  segs=     18/119194  vols= 1/100 [ 1.25 ]
eqx01-flash16        /dev/sdc   14.9 GB/14.9 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx01-flash16        /dev/sda    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
------------------------------------------------------------------------------------------------------------------------
eqx01-flash15        /dev/sde    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdf    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdi    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdg    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdh    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdd    1.8 TB/1.8 TB    (free=100.0 %) slices=      0/119194  segs=      0/119194  vols= 0/100 [ 1.26 ]
eqx01-flash15        /dev/sdb  893.8 GB/894.3 GB  (free=100.0 %) slices=     20/57194   segs=     14/57194   vols= 1/100 [ 1.24 ]
eqx01-flash15        /dev/sda  894.3 GB/894.3 GB  (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]
eqx01-flash15        /dev/sdc   14.9 GB/14.9 GB   (free=100.0 %) slices=      0/0       segs=      0/0       vols= 0/100 [ -1.00 ]

Only unhealthy objects are shown. To see everything re-run with -H|--healthy option
To see more details rerun with -V|--verbose option
sherlock produced results in 131 milliseconds (Sun Sep 19 05:19:44 PM 2021).
|-- 3 nodes, 26 disks, 3 vols, 3 snapshots, 0 apps, 0 pods, 1 file-collections,
    1 bundles, 2 users and 2 tenants were analyzed
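
You can also wrap this check in a small script that reports only when a rebalance is suggested. A minimal sketch, assuming the report prints a "Moving N vols, ..." summary line only when some devices need rebalancing, as in the example above:

    #!/bin/bash
    # Sketch: report only when sherlock suggests moving volumes.
    # Assumes the "Moving N vols, ..." line appears only when a rebalance
    # is needed, as in the example output above.
    report=$(sherlock --devs-needing-rebalance)
    if echo "$report" | grep -q '^Moving '; then
        echo "Rebalance suggested:"
        echo "$report" | grep '^Moving '
    fi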