14. Alerts and Events

Robin CNS has a built-in alerting and event notification mechanism that keeps users up to date with meaningful events in the cluster. It reports not only system configuration changes but also occurrences that might impact applications or services on nodes in the cluster.

14.1. Events

Robin events are objects that provide insight into operations occurring within the Robin cluster, covering both the underlying infrastructure and the applications running on the hosts. Events are generated regardless of the origin of the operation (manual intervention, Robin autopilot, and so on) and span a variety of topics, from container relocation to temperature detection on hosts. Robin events are predetermined (identified by their type or unique ID) and have three levels: INFO, WARN, and ERROR. The level indicates the severity of the event, which lets a user decide which events need attention. In most cases Robin events come in pairs: one is raised when a negative scenario occurs, and another is raised when that scenario is fixed (either by manual intervention or by Robin auto-healing). The latter is known as a resolving event and is usually of level INFO.

The following commands are described in this section:

robin event list

List all events

robin event-type list

List all event-types

14.1.1. Listing all events

Robin stores all events that have occurred during a cluster’s lifespan. To view these events, issue the following command:

# robin event list <id>
                   --page-size <page_size>
                   --page <page>
                   --hostname <hostname>
                   --type <type>
                   --type-id <type_id>
                   --level <level>
                   --object <object>


<id>

ID of event to inspect. Note: This is an optional parameter.

--page-size <page_size>

Maximum number of event records to include in a single output page

--page <page>

Starting page number (relative to total number of pages of PAGE_SIZE)

--hostname <hostname>

Filter events to include only those originating from a particular host

--type <type>

Filter events to include only those of a particular type

--type-id <type_id>

Filter events to include only those with a particular type-id

--level <level>

Filter events to include only those of a specific LEVEL

--object <object>

Filter events to include only those which affect a particular object


Return events in ascending order of ID


Return the total number of events


Display output in JSON format


# robin event list
ID | TIME                 | EVENT_TYPE             | LEVEL | OBJECT                                                                       | HOST          | DESCRIPTION
61 | 11 Aug 2020 00:01:00 | EVENT_KUBELET_CERT_OK  | INFO  | kubeletcert                                                                  |               | Kubelet certificate on UNKNOWN:UNKNOWN is not expiring soon.
60 | 10 Aug 2020 23:35:23 | EVENT_APP_CREATED      | INFO  | new-app-10 (8)                                                               |               | Application new-app-10 was created
59 | 10 Aug 2020 23:35:23 | EVENT_POD_SWAP_LOWMARK | INFO  | new-app-10-server-01.t001-u000003.svc.cluster.local (new-app-10-server-01)   | cscale-82-140 | POD new-app-10-server-01.t001-u000003.svc.cluster.local has swap space usage is in normal range
58 | 10 Aug 2020 23:30:52 | EVENT_APP_CREATED      | INFO  | new-app-2 (7)                                                                |               | Application new-app-2 was created
57 | 10 Aug 2020 23:30:52 | EVENT_POD_SWAP_LOWMARK | INFO  | new-app-2-server-01.t001-u000003.svc.cluster.local (new-app-2-server-01)     | cscale-82-140 | POD new-app-2-server-01.t001-u000003.svc.cluster.local has swap space usage is in normal range
56 | 10 Aug 2020 23:22:35 | EVENT_PROC_HEALTHY     | INFO  | robin-server                                                                 | cscale-82-140 | Health check passed for service robin-server on node default:cscale-82-140.robinsystems.com
55 | 10 Aug 2020 23:22:00 | EVENT_POD_SWAP_LOWMARK | INFO  | new-app-server-01.t001-u000003.svc.cluster.local (new-app-server-01)         | cscale-82-140 | POD new-app-server-01.t001-u000003.svc.cluster.local has swap space usage is in normal range
54 | 10 Aug 2020 23:21:59 | EVENT_APP_CREATED      | INFO  | new-app (6)                                                                  |               | Application new-app was created
53 | 10 Aug 2020 23:21:59 | EVENT_PROC_UNHEALTHY   | WARN  | robin-server                                                                 | cscale-82-140 | Health check failed for service robin-server on node default:cscale-82-140.robinsystems.com
52 | 10 Aug 2020 23:12:00 | EVENT_APP_DELETED      | INFO  | midhaul-app (4)                                                              |               | Application midhaul-app was deleted
51 | 10 Aug 2020 23:12:00 | EVENT_VOLUME_DELETED   | INFO  | midhaul-app.server.01.block.1.b7c9f1fd-d980-4a6c-b793-0a6354e556d7 (9)       |               | volume midhaul-app.server.01.block.1.b7c9f1fd-d980-4a6c-b793-0a6354e556d7 for application midhaul-app was deleted
50 | 10 Aug 2020 23:12:00 | EVENT_VOLUME_DELETED   | INFO  | midhaul-app.server.01.data.1.f33b7cfb-11de-498a-93f2-85f74a8e3b21 (8)        |               | volume midhaul-app.server.01.data.1.f33b7cfb-11de-498a-93f2-85f74a8e3b21 for application midhaul-app was deleted
49 | 10 Aug 2020 23:11:58 | EVENT_POD_DELETED      | INFO  | midhaul-app-server-01.t001-u000003.svc.cluster.local (midhaul-app-server-01) | cscale-82-140 | POD midhaul-app-server-01.t001-u000003.svc.cluster.local on node default:cscale-82-140 was deleted
48 | 10 Aug 2020 23:11:57 | EVENT_APP_DELETED      | INFO  | test-app-2 (3)                                                               |               | Application test-app-2 was deleted
47 | 10 Aug 2020 23:11:56 | EVENT_VOLUME_DELETED   | INFO  | test-app-2.server.01.block.1.fee2c5dc-6704-42d7-956b-5d07119b5a87 (7)        |               | volume test-app-2.server.01.block.1.fee2c5dc-6704-42d7-956b-5d07119b5a87 for application test-app-2 was deleted
46 | 10 Aug 2020 23:11:56 | EVENT_VOLUME_DELETED   | INFO  | test-app-2.server.01.data.1.b9bb1991-b367-45a2-84c3-ed803687bfd0 (6)         |               | volume test-app-2.server.01.data.1.b9bb1991-b367-45a2-84c3-ed803687bfd0 for application test-app-2 was deleted
45 | 10 Aug 2020 23:11:55 | EVENT_POD_DELETED      | INFO  | test-app-2-server-01.t001-u000003.svc.cluster.local (test-app-2-server-01)   | cscale-82-140 | POD test-app-2-server-01.t001-u000003.svc.cluster.local on node default:cscale-82-140 was deleted
44 | 10 Aug 2020 23:11:43 | EVENT_APP_DELETED      | INFO  | ron-app (5)                                                                  |               | Application ron-app was deleted
43 | 10 Aug 2020 23:11:43 | EVENT_VOLUME_DELETED   | INFO  | ron-app.server.01.block.1.bb2c9bee-6d99-4ee1-9f09-4a3259a09726 (10)          |               | volume ron-app.server.01.block.1.bb2c9bee-6d99-4ee1-9f09-4a3259a09726 for application ron-app was deleted
42 | 10 Aug 2020 23:11:43 | EVENT_VOLUME_DELETED   | INFO  | ron-app.server.01.data.1.5c8e51a9-db49-461d-bb5e-af9ae3287be5 (11)           |               | volume ron-app.server.01.data.1.5c8e51a9-db49-461d-bb5e-af9ae3287be5 for application ron-app was deleted
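The tabular output above is pipe-delimited, so a script can split it into records. The following is a minimal Python sketch that assumes the column layout shown in the listing; the sample text is abridged from that listing, and the helper name parse_event_table is hypothetical. For scripting, the JSON output option is likely a more robust input than the table.

```python
# Abridged copy of the `robin event list` output shown above.
SAMPLE_OUTPUT = """\
ID | TIME                 | EVENT_TYPE           | LEVEL | OBJECT       | HOST          | DESCRIPTION
56 | 10 Aug 2020 23:22:35 | EVENT_PROC_HEALTHY   | INFO  | robin-server | cscale-82-140 | Health check passed for service robin-server on node default:cscale-82-140.robinsystems.com
54 | 10 Aug 2020 23:21:59 | EVENT_APP_CREATED    | INFO  | new-app (6)  |               | Application new-app was created
53 | 10 Aug 2020 23:21:59 | EVENT_PROC_UNHEALTHY | WARN  | robin-server | cscale-82-140 | Health check failed for service robin-server on node default:cscale-82-140.robinsystems.com
"""


def parse_event_table(text):
    """Parse pipe-delimited event output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    events = []
    for line in lines[1:]:
        # Split at most len(header) - 1 times so a "|" inside DESCRIPTION survives.
        cells = [cell.strip() for cell in line.split("|", len(header) - 1)]
        events.append(dict(zip(header, cells)))
    return events


events = parse_event_table(SAMPLE_OUTPUT)

# Keep only the events that would need attention (WARN or ERROR).
warnings = [e for e in events if e["LEVEL"] in ("WARN", "ERROR")]
for e in warnings:
    print(e["ID"], e["EVENT_TYPE"], e["HOST"])
```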

Returns all events that have occurred during a cluster’s lifespan.

End Point: /api/v3/robin_server/events/

Method: GET

URL Parameters:

  • sort=[id|-id] : Sorts the returned list of events by ID.

  • total=[true|false] : Returns the total number of events.

  • physical_node=<physical_nodename> : Returns only events that occurred on the specified host.

  • type=<event_type> : Returns only events that match the specified type.

  • type_id=<event_type_id> : Returns only events that match the specified type ID.

  • level=[INFO|WARN|ERROR] : Returns only events of the specified level.

  • object_id=<object_id> : Returns only events that are associated with the specified object ID.

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token that identifies the user sending the request. The token can be acquired from the login API.

  • X-Event-Port: <event_server_port> : Port on which the Event Server is listening; the default is 29449. Note that the value of this field should be a string.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)
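As an illustration of how the URL parameters and headers above fit together, the following Python sketch builds (but does not send) a GET request using only the standard library. The host name, the token placeholder, and the helper name build_events_request are hypothetical. The https scheme is an assumption, taken from the URLs that appear in the sample err_msg fields.

```python
from urllib.parse import urlencode
from urllib.request import Request

EVENTS_PATH = "/api/v3/robin_server/events/"


def build_events_request(host, auth_token, rcm_port=29442, event_port=29449, **filters):
    """Return an unsent GET request for the events endpoint.

    `filters` may hold any URL parameter listed above, e.g. level="WARN".
    """
    # Drop unset filters so they do not appear as "key=None" in the URL.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"https://{host}:{rcm_port}{EVENTS_PATH}"  # scheme is an assumption
    if query:
        url += "?" + query
    headers = {
        "Authorization": auth_token,
        # The header value is passed as a string, as noted above.
        "X-Event-Port": str(event_port),
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical host and token, for illustration only.
req = build_events_request(
    "robin-master.example.com", "<auth_token>", level="WARN", sort="-id", total="true"
)
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it. Certificate handling depends on the cluster's TLS setup and is left out of this sketch.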

Example Response:

         "create_time":"August 11, 2020 07:01:00",
            "description":"Kubelet certificate on node 1597147518:1 is not expiring in next 30 days."
         "create_time":"August 11, 2020 06:35:23",
         "create_time":"August 11, 2020 06:35:23",
         "create_time":"August 11, 2020 06:30:52",
         "create_time":"August 11, 2020 06:30:52",
         "create_time":"August 11, 2020 06:22:35",
            "description":"Health check passed for Service 'robin-server'",
            "err_msg":"/usr/lib/python3.4/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings\n  InsecureRequestWarning)\nNone service-check POLLING https://[]:29442/api/v3/robin_server\nNone service-check READY https://[]:29442/api/v3/robin_server\n",
         "create_time":"August 11, 2020 06:22:00",
         "create_time":"August 11, 2020 06:21:59",
         "create_time":"August 11, 2020 06:21:59",
            "description":"Health check failed for Service 'robin-server'",
            "err_msg":"None service-check POLLING https://[]:29442/api/v3/robin_server\nNone service-check attempt (1/2) WARNING: HTTPSConnectionPool(host='', port=29442): Max retries exceeded with url: /api/v3/robin_server (Caused by NewConnectionError('\u003curllib3.connection.VerifiedHTTPSConnection object at 0x7f48f9100cc0\u003e: Failed to establish a new connection: [Errno 111] Connection refused',))\nNone service-check FAILED https://[]:29442/api/v3/robin_server, maximum of 2 attempts reached\n",
         "create_time":"August 11, 2020 06:12:00",
         "create_time":"August 11, 2020 06:12:00",
         "create_time":"August 11, 2020 06:12:00",
         "create_time":"August 11, 2020 06:11:58",
         "create_time":"August 11, 2020 06:11:57",
         "create_time":"August 11, 2020 06:11:56",
         "create_time":"August 11, 2020 06:11:56",
         "create_time":"August 11, 2020 06:11:55",
         "create_time":"August 11, 2020 06:11:43",
         "create_time":"August 11, 2020 06:11:43",
         "create_time":"August 11, 2020 06:11:43",

14.1.2. Listing event types

Robin has a set of predetermined events that are raised whenever the appropriate condition is met. They are identified by their type and their unique ID. The type indicates what the event refers to. In addition, each event type has a status that indicates whether the event is currently being tracked. To view all event types, run this command:

# robin event-type list <event_type>
                        --status <status>


<event_type>

ID of event type to inspect. Note: This is optional.

--status <status>

Filter event types to include only those that match a particular status


Display all event types regardless of status


Display output in JSON format


# robin event-type list
ID    | NAME                             | LEVEL | RESOLVES                       | STATUS
2     | EVENT_RESOLVER                   | INFO  |                                | ACTIVE
1005  | EVENT_NODE_UNREACHABLE           | WARN  |                                | ACTIVE
1007  | EVENT_NODE_DOWN                  | WARN  |                                | ACTIVE
1008  | EVENT_NODE_UP                    | INFO  | ['EVENT_NODE_DOWN']            | ACTIVE
1011  | EVENT_NODE_MEM_HIGHMARK          | WARN  |                                | ACTIVE
1015  | EVENT_NODE_ROOTFS_HIGHMARK       | WARN  |                                | ACTIVE
1023  | EVENT_NODE_SWAP_HIGHMARK         | WARN  |                                | ACTIVE
1027  | EVENT_NODE_TEMP_HIGHMARK         | WARN  |                                | ACTIVE
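The RESOLVES column links a resolving event type to the event types it clears. The following Python sketch inverts that column into a lookup table. It assumes the column is printed as a Python-style list literal, as in the EVENT_NODE_UP row above. The sample rows are copied from the listing, and the helper name resolvers_by_target is hypothetical.

```python
import ast

# (ID, NAME, LEVEL, RESOLVES, STATUS) rows copied from the listing above.
SAMPLE_ROWS = [
    ("1007", "EVENT_NODE_DOWN", "WARN", "", "ACTIVE"),
    ("1008", "EVENT_NODE_UP", "INFO", "['EVENT_NODE_DOWN']", "ACTIVE"),
    ("1011", "EVENT_NODE_MEM_HIGHMARK", "WARN", "", "ACTIVE"),
]


def resolvers_by_target(rows):
    """Map each resolvable event type to the event types that resolve it."""
    mapping = {}
    for _id, name, _level, resolves, _status in rows:
        # RESOLVES is either empty or a list literal such as "['EVENT_NODE_DOWN']".
        targets = ast.literal_eval(resolves) if resolves else []
        for target in targets:
            mapping.setdefault(target, []).append(name)
    return mapping


print(resolvers_by_target(SAMPLE_ROWS))
```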

Returns the list of predetermined events that are raised whenever the appropriate condition is met.

End Point: /api/v3/robin_server/events/

Method: GET

URL Parameters:

  • name=<event_type_name> : Returns only event types that match the specified name.

  • status=[0|1|2|3] : Returns only event types that match the specified status. Here 0 maps to ALL, 1 maps to ACTIVE, 2 maps to INACTIVE, and 3 maps to OBSOLETE.
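Because the status parameter takes a numeric code rather than a name, a client may want to keep the mapping in one place. The following Python sketch encodes the four values listed above and builds the query string. The class and function names are hypothetical.

```python
from enum import IntEnum
from urllib.parse import urlencode


class EventTypeStatus(IntEnum):
    """Numeric status codes as listed for the status URL parameter."""
    ALL = 0
    ACTIVE = 1
    INACTIVE = 2
    OBSOLETE = 3


def event_type_query(name=None, status=None):
    """Build the query string for an event-type listing request."""
    params = {}
    if name is not None:
        params["name"] = name
    if status is not None:
        # Accept a status name in any letter case and send its numeric code.
        params["status"] = int(EventTypeStatus[status.upper()])
    return urlencode(params)


print(event_type_query(status="inactive"))
print(event_type_query(name="EVENT_NODE_DOWN", status="ACTIVE"))
```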

Data Parameters: None

Port: RCM Port (default value is 29442)

Headers:

  • Authorization: <auth_token> : Authorization token that identifies the user sending the request. The token can be acquired from the login API.

  • X-Event-Port: <event_server_port> : Port on which the Event Server is listening; the default is 29449. Note that the value of this field should be a string.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)

Example Response:

         "msg":"volume \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is faulted",

         "msg":"volume \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is degraded",
         "msg":"volume \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is healthy",
         "msg":"volume \u003cobject_name\u003e for application \u003cappname\u003e was deleted",
         "msg":"Mount \u003cobject_name\u003e for vnode \u003cvnodename\u003e has reached highmark",

         "msg":"volume \u003cobject_name\u003e for vnode \u003cvnodename\u003e usage is in normal range",
         "msg":"Health check failed for service {object_name} on node {zonename}:\u003cnodename\u003e",

         "msg":"Health check passed for service {object_name} on node {zonename}:\u003cnodename\u003e",
         "msg":"process high memory high-watermark",

         "msg":"process memory that was previously at high-watermark dropped into safe zone",
         "msg":"process has hit CPU high-watermark",

         "msg":"process that has previously at high-watermark dropped into safe zone",
         "msg":"Application \u003cappname\u003e was created",

         "msg":"Application {appname} was deleted",
         "msg":"Application \u003cappname\u003e was started",

         "msg":"Application \u003cappname\u003e was stopped",

         "msg":"Application \u003cappname\u003e was frozen",

         "msg":"Application \u003cappname\u003e was thawed",

         "msg":"Application \u003cappname\u003e is waiting for admin's attention",

         "msg":"Application \u003cappname\u003e was snapshotted",

         "msg":"Application \u003cappname\u003e was rolled back",

         "msg":"Application \u003cappname\u003e was cloned",

         "msg":"Application \u003cappname\u003e was scaled",

         "msg":"Application \u003cappname\u003e was evacuated",

         "msg":"Application \u003cappname\u003e was deployed",

         "msg":"Application \u003cappname\u003e was probed",
         "msg":"Application \u003cappname\u003e was upgraded",

         "msg":"Application \u003cappname\u003e was backed up",
         "msg":"Application \u003cappname\u003e failed to be backed up.",

         "msg":"Application \u003cappname\u003e was restored",

         "msg":"Cluster is in VIOLATION of license limits, please see license info for more details.",

         "msg":"Cluster license is healthy.",
         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was started",
         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was stopped",
         "msg":"POD \u003cobject_name\u003e was restarted on {zonename}:\u003cnodename\u003e",
         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was deleted",
         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e crashed",

         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e repeatedly FAULTED",

         "msg":"Deployment plan generation for POD \u003cobject_name\u003e failed",

         "msg":"POD \u003cobject_name\u003e was relocated to {zonename}:\u003cnodename\u003e",
         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e could not be relocated",

         "msg":"POD \u003cobject_name\u003e could not be deployed on node {zonename}:\u003cnodename\u003e",

         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e experienced an error",

         "msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e error has been resolved",
         "msg":"POD \u003cobject_name\u003e could not be stopped on node {zonename}:\u003cnodename\u003e",

         "msg":"POD \u003cobject_name\u003e has reached swap space high watermark",

         "msg":"POD \u003cobject_name\u003e has swap space usage is in normal range",
         "msg":"Node {zonename}:\u003cnodename\u003e is unreachable",

         "msg":"Node {zonename}:\u003cnodename\u003e is now reachable",
         "msg":"Node {zonename}:\u003cnodename\u003e has been marked as down",

         "msg":"Node {zonename}:\u003cnodename\u003e is up after being marked as down",
         "msg":"Node {zonename}:\u003cnodename\u003e has reached memory high-watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e has dropped below memory high-watermark to safe zone",
         "msg":"Node {zonename}:\u003cnodename\u003e root filesystem usage has hit high watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e root filesystem usage is in safe zone",
         "msg":"Node {zonename}:\u003cnodename\u003e swap space has reached highmark",

         "msg":"Node {zonename}:\u003cnodename\u003e swap space usage is in normal range",


         "msg":"Node {zonename}:\u003cnodename\u003e has been removed",

         "msg":"Node {zonename}:\u003cnodename\u003e has reached /var high watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e /var usage is in normal range",
         "msg":"Node {zonename}:\u003cnodename\u003e has reached /var/log high watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e /var/log usage is in normal range",
         "msg":"Node {zonename}:\u003cnodename\u003e has reached /var/lib/pgsql high watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e /var/lib/pgsql usage is in normal range",
         "msg":"Node {zonename}:\u003cnodename\u003e has reached /var/crash high watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e /var/crash usage is in normal range",
         "msg":"Node {zonename}:\u003cnodename\u003e has reached /var/lib/robin high watermark",

         "msg":"Node {zonename}:\u003cnodename\u003e /var/lib/robin usage is in normal range",
         "msg":"Node {zonename}:\u003cnodename\u003e has been added",

         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was started",
         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was stopped",

         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was restarted",
         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was deleted",
         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e crashed",

         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e reached memory high-watermark",

         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was previously at memory high-watermark, but dropped to safe zone now",
         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was previously at CPU utilization high-watermark, but has dropped to safe zone now",
         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was previously at Block IO high-watermark, but has dropped to safe zone now",
         "msg":"Deployment plan generation for container \u003cobject_name\u003e failed",

         "msg":"container \u003cobject_name\u003e was relocated to {zonename}:\u003cnodename\u003e",
         "msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e could not be relocated",

         "msg":"container \u003cobject_name\u003e could not be deployed on node {zonename}:\u003cnodename\u003e",

         "msg":"container \u003cobject_name\u003e has reached swap space high watermark",

         "msg":"container \u003cobject_name\u003e has swap space usage is in normal range",
         "msg":"File Collection {object_name} on node {zonename}:\u003cnodename\u003e experienced an error.",

         "msg":"File Collection {object_name} on node {zonename}:\u003cnodename\u003e is offline.",
         "msg":"Failed to take File Collection {object_name} on node {zonename}:\u003cnodename\u003e offline.",

         "msg":"File Collection {object_name} on node {zonename}:\u003cnodename\u003e is online.",
         "msg":"Failed to take File Collection {object_name} on node {zonename}:\u003cnodename\u003e online.",

         "msg":"Node {zonename}:\u003cnodename\u003e is now the MASTER Manager (\u003cdescription\u003e)",

         "msg":"User on node {zonename}:\u003cnodename\u003e is utilizing the YUM package manager with the command: yum \u003cdescription\u003e",


         "msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following System Configuration precheck warning was ignored: \u003cdescription\u003e",

         "msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following Package Available precheck warning was ignored: \u003cdescription\u003e",

         "msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following Physical System Properties precheck warning was ignored: \u003cdescription\u003e",

         "msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following Networking precheck warning was ignored: \u003cdescription\u003e",

         "msg":"Active alerts for object '{object_name}' on node {zonename}:\u003cnodename\u003e have been resolved.",

         "msg":"Kubelet certificate check failed. Please check the kubelet certificate in node on node {zonename}:\u003cnodename\u003e",

         "msg":"Kubelet certificate on node {zonename}:\u003cnodename\u003e will expire soon.",

         "msg":"Kubelet certificate on node {zonename}:\u003cnodename\u003e expired.",

         "msg":"Kubelet certificate on {zonename}:\u003cnodename\u003e is not expiring soon.",
         "msg":"Certificate check failed. Please check 'kubeadm alpha certs check-expiration' is responding correctly in Robin master node.",

         "msg":"One or more K8S certificates will expire soon.",

         "msg":"One or more K8S certificates expired.",

         "msg":"K8S certificates are not expiring soon.",
         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e high-watermark",

         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e has dropped below disk high-watermark to safe zone",
         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e temperature high-watermark",

         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e has dropped below temperature high-watermark to safe zone",
         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is faulted",

         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is offline",

         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is degraded",

         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is healthy",
         "msg":"disk \u003cobject_name\u003e failed to detach from node {zonename}:\u003cnodename\u003e",

         "msg":"disk \u003cobject_name\u003e failed to attach to node {zonename}:\u003cnodename\u003e",
         "msg":"disk \u003cobject_name\u003e detached from node {zonename}:\u003cnodename\u003e",
         "msg":"disk \u003cobject_name\u003e is attached to node {zonename}:\u003cnodename\u003e",
         "msg":"disk \u003cobject_name\u003e removal from node {zonename}:\u003cnodename\u003e failed",

         "msg":"disk \u003cobject_name\u003e removed from node {zonename}:\u003cnodename\u003e",
         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e exceeds used space threshold",

         "msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e used space threshold ok",

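In the msg templates above, \u003c and \u003e are the JSON escapes for the < and > characters, so a JSON parser turns \u003cnodename\u003e into <nodename>. The following Python sketch decodes one template and fills in its placeholders. Treating the <name> and {name} forms as equivalent is an assumption based on how the sample event descriptions read, and the helper name render and the sample values are hypothetical.

```python
import json
import re

# One "msg" value from the response above, as a raw JSON string literal.
RAW = r'"POD \u003cobject_name\u003e was relocated to {zonename}:\u003cnodename\u003e"'


def render(raw_json_string, **values):
    """Decode a JSON-escaped msg template and substitute known placeholders."""
    template = json.loads(raw_json_string)  # \u003c becomes "<", \u003e becomes ">"

    def substitute(match):
        key = match.group(1) or match.group(2)
        # Leave a placeholder untouched when no value was supplied for it.
        return str(values.get(key, match.group(0)))

    # Match either <name> or {name}.
    return re.sub(r"<(\w+)>|\{(\w+)\}", substitute, template)


print(json.loads(RAW))
print(render(RAW, object_name="new-app-server-01", zonename="default", nodename="cscale-82-140"))
```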
The following are the event types and their descriptions:

Event type





This event is triggered when the Robin volume is in the FAULTED state. The Robin volume will be in the FAULTED state when none of its replicas are accessible because a node is down or the disk is bad.



This event is triggered when the Robin volume is in the DEGRADED state. A Robin volume is degraded when one or more of its replicas are not accessible because a node is down or a disk is bad, but the volume can still serve data through at least one healthy replica.



This event is triggered when the filesystem size on a volume reaches the threshold limit. It can be updated through the monitor_container_volume_highmark config attribute. The default value of this attribute is 0.9.



This event is triggered when one of the Robin Manager nodes is UNREACHABLE.



This event is triggered when the free space of a disk is less than 512 MB.



This event is triggered when the disk is faulted (IOs are failing on the device).



This event is triggered when the Robin IO manager (robin storage service) daemon is down or the Robin Pod or the node is down.



This event is triggered when the cloud disk fails to detach from a node.



This event is triggered when the cloud disk fails to attach to a node.



This event is triggered when Robin or the user decides to move the disk from one cloud compute node to another cloud compute node.



This event is triggered when the removal of a disk from a node fails in a cloud environment.



This event is triggered when the used space of a disk reaches the threshold limit. It can be updated through the disk_used_space_threshold config attribute. The default value of this attribute is 80%.



This event is triggered when removing a disk from Robin CNS fails.



This event is triggered when the disk has free space but the maximum number of volumes is reached.



This event is triggered when the node is marked as UNKNOWN.



This event is triggered when the node is marked as UNREACHABLE. The reason can be either the network is down or the Pod or the node is down.



This event is triggered when the node is marked as DOWN.



This event is triggered when the node is marked as OFFLINE.



This event is triggered when the /home/robinds/var/log/robin filesystem reaches the threshold limit. It can be updated through the monitor_host_var_log_volume_highmark config attribute. The default value of this attribute is 90%.



This event is triggered when the /home/robinds/var/lib/pgsql filesystem reaches the threshold limit. It can be updated through the monitor_host_var_pgsql_volume_highmark config attribute. The default value of this attribute is 70%.



This event is triggered when the /home/robinds/var/crash filesystem reaches the threshold limit. It can be updated through the monitor_host_var_crash_volume_highmark config attribute. The default value of this attribute is 90%.



This event is triggered when the status of the Robin service is unhealthy.



This event is triggered when the cluster exceeds the limits of its Robin CNS license.

You can update the threshold limit of the following events as per requirement:






Run the following command to update the threshold limit:

# robin config update agent <attribute> <value>


# robin config update agent monitor_host_var_pgsql_volume_highmark 0.8

14.2. Alerts

Robin alerts are generated to notify the logged-in user that a negative event has occurred in the cluster. They are only generated when events at level WARN or ERROR are created. This is because these events might require immediate attention. Once an alert is raised, its state is set to ACTIVE. The alert will only be resolved if a resolving event is created or if a user manually resolves the alert.
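The lifecycle described above can be sketched as a small state tracker. The following Python example is an illustrative model only, not Robin's implementation. The RESOLVED state name, the per-object keying, and the counting of repeated events (suggested by the EVENT_COUNT column of the alert listing) are assumptions. The resolver pair is taken from the event-type listing in section 14.1.2.

```python
from dataclasses import dataclass

# EVENT_NODE_UP resolves EVENT_NODE_DOWN, per the event-type listing.
RESOLVES = {"EVENT_NODE_UP": ["EVENT_NODE_DOWN"]}


@dataclass
class Alert:
    event_type: str
    obj: str
    level: str
    state: str = "ACTIVE"  # alerts start out ACTIVE
    event_count: int = 1


class AlertTracker:
    def __init__(self, resolves):
        self.resolves = resolves
        self.alerts = {}  # (event_type, obj) -> Alert

    def on_event(self, event_type, level, obj):
        # Only WARN and ERROR events raise alerts.
        if level in ("WARN", "ERROR"):
            key = (event_type, obj)
            alert = self.alerts.get(key)
            if alert is not None and alert.state == "ACTIVE":
                alert.event_count += 1  # assumed: repeats fold into one alert
                alert.level = level
            else:
                self.alerts[key] = Alert(event_type, obj, level)
        # A resolving event clears the alerts of the types it resolves.
        for target in self.resolves.get(event_type, []):
            self.resolve(target, obj)

    def resolve(self, event_type, obj):
        """Resolve an alert, either from a resolving event or manually."""
        alert = self.alerts.get((event_type, obj))
        if alert is not None:
            alert.state = "RESOLVED"  # state name is an assumption


tracker = AlertTracker(RESOLVES)
tracker.on_event("EVENT_NODE_DOWN", "WARN", "node-1")
tracker.on_event("EVENT_NODE_DOWN", "WARN", "node-1")
tracker.on_event("EVENT_NODE_UP", "INFO", "node-1")
alert = tracker.alerts[("EVENT_NODE_DOWN", "node-1")]
print(alert.state, alert.event_count)
```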

The following commands are described in this section:

robin alert list

List all alerts

14.2.1. Listing all alerts

Events of level ERROR or WARN are considered to be alerts as they need to be resolved before they can be dismissed. Robin stores all alerts that have occurred during the lifespan of a cluster. To view these alerts, run this command:

# robin alert list <id>
                   --page-size <page_size>
                   --page <page>
                   --hostname <hostname>
                   --nodeid <node_id>
                   --type <type>
                   --type-id <type_id>
                   --level <level>
                   --object <object>


<id>

ID of alert to inspect. Note: This is an optional parameter.

--page-size <page_size>

Maximum number of alert records to include in a single output page

--page <page>

Starting page number (relative to total number of pages of PAGE_SIZE)

--hostname <hostname>

Filter alerts to include only those originating from a particular host

--nodeid <node_id>

Filter alerts to include only those with a particular node ID

--type <type>

Filter alerts to include only those of a particular event type

--type-id <type_id>

Filter alerts to include only those with a particular event type-id

--level <level>

Filter alerts to include only those of a specific LEVEL

--object <object>

Filter alerts to include only those which concern a particular object


Return all alerts, even those that were resolved


Return alerts in ascending order of ID


Return the total number of alerts


Display output in JSON format


# robin alert list
ID   | START_TIME           | CUR_TIME             | EVENT_TYPE                       | CUR_LEVEL | STATE  | NODE_ID      | OBJECT                                     | EVENT_COUNT
1520 | 06 Feb 2020 15:04:53 | 06 Feb 2020 15:12:14 | EVENT_CONT_DEPLOY_FAILED         | ERROR     | ACTIVE | 1580198912:0 | my-mysql-01.t001-u000003.svc.cluster.local | 15
1519 | 06 Feb 2020 15:04:46 | 06 Feb 2020 15:04:46 | EVENT_CONT_ERROR                 | ERROR     | ACTIVE | 1580198912:0 | my-mysql-01.t001-u000003.svc.cluster.local | 1
1518 | 31 Jan 2020 22:27:08 | 31 Jan 2020 22:27:08 | EVENT_APP_ADMIN_WAIT             | WARN      | ACTIVE | 1580198912:0 | my                                         | 1
1360 | 29 Jan 2020 00:01:50 | 29 Jan 2020 00:01:50 | EVENT_CONT_DEPLOY_FAILED         | ERROR     | ACTIVE | 1580198912:0 | vnode-ipv6-55                              | 1
846  | 28 Jan 2020 22:43:07 | 28 Jan 2020 22:43:07 | EVENT_CONT_ERROR                 | ERROR     | ACTIVE | 1580198912:0 | vnode-ipv6-55                              | 1
3    | 27 Jan 2020 16:27:08 | 27 Jan 2020 16:27:10 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN      | ACTIVE | 1580198912:3 | cscale-82-38.robinsystems.com              | 30
2    | 27 Jan 2020 16:16:55 | 27 Jan 2020 16:16:55 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN      | ACTIVE | 1580198912:2 | cscale-82-37.robinsystems.com              | 15
1    | 27 Jan 2020 16:09:09 | 27 Jan 2020 16:09:11 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN      | ACTIVE | 1580198912:1 | cscale-82-36.robinsystems.com              | 30
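When the table above has to be processed by a script, the pipe-separated layout can be split into records. The following is a minimal Python sketch, assuming the output keeps the header row and the column separators shown in the example. parse_table is a hypothetical helper and not part of Robin; where the JSON output option of the command is available, it is likely a more robust input than the text table.

```python
def parse_table(text):
    """Split pipe-separated CLI output into one dict per row, keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# Two rows copied from the example output above.
sample = """\
ID   | START_TIME           | CUR_TIME             | EVENT_TYPE                       | CUR_LEVEL | STATE  | NODE_ID      | OBJECT                                     | EVENT_COUNT
1520 | 06 Feb 2020 15:04:53 | 06 Feb 2020 15:12:14 | EVENT_CONT_DEPLOY_FAILED         | ERROR     | ACTIVE | 1580198912:0 | my-mysql-01.t001-u000003.svc.cluster.local | 15
3    | 27 Jan 2020 16:27:08 | 27 Jan 2020 16:27:10 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN      | ACTIVE | 1580198912:3 | cscale-82-38.robinsystems.com              | 30
"""

alerts = parse_table(sample)
# Keep only the IDs of alerts whose current level is ERROR.
error_ids = [alert["ID"] for alert in alerts if alert["CUR_LEVEL"] == "ERROR"]
print(error_ids)
```

Every value is kept as a string, so numeric columns such as EVENT_COUNT need an explicit int() conversion before they are compared.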

Returns all events that have occurred during a cluster’s lifespan.

End Point: /api/v3/robin_server/events/

Method: GET

URL Parameters:

  • sort=[id|-id] : Sort the returned events by their ID.

  • total=[true|false] : Return the total number of events.

  • state="ACTIVE" : Return only events that have not been resolved.

  • physical_node=<physical_nodename> : Return only events that occurred on the specified host.

  • nodeid=<physical_node_id> : Return only events that occurred on the host with the specified ID.

  • type=<event_type> : Return only events that match the specified type.

  • type_id=<event_type_id> : Return only events that match the specified type ID.

  • object_id=<object_id> : Return only events that are associated with the specified object ID.

Data Parameters: None

Port: RCM Port (default value is 29442)


Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

  • X-Event-Port: <event_server_port> : Port on which the Event Server is listening; by default this is 29449. Note that the value of this field should be a string.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)
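The URL parameters and request headers described for this endpoint can be assembled as in the Python sketch below. It only builds the URL and header values and does not send anything. The https scheme, the host name robin.example.com, and the helper name build_events_query are assumptions made for illustration; the path, the default ports, and the parameter names are taken from the description above.

```python
from urllib.parse import urlencode


def build_events_query(host, auth_token, *, rcm_port=29442, event_port=29449,
                       scheme="https", **filters):
    """Return (url, headers) for a GET on the events endpoint.

    `filters` are the documented URL parameters (sort, total, state,
    physical_node, nodeid, type, type_id, object_id). The scheme and host
    are assumptions; adjust them to match the cluster.
    """
    base = f"{scheme}://{host}:{rcm_port}/api/v3/robin_server/events/"
    # Drop filters that were passed as None so they do not appear in the URL.
    query = urlencode({key: value for key, value in filters.items() if value is not None})
    url = f"{base}?{query}" if query else base
    headers = {
        "Authorization": auth_token,
        # The documentation notes that this value should be a string.
        "X-Event-Port": str(event_port),
    }
    return url, headers


url, headers = build_events_query(
    "robin.example.com",      # hypothetical host
    "<auth_token>",           # token obtained from the login API
    state="ACTIVE",
    type="EVENT_DISK_OFFLINE",
    sort="-id",
)
print(url)
```

The returned pair can be handed to any HTTP client. Whether TLS verification or a different scheme is required depends on how the cluster is deployed.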

Example Response:

               "create_time":"August 11, 2020 05:06:01",
                  "description":"/var/lib/docker not a partition (Folder not present)",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/var/lib/docker folder not present. Required space: 40G",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"PCI device passthrough not enabled. Set iommu=pt and intel_iommu=on in GRUB if planning to deploy KVM+SRIOV apps",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds/var/log not a partition (Folder not present)",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds/var/log folder not present. Required space: 60G",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds/var/crash not a partition (Folder not present)",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds/var/crash folder not present. Required space: 100G",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds not a partition (Folder not present)",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds folder not present. Required space: 40G",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds/var/lib/pgsql not a partition (Folder not present)",
               "create_time":"August 11, 2020 05:06:01",
                  "description":"/home/robinds/var/lib/pgsql folder not present. Required space: 50G",

14.3. Notification of events

Robin provides a native mechanism to instantly notify parties of any events that may concern them. This feature is useful as it enables quick responses to failures of cluster-wide resources and infrastructure.

The parties are modeled as subscribers in Robin and will need to be added manually, along with the method used to notify them. Given the large number of events that are detected by Robin, a subscription to a specific event is needed to indicate that a particular event is of interest. Each subscriber is then notified when the event that is tied to a subscription occurs.

To summarize, receiving a notification for an event is a two-step process:

  • Add a subscriber and their contact details

  • Add a subscription to an event that is of interest

Described below are the commands used to manage subscribers and subscriptions.

14.3.1. Managing subscribers

Subscribers are the intended recipients of notifications and should be configured before adding a subscription.

The following commands are described in this section:

robin subscriber add

Add a subscriber

robin subscriber list

List all subscribers

robin subscriber update

Update a subscriber’s attributes

robin subscriber remove

Remove a subscriber

Registering a Robin subscriber

To add a new subscriber to the system, run this command:

# robin subscriber add <name> <subscriber_type>
                              --email-address <email_address>
                              --full-name <full_name>
                              --host <host>
                              --port <port>
                              --community <community>


For email subscribers, the --email-address and --full-name parameters are mandatory. For SNMP subscribers, the --host, --port, and --community parameters are mandatory.


Name to assign to subscriber


Type of subscriber. Options include: snmp, email

--email-address <email_address>

Email address of subscriber

--full-name <full_name>

Full name of subscriber

--host <host>

SNMP Host for subscriber

--port <port>

SNMP Port of subscriber

--community <community>

SNMP Community of subscriber


# robin subscriber add demo_user email --email-address demo@robin.io --full-name Robin
Successfully added subscriber 'demo_user' with 'email' notification
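The rule about which options are mandatory for each subscriber type can be checked before the command is run. The Python sketch below validates the options and builds the argument list for robin subscriber add. The helper name subscriber_add_argv and the keyword names are hypothetical; only the flags and the mandatory sets come from the description above, and the sketch does not execute the command.

```python
# Mandatory options per subscriber type, as stated in the description above.
REQUIRED_OPTIONS = {
    "email": ("email_address", "full_name"),
    "snmp": ("host", "port", "community"),
}


def subscriber_add_argv(name, subscriber_type, **options):
    """Build the argument list for `robin subscriber add`, or raise ValueError."""
    if subscriber_type not in REQUIRED_OPTIONS:
        raise ValueError(f"unknown subscriber type: {subscriber_type!r}")
    required = REQUIRED_OPTIONS[subscriber_type]
    missing = [key for key in required if options.get(key) is None]
    if missing:
        raise ValueError(f"missing mandatory options for {subscriber_type}: {missing}")
    argv = ["robin", "subscriber", "add", name, subscriber_type]
    for key in required:
        # email_address -> --email-address, full_name -> --full-name, and so on.
        argv += ["--" + key.replace("_", "-"), str(options[key])]
    return argv


argv = subscriber_add_argv(
    "demo_user", "email", email_address="demo@robin.io", full_name="Robin"
)
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run on a host where the robin client is installed and logged in.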


A user with the same name can be both an email and SNMP subscriber.

Listing all subscribers

To list all of the subscribers currently registered with Robin alongside details such as the subscription type and ID, issue the following command:

# robin subscriber list <id>
                        --name <name>
                        --type <type>


ID of subscriber to inspect (optional).

--name <name>

Name of subscriber to inspect

--type <type>

Filter subscribers to include only a particular type of subscriber


Display output in JSON format


# robin subscriber list
ID |  Name | Type  | Details
56 | demo  | email | Full name: demo user, Email Address: demo@robin.io
56 | demo  | snmp  | Host: cscale-82-45, Port: 162, Community: public

Returns all of the subscribers currently registered with Robin alongside details such as the subscription type and ID.

End Point: /api/v3/robin_server/subscribers/

Method: GET

URL Parameters:

  • name=<name_of_subscriber> : Return details only for the specified subscriber.

Data Parameters: None

Port: RCM Port (default value is 29442)


Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error)

Example Response:

Updating a subscriber

To update one or many attributes of a subscriber, run this command:

# robin subscriber update <id>
                          --email-address <email_address>
                          --full-name <full_name>
                          --host <host>
                          --port <port>
                          --community <community>


ID of subscriber to update

--email-address <email_address>

Updated value for the subscriber's email address

--full-name <full_name>

Updated value for the subscriber's full name

--host <host>

Updated value for the subscriber's SNMP host

--port <port>

Updated value for the subscriber's SNMP port

--community <community>

Updated value for the subscriber's SNMP community


# robin subscriber list
ID |  Name      | Type  | Details
1  | demo_user  | email | Full name: demo user, Email Address: demo@robin.io
2  | demo_two   | snmp  | Host: cscale-82-45, Port: 162, Community: public

# robin subscriber update 1 --email-address change_demo@robin.io
Successfully updated subscriber '1' with 'email' notification

# robin subscriber list
ID |  Name      | Type  | Details
1  | demo_user  | email | Full name: demo user, Email Address: change_demo@robin.io
2  | demo_two   | snmp  | Host: cscale-82-45, Port: 162, Community: public

Removing a subscriber

To remove a subscriber currently registered with Robin, run this command:

# robin subscriber remove <id>
                          --type <type>


ID of subscriber to remove

--type <type>

Type of the specified subscriber to remove


Confirm deletion without prompting


# robin subscriber list
ID |  Name | Type  | Details
56 | demo  | email | Full name: demo user, Email Address: demo@robin.io
56 | demo  | snmp  | Host: cscale-82-45, Port: 162, Community: public

# robin subscriber remove 56 --type snmp
Successfully removed 'snmp' notification for subscriber '56'

# robin subscriber list
ID |  Name | Type  | Details
56 | demo  | email | Full name: demo user, Email Address: demo@robin.io


If a subscriber is of both types and --type is not specified during the remove operation, both entries for this subscriber will be removed.

Removes a subscriber such that no notifications are sent to the specified user.

End Point: /api/v3/robin_server/subscribers/<subscriber_id>

Method: DELETE

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)


Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
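A client calling this endpoint typically needs to form the DELETE request and interpret the documented response codes. The Python sketch below shows one way to do both without sending a request. The https scheme, the host name, and the helper names build_subscriber_delete and describe_status are assumptions for illustration; the path, port, and status codes are those listed above.

```python
# Response codes documented for the subscriber DELETE endpoint.
DOCUMENTED_STATUS = {
    200: "Success",
    400: "Invalid API Usage Error",
    401: "Unauthorized Error",
    404: "Not Found Error",
    500: "Internal Server Error",
}


def build_subscriber_delete(host, subscriber_id, auth_token, *,
                            rcm_port=29442, scheme="https"):
    """Return (method, url, headers) for removing a subscriber."""
    url = f"{scheme}://{host}:{rcm_port}/api/v3/robin_server/subscribers/{subscriber_id}"
    return "DELETE", url, {"Authorization": auth_token}


def describe_status(code):
    """Map a response code to its documented meaning, if it has one."""
    return DOCUMENTED_STATUS.get(code, f"Undocumented status {code}")


# robin.example.com is a hypothetical host; 56 is a subscriber ID.
method, url, headers = build_subscriber_delete("robin.example.com", 56, "<auth_token>")
print(method, url)
print(describe_status(404))
```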

Example Response:

   "message":"Deleted the subscriber"

14.3.2. Managing subscriptions

Subscriptions indicate to Robin which events/alerts a user is interested in. Each subscription has an associated list of event types and subscribers. As a result, when an event that is part of a subscription occurs, all the subscribers that are linked with that subscription are notified.

The following commands are described in this section:

robin subscription add

Add a subscription

robin subscription list

List subscriptions

robin subscription update

Update a subscription

robin subscription remove

Remove a subscription

Registering a Robin subscription

To add a new subscription, run this command with the appropriate options:

# robin subscription add <subscriber_id> <subscription_type> <event_types>
                                                             --subscription-file <subscription_file>
                                                             --object-id <object_id>
                                                             --nodeid <node_id>
                                                             --zoneid <zone_id>
                                                             --threshold <threshold>
                                                             --elapsed-ticks <elapsed_ticks>
                                                             --throttle <throttle>


ID of subscriber to associate with the subscription


Type of subscription. Valid choices are alert, event, and file.


Event types to associate with the subscription. Note: This can be provided via the subscription file

--subscription-file <subscription_file>

JSON formatted file that contains lists of events and alerts to subscribe to

--object-id <object_id>

ID of objects to match

--nodeid <node_id>

ID of nodes to match

--zoneid <zone_id>

ID of zone to match


Disable this subscription when it is first added


Enable this subscription when it is first added

--threshold <threshold>

Number of instances of event/alert before launching notification

--elapsed-ticks <elapsed_ticks>

Number of seconds to allow for the threshold to be met

--throttle <throttle>

Number of seconds before a repeat notification will be sent
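The threshold, elapsed-ticks, and throttle options interact, so a small model can help when choosing values. The Python sketch below is one plausible reading of the three descriptions above and is not Robin's implementation: a notification is sent once at least threshold occurrences fall within elapsed_ticks seconds, and repeat notifications are suppressed for throttle seconds. Treating an elapsed_ticks value of 0 as "no time limit" is an assumption, as is the class name NotificationGate.

```python
from collections import deque


class NotificationGate:
    """Illustrative model of threshold / elapsed-ticks / throttle behaviour."""

    def __init__(self, threshold=1, elapsed_ticks=0, throttle=0):
        self.threshold = threshold
        self.elapsed_ticks = elapsed_ticks
        self.throttle = throttle
        self._times = deque()    # timestamps of occurrences still being counted
        self._last_sent = None   # timestamp of the most recent notification

    def observe(self, timestamp):
        """Record one occurrence; return True if a notification would be sent."""
        self._times.append(timestamp)
        if self.elapsed_ticks > 0:
            # Forget occurrences that fell out of the counting window.
            while timestamp - self._times[0] > self.elapsed_ticks:
                self._times.popleft()
        if len(self._times) < self.threshold:
            return False
        if self._last_sent is not None and timestamp - self._last_sent < self.throttle:
            return False  # a notification was sent too recently
        self._last_sent = timestamp
        self._times.clear()
        return True


gate = NotificationGate(threshold=3, elapsed_ticks=60, throttle=300)
results = [gate.observe(t) for t in (0, 10, 20, 30, 40, 50, 400, 410, 420)]
print(results)
```

In this model the third occurrence (at 20 seconds) triggers a notification, the burst ending at 50 seconds is throttled, and the burst ending at 420 seconds triggers a second notification.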


# robin subscription add 1 event 5006,4002 --enable
Notifications subscription completed successfully

Listing all subscriptions

To list all subscriptions currently registered with Robin for a particular subscriber alongside details such as the associated events, issue the following command:

# robin subscription list <id>


ID of subscription to inspect.


Display additional information about the subscriptions


Display output in JSON format


# robin subscription list 2
ID | Type         | Subscriber ID | Event Type         | Enabled | Threshold | Elapsed Ticks | Throttle
20 | SYSTEM_EVENT | 2             | EVENT_DISK_OFFLINE |    True |         1 | 0             | 0
21 | SYSTEM_EVENT | 2             | EVENT_CONT_STOPPED |    True |         1 | 0             | 0

Returns all subscriptions currently registered with Robin for a particular subscriber alongside details such as the associated events.

End Point: /api/v3/robin_server/subscribers/<subscriber_id>/subscriptions

Method: GET

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)


Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error)

Example Response:

Updating a subscription

To update one or many attributes of a subscription, run this command:

# robin subscription update <subscriber_id> <subscription_type> <subscription_id>
                                                                 --threshold <threshold>
                                                                 --elapsed-ticks <elapsed_ticks>
                                                                 --throttle <throttle>


ID of subscriber associated with the subscription to update


Type of subscription. Valid choices are alert and event


ID of subscription to update


Disable this subscription


Enable this subscription

--threshold <threshold>

Updated value of the number of instances of event/alert before launching notification

--elapsed-ticks <elapsed_ticks>

Updated value of the number of seconds to allow for the threshold to be met

--throttle <throttle>

Updated value of the number of seconds before a repeat notification will be sent


# robin subscription list 2
ID | Type         | Subscriber ID | Event Type         | Enabled | Threshold | Elapsed Ticks | Throttle
1  | SYSTEM_EVENT | 2             | EVENT_DISK_OFFLINE | True    | 1         | 0             | 86400
2  | SYSTEM_EVENT | 2             | EVENT_CONT_STOPPED | True    | 1         | 0             | 86400

# robin subscription update 2 event 1 --threshold 3 --disable
Successfully updated event subscription with id 1 for subscriber 2

# robin subscription list
ID | Type         | Subscriber ID | Event Type         | Enabled | Threshold | Elapsed Ticks | Throttle
1  | SYSTEM_EVENT | 2             | EVENT_DISK_OFFLINE | False   | 3         | 0             | 86400
2  | SYSTEM_EVENT | 2             | EVENT_CONT_STOPPED | True    | 1         | 0             | 86400

Removing a subscription

To remove a subscription currently registered with Robin, run this command:

# robin subscription remove <subscriber_id> <subscription_type> <subscription_id>


ID of subscriber associated with subscription to remove


Type of subscription. Valid choices are alert and event


ID of subscription to remove


Confirm deletion without prompting


# robin subscription list
ID | Type         | Subscriber ID | Event Type         | Enabled | Threshold | Elapsed Ticks | Throttle
1  | SYSTEM_EVENT | 2             | EVENT_DISK_OFFLINE | False   | 3         | 0             | 86400
2  | SYSTEM_EVENT | 2             | EVENT_CONT_STOPPED | True    | 1         | 0             | 86400

# robin subscription remove 2 event 1
Successfully deleted event subscription with id 1 for subscriber 2

# robin subscription list
ID | Type         | Subscriber ID | Event Type         | Enabled | Threshold | Elapsed Ticks | Throttle
2  | SYSTEM_EVENT | 2             | EVENT_CONT_STOPPED | True    | 1         | 0             | 86400

Removes a subscription that a subscriber currently holds.

End Point:


Method: DELETE

URL Parameters: None

Data Parameters: None

Port: RCM Port (default value is 29442)


Headers:

  • Authorization: <auth_token> : Authorization token to identify which user is sending the request. The token can be acquired from the login API.

Success Response Code: 200

Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)

Example Response: On success, the response is empty.

14.4. Exposing alerts and events for external log monitoring tools

The Robin Master Pod in Robin CNS includes the collect-event container. This container collects all alerts and events whose event types are listed by the robin event-type list command. External log monitoring tools can consume the logs of this container to access Robin alerts and events.

14.4.1. Checking alerts and events in the collect-event container

To check the Robin alerts and events collected by the collect-event container, run the following command:

# kubectl logs <master-pod-name> -n robinio -c collect-event


# kubectl logs robin-master-54f97c6b85-57k97 -n robinio -c collect-event
INFO:robin-events:Event: {'id': 40, 'zoneid': 1696946859, 'type_id': 3004, 'object_id': 'iomgr-server', 'nodeid': 2, 'level': 1, 'parent_ref': 0, 'tenant_id': 0, 'user_id': 0, 'timestamp': 1696947046.2893283, 'create_time': 'October 10, 2023 00:10:46', 'payload': {'description': "Health check failed for Service 'iomgr-server'", 'err_msg': 'Get "http://localhost:29456/api/v3/rio/alive": dial tcp connect: connection refused', 'event_server_orig': '', 'hostname': 'hypervvm-2-15', 'nodename': 'hypervvm-2-15', 'object_name': 'iomgr-server', 'zonename': 'default'}}
INFO:robin-events:Alert: {'id': 5, 'zoneid': 1696946859, 'nodeid': 2, 'object_id': 'iomgr-server', 'type_id': 3004, 'state': 2, 'start_level': 1, 'cur_level': 1, 'count': 1, 'tenant_id': 0, 'user_id': 0, 'start_time': '2023-10-10T00:10:46.028932829-07:00', 'cur_time': '2023-10-10T00:10:46.028932829-07:00', 'event_instances': [{'id': 40, 'zoneid': 1696946859, 'nodeid': 2, 'object_id': 'iomgr-server', 'type_id': 3004, 'level': 1, 'parent_ref': 0, 'tenant_id': 0, 'user_id': 0, 'timestamp': {'Int': 16969470462893283, 'Exp': -7, 'Status': 2, 'NaN': False, 'InfinityModifier': 0}, 'create_time': '2023-10-10T00:10:46.028932829-07:00', 'payload': {'description': "Health check failed for Service 'iomgr-server'", 'err_msg': 'Get "http://localhost:29456/api/v3/rio/alive": dial tcp connect: connection refused', 'event_server_orig': '', 'hostname': 'hypervvm-2-15', 'nodename': 'hypervvm-2-15', 'object_name': 'iomgr-server', 'zonename': 'default'}}]}
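A log consumer has to separate the record from the logger prefix on each line. The Python sketch below does this for lines shaped like the sample output above, assuming the prefixes INFO:robin-events:Event: and INFO:robin-events:Alert: and a payload printed as a Python dictionary literal, as in the sample. The helper name parse_collect_event_line is hypothetical, and the log format may differ between releases.

```python
import ast

# Prefixes as they appear in the sample output above.
PREFIXES = {
    "INFO:robin-events:Event: ": "event",
    "INFO:robin-events:Alert: ": "alert",
}


def parse_collect_event_line(line):
    """Return (kind, record) for an event or alert line, or None otherwise."""
    for prefix, kind in PREFIXES.items():
        if line.startswith(prefix):
            # The payload uses Python literal syntax (single quotes, False),
            # so ast.literal_eval is used here rather than a JSON parser.
            return kind, ast.literal_eval(line[len(prefix):])
    return None


# Shortened copy of the first sample line above.
sample = (
    "INFO:robin-events:Event: {'id': 40, 'type_id': 3004, "
    "'object_id': 'iomgr-server', 'level': 1, "
    "'payload': {'hostname': 'hypervvm-2-15', 'zonename': 'default'}}"
)

kind, record = parse_collect_event_line(sample)
print(kind, record["type_id"], record["payload"]["hostname"])
```

ast.literal_eval only evaluates literals, so it does not execute code found in a log line, but it raises ValueError or SyntaxError on a malformed payload; a production consumer would need to handle those cases.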