15. Alerts and Events
Robin Platform has a built-in alerting and event notification mechanism that enables a user to stay up to date with any meaningful events within the cluster. This is especially useful because it informs the user not only of system configuration changes but also of occurrences that might impact applications or services on nodes in the cluster.
15.1. Events
Robin events are objects that provide a glimpse into operations occurring within the Robin cluster, covering both the underlying infrastructure and the applications running on its hosts. These insights are generated regardless of the origin of the operation (manual intervention, Robin autopilot, etc.) and span a variety of topics ranging from container relocation to temperature detection on hosts. Robin events are predetermined (identified by their type or unique ID) and have three levels: INFO, WARN, and ERROR. Each level indicates the severity of the event and so allows a user to distinguish which events they should be concerned about. In addition, Robin events in the majority of cases come in pairs: one is raised when a negative scenario occurs, and another is raised when that scenario is fixed (either by manual intervention or Robin auto-healing). The latter is known as a resolving event and is usually of the level INFO.
The following commands are described in this section:
| robin event list | List all events |
| robin event-type list | List all event types |
15.1.1. Listing all events
Robin stores all events that have occurred during a cluster’s lifespan. To view these events, issue the following command:
# robin event list <id>
--page-size <page_size>
--page <page>
--hostname <hostname>
--type <type>
--type-id <type_id>
--level <level>
--object <object>
--ascending
--total
--json
| <id> | ID of event to inspect. Note: This is an optional parameter. |
| --page-size <page_size> | Maximum number of event records to include in a single output page |
| --page <page> | Starting page number (relative to total number of pages of PAGE_SIZE) |
| --hostname <hostname> | Filter events to include only those originating from a particular host |
| --type <type> | Filter events to include only those of a particular type |
| --type-id <type_id> | Filter events to include only those with a particular type-id |
| --level <level> | Filter events to include only those of a specific LEVEL |
| --object <object> | Filter events to include only those which affect a particular object |
| --ascending | Return events in ascending order of ID |
| --total | Return the total number of events |
| --json | Display output in JSON format |
Example:
# robin event list
ID | TIME | EVENT_TYPE | LEVEL | OBJECT | HOST | DESCRIPTION
---+----------------------+------------------------+-------+------------------------------------------------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------
61 | 11 Aug 2020 00:01:00 | EVENT_KUBELET_CERT_OK | INFO | kubeletcert | | Kubelet certificate on UNKNOWN:UNKNOWN is not expiring soon.
60 | 10 Aug 2020 23:35:23 | EVENT_APP_CREATED | INFO | new-app-10 (8) | | Application new-app-10 was created
59 | 10 Aug 2020 23:35:23 | EVENT_POD_SWAP_LOWMARK | INFO | new-app-10-server-01.t001-u000003.svc.cluster.local (new-app-10-server-01) | cscale-82-140 | POD new-app-10-server-01.t001-u000003.svc.cluster.local has swap space usage is in normal range
58 | 10 Aug 2020 23:30:52 | EVENT_APP_CREATED | INFO | new-app-2 (7) | | Application new-app-2 was created
57 | 10 Aug 2020 23:30:52 | EVENT_POD_SWAP_LOWMARK | INFO | new-app-2-server-01.t001-u000003.svc.cluster.local (new-app-2-server-01) | cscale-82-140 | POD new-app-2-server-01.t001-u000003.svc.cluster.local has swap space usage is in normal range
56 | 10 Aug 2020 23:22:35 | EVENT_PROC_HEALTHY | INFO | robin-server | cscale-82-140 | Health check passed for service robin-server on node default:cscale-82-140.robinsystems.com
55 | 10 Aug 2020 23:22:00 | EVENT_POD_SWAP_LOWMARK | INFO | new-app-server-01.t001-u000003.svc.cluster.local (new-app-server-01) | cscale-82-140 | POD new-app-server-01.t001-u000003.svc.cluster.local has swap space usage is in normal range
54 | 10 Aug 2020 23:21:59 | EVENT_APP_CREATED | INFO | new-app (6) | | Application new-app was created
53 | 10 Aug 2020 23:21:59 | EVENT_PROC_UNHEALTHY | WARN | robin-server | cscale-82-140 | Health check failed for service robin-server on node default:cscale-82-140.robinsystems.com
52 | 10 Aug 2020 23:12:00 | EVENT_APP_DELETED | INFO | midhaul-app (4) | | Application midhaul-app was deleted
51 | 10 Aug 2020 23:12:00 | EVENT_VOLUME_DELETED | INFO | midhaul-app.server.01.block.1.b7c9f1fd-d980-4a6c-b793-0a6354e556d7 (9) | | volume midhaul-app.server.01.block.1.b7c9f1fd-d980-4a6c-b793-0a6354e556d7 for application midhaul-app was deleted
50 | 10 Aug 2020 23:12:00 | EVENT_VOLUME_DELETED | INFO | midhaul-app.server.01.data.1.f33b7cfb-11de-498a-93f2-85f74a8e3b21 (8) | | volume midhaul-app.server.01.data.1.f33b7cfb-11de-498a-93f2-85f74a8e3b21 for application midhaul-app was deleted
49 | 10 Aug 2020 23:11:58 | EVENT_POD_DELETED | INFO | midhaul-app-server-01.t001-u000003.svc.cluster.local (midhaul-app-server-01) | cscale-82-140 | POD midhaul-app-server-01.t001-u000003.svc.cluster.local on node default:cscale-82-140 was deleted
48 | 10 Aug 2020 23:11:57 | EVENT_APP_DELETED | INFO | test-app-2 (3) | | Application test-app-2 was deleted
47 | 10 Aug 2020 23:11:56 | EVENT_VOLUME_DELETED | INFO | test-app-2.server.01.block.1.fee2c5dc-6704-42d7-956b-5d07119b5a87 (7) | | volume test-app-2.server.01.block.1.fee2c5dc-6704-42d7-956b-5d07119b5a87 for application test-app-2 was deleted
46 | 10 Aug 2020 23:11:56 | EVENT_VOLUME_DELETED | INFO | test-app-2.server.01.data.1.b9bb1991-b367-45a2-84c3-ed803687bfd0 (6) | | volume test-app-2.server.01.data.1.b9bb1991-b367-45a2-84c3-ed803687bfd0 for application test-app-2 was deleted
45 | 10 Aug 2020 23:11:55 | EVENT_POD_DELETED | INFO | test-app-2-server-01.t001-u000003.svc.cluster.local (test-app-2-server-01) | cscale-82-140 | POD test-app-2-server-01.t001-u000003.svc.cluster.local on node default:cscale-82-140 was deleted
44 | 10 Aug 2020 23:11:43 | EVENT_APP_DELETED | INFO | ron-app (5) | | Application ron-app was deleted
43 | 10 Aug 2020 23:11:43 | EVENT_VOLUME_DELETED | INFO | ron-app.server.01.block.1.bb2c9bee-6d99-4ee1-9f09-4a3259a09726 (10) | | volume ron-app.server.01.block.1.bb2c9bee-6d99-4ee1-9f09-4a3259a09726 for application ron-app was deleted
42 | 10 Aug 2020 23:11:43 | EVENT_VOLUME_DELETED | INFO | ron-app.server.01.data.1.5c8e51a9-db49-461d-bb5e-af9ae3287be5 (11) | | volume ron-app.server.01.data.1.5c8e51a9-db49-461d-bb5e-af9ae3287be5 for application ron-app was deleted
Returns all events that have occurred during a cluster’s lifespan.
End Point: /api/v3/robin_server/events/
Method: GET
URL Parameters:
sort=[id|-id]
: Utilizing this parameter results in the list of events returned being sorted by their id.
total=[true|false]
: Utilizing this parameter results in the total number of events being returned.
physical_node=<physical_nodename>
: Utilizing this parameter results in only events that occurred on the specified host being returned.
type=<event_type>
: Utilizing this parameter results in only events that match the specified type being returned.
type_id=<event_type_id>
: Utilizing this parameter results in only events that match the specified type ID being returned.
level=[INFO|WARN|ERROR]
: Utilizing this parameter results in only events of the specified level being returned.
object_id=<object_id>
: Utilizing this parameter results in only events that are associated with the specified object ID being returned.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
X-Event-Port: <event_server_port>
: Port on which the Event Server is listening; by default this is 29449. Note that the value of this field should be a string.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
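The request described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the host name `robin.example.com` and the helper `build_events_request` are hypothetical, and the `https` scheme is an assumption based on the URLs that appear in the sample event payloads. The sketch only builds the request; it does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

RCM_PORT = 29442           # default RCM port stated above
EVENT_SERVER_PORT = 29449  # default Event Server port stated above


def build_events_request(host, auth_token, **filters):
    """Build (but do not send) a GET request for the events endpoint.

    `filters` may contain any of the URL parameters listed above,
    for example level="WARN" or sort="-id".
    """
    # Skip filters that were not supplied so "None" is never sent.
    params = {key: value for key, value in filters.items() if value is not None}
    # Assumption: the RCM port is served over HTTPS.
    url = f"https://{host}:{RCM_PORT}/api/v3/robin_server/events/"
    if params:
        url += "?" + urlencode(params)
    headers = {
        "Authorization": auth_token,
        # The header value must be a string, not an integer.
        "X-Event-Port": str(EVENT_SERVER_PORT),
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical host and placeholder token, for illustration only.
req = build_events_request("robin.example.com", "<auth_token>", level="WARN", sort="-id")
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would issue the call; certificate handling and token acquisition depend on the deployment and are left out here.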
Example Response:
{
"object_type":"SystemEvent",
"start":0,
"items":[
{
"id":61,
"zoneid":1597147518,
"type_id":6,
"object_id":"kubeletcert",
"nodeid":1,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597154460.8367121,
"create_time":"August 11, 2020 07:01:00",
"payload":{
"description":"Kubelet certificate on node 1597147518:1 is not expiring in next 30 days."
}
},
{
"id":60,
"zoneid":1597147518,
"type_id":10001,
"object_id":"8",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597152923.884905,
"create_time":"August 11, 2020 06:35:23",
"payload":{
"appname":"new-app-10",
"object_name":"new-app-10"
}
},
{
"id":59,
"zoneid":1597147518,
"type_id":16015,
"object_id":"new-app-10-server-01.t001-u000003.svc.cluster.local",
"nodeid":1,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597152923.551775,
"create_time":"August 11, 2020 06:35:23",
"payload":{
"nodename":"cscale-82-140.robinsystems.com",
"object_name":"new-app-10-server-01.t001-u000003.svc.cluster.local"
}
},
{
"id":58,
"zoneid":1597147518,
"type_id":10001,
"object_id":"7",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597152652.7069433,
"create_time":"August 11, 2020 06:30:52",
"payload":{
"appname":"new-app-2",
"object_name":"new-app-2"
}
},
{
"id":57,
"zoneid":1597147518,
"type_id":16015,
"object_id":"new-app-2-server-01.t001-u000003.svc.cluster.local",
"nodeid":1,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597152652.5186956,
"create_time":"August 11, 2020 06:30:52",
"payload":{
"nodename":"cscale-82-140.robinsystems.com",
"object_name":"new-app-2-server-01.t001-u000003.svc.cluster.local"
}
},
{
"id":56,
"zoneid":1597147518,
"type_id":3005,
"object_id":"robin-server",
"nodeid":1,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597152155.1629999,
"create_time":"August 11, 2020 06:22:35",
"payload":{
"description":"Health check passed for Service 'robin-server'",
"err_msg":"/usr/lib/python3.4/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings\n InsecureRequestWarning)\nNone service-check POLLING https://[10.9.82.140]:29442/api/v3/robin_server\nNone service-check READY https://[10.9.82.140]:29442/api/v3/robin_server\n",
"event_server_orig":"10.9.82.140",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"object_name":"robin-server",
"zonename":"default"
}
},
{
"id":55,
"zoneid":1597147518,
"type_id":16015,
"object_id":"new-app-server-01.t001-u000003.svc.cluster.local",
"nodeid":1,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597152120.7556264,
"create_time":"August 11, 2020 06:22:00",
"payload":{
"nodename":"cscale-82-140.robinsystems.com",
"object_name":"new-app-server-01.t001-u000003.svc.cluster.local"
}
},
{
"id":54,
"zoneid":1597147518,
"type_id":10001,
"object_id":"6",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597152119.8896759,
"create_time":"August 11, 2020 06:21:59",
"payload":{
"appname":"new-app",
"object_name":"new-app"
}
},
{
"id":53,
"zoneid":1597147518,
"type_id":3004,
"object_id":"robin-server",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597152119.1847906,
"create_time":"August 11, 2020 06:21:59",
"payload":{
"description":"Health check failed for Service 'robin-server'",
"err_msg":"None service-check POLLING https://[10.9.82.140]:29442/api/v3/robin_server\nNone service-check attempt (1/2) WARNING: HTTPSConnectionPool(host='10.9.82.140', port=29442): Max retries exceeded with url: /api/v3/robin_server (Caused by NewConnectionError('\u003curllib3.connection.VerifiedHTTPSConnection object at 0x7f48f9100cc0\u003e: Failed to establish a new connection: [Errno 111] Connection refused',))\nNone service-check FAILED https://[10.9.82.140]:29442/api/v3/robin_server, maximum of 2 attempts reached\n",
"event_server_orig":"10.9.82.140",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"object_name":"robin-server",
"zonename":"default"
}
},
{
"id":52,
"zoneid":1597147518,
"type_id":10002,
"object_id":"4",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151520.3362408,
"create_time":"August 11, 2020 06:12:00",
"payload":{
"appname":"midhaul-app",
"object_name":"midhaul-app"
}
},
{
"id":51,
"zoneid":1597147518,
"type_id":11004,
"object_id":"9",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151520.3282547,
"create_time":"August 11, 2020 06:12:00",
"payload":{
"appname":"midhaul-app",
"object_name":"midhaul-app.server.01.block.1.b7c9f1fd-d980-4a6c-b793-0a6354e556d7"
}
},
{
"id":50,
"zoneid":1597147518,
"type_id":11004,
"object_id":"8",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151520.3190093,
"create_time":"August 11, 2020 06:12:00",
"payload":{
"appname":"midhaul-app",
"object_name":"midhaul-app.server.01.data.1.f33b7cfb-11de-498a-93f2-85f74a8e3b21"
}
},
{
"id":49,
"zoneid":1597147518,
"type_id":16004,
"object_id":"midhaul-app-server-01.t001-u000003.svc.cluster.local",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597151518.9782567,
"create_time":"August 11, 2020 06:11:58",
"payload":{
"nodename":"cscale-82-140",
"object_name":"midhaul-app-server-01.t001-u000003.svc.cluster.local",
"zonename":"default"
}
},
{
"id":48,
"zoneid":1597147518,
"type_id":10002,
"object_id":"3",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151517.1828654,
"create_time":"August 11, 2020 06:11:57",
"payload":{
"appname":"test-app-2",
"object_name":"test-app-2"
}
},
{
"id":47,
"zoneid":1597147518,
"type_id":11004,
"object_id":"7",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151516.993125,
"create_time":"August 11, 2020 06:11:56",
"payload":{
"appname":"test-app-2",
"object_name":"test-app-2.server.01.block.1.fee2c5dc-6704-42d7-956b-5d07119b5a87"
}
},
{
"id":46,
"zoneid":1597147518,
"type_id":11004,
"object_id":"6",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151516.9838266,
"create_time":"August 11, 2020 06:11:56",
"payload":{
"appname":"test-app-2",
"object_name":"test-app-2.server.01.data.1.b9bb1991-b367-45a2-84c3-ed803687bfd0"
}
},
{
"id":45,
"zoneid":1597147518,
"type_id":16004,
"object_id":"test-app-2-server-01.t001-u000003.svc.cluster.local",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597151515.7935147,
"create_time":"August 11, 2020 06:11:55",
"payload":{
"nodename":"cscale-82-140",
"object_name":"test-app-2-server-01.t001-u000003.svc.cluster.local",
"zonename":"default"
}
},
{
"id":44,
"zoneid":1597147518,
"type_id":10002,
"object_id":"5",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151503.6052084,
"create_time":"August 11, 2020 06:11:43",
"payload":{
"appname":"ron-app",
"object_name":"ron-app"
}
},
{
"id":43,
"zoneid":1597147518,
"type_id":11004,
"object_id":"10",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151503.5967808,
"create_time":"August 11, 2020 06:11:43",
"payload":{
"appname":"ron-app",
"object_name":"ron-app.server.01.block.1.bb2c9bee-6d99-4ee1-9f09-4a3259a09726"
}
},
{
"id":42,
"zoneid":1597147518,
"type_id":11004,
"object_id":"11",
"nodeid":0,
"level":0,
"parent_ref":0,
"tenant_id":1,
"user_id":3,
"timestamp":1597151503.5880036,
"create_time":"August 11, 2020 06:11:43",
"payload":{
"appname":"ron-app",
"object_name":"ron-app.server.01.data.1.5c8e51a9-db49-461d-bb5e-af9ae3287be5"
}
}
],
"total":61,
"count":20,
"page_num":1,
"page_size":20
}
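A response of this shape can be summarized with a few lines of Python. The sketch below is illustrative only. The numeric level mapping is inferred from the examples in this section (the CLI shows INFO where the JSON has level 0 and WARN where it has level 1); mapping 2 to ERROR is an assumption. The helper name `summarize_events` is hypothetical.

```python
import json
import math
from collections import Counter

# Inferred from the examples above: 0 is shown as INFO and 1 as WARN.
# Treating 2 as ERROR is an assumption.
LEVEL_NAMES = {0: "INFO", 1: "WARN", 2: "ERROR"}


def summarize_events(response):
    """Count events per level and compute how many pages the full list spans."""
    counts = Counter(
        LEVEL_NAMES.get(item["level"], "UNKNOWN") for item in response["items"]
    )
    pages = math.ceil(response["total"] / response["page_size"])
    return dict(counts), pages


# Trimmed-down response with the same field names as the example above.
sample = json.loads("""
{"items": [{"id": 54, "level": 0}, {"id": 53, "level": 1}],
 "total": 61, "count": 2, "page_num": 1, "page_size": 20}
""")
print(summarize_events(sample))
```

With 61 events in total and a page size of 20, the full list spans four pages.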
15.1.2. Listing event types
Robin has a set of predetermined events that are raised whenever the appropriate condition is met. They are identified by their type and their unique ID. The type indicates what the event refers to. In addition, each event type has a status that indicates whether the event is currently being tracked. To view all event types, run this command:
# robin event-type list <event_type>
--status <status>
--all
--json
| <event_type> | ID of event type to inspect. Note: This is optional. |
| --status <status> | Filter event types to include only those that match a particular status |
| --all | Display all event types regardless of status |
| --json | Display output in JSON format |
Example:
# robin event-type list
ID | NAME | LEVEL | RESOLVES | STATUS
------+----------------------------------+-------+--------------------------------+--------
2 | EVENT_RESOLVER | INFO | | ACTIVE
1005 | EVENT_NODE_UNREACHABLE | WARN | | ACTIVE
1006 | EVENT_NODE_REACHABLE | INFO | ['EVENT_NODE_UNREACHABLE'] | ACTIVE
1007 | EVENT_NODE_DOWN | WARN | | ACTIVE
1008 | EVENT_NODE_UP | INFO | ['EVENT_NODE_DOWN'] | ACTIVE
1011 | EVENT_NODE_MEM_HIGHMARK | WARN | | ACTIVE
1012 | EVENT_NODE_MEM_LOWMARK | INFO | ['EVENT_NODE_MEM_HIGHMARK'] | ACTIVE
1015 | EVENT_NODE_ROOTFS_HIGHMARK | WARN | | ACTIVE
1016 | EVENT_NODE_ROOTFS_LOWMARK | INFO | ['EVENT_NODE_ROOTFS_HIGHMARK'] | ACTIVE
1023 | EVENT_NODE_SWAP_HIGHMARK | WARN | | ACTIVE
1024 | EVENT_NODE_SWAP_LOWMARK | INFO | ['EVENT_NODE_SWAP_HIGHMARK'] | ACTIVE
1027 | EVENT_NODE_TEMP_HIGHMARK | WARN | | ACTIVE
1028 | EVENT_NODE_TEMP_LOWMARK | INFO | ['EVENT_NODE_TEMP_HIGHMARK'] | ACTIVE
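The RESOLVES column can be inverted to answer the question "which event types clear this warning?". The following Python sketch assumes the rows have already been loaded into dictionaries with `name` and `resolves` keys (the same field names used in the JSON form of this listing). The helper `resolvers_by_event` is hypothetical.

```python
from collections import defaultdict

# Sample entries mirroring a few of the rows shown above.
event_types = [
    {"name": "EVENT_NODE_DOWN", "resolves": []},
    {"name": "EVENT_NODE_UP", "resolves": ["EVENT_NODE_DOWN"]},
    {"name": "EVENT_NODE_MEM_HIGHMARK", "resolves": []},
    {"name": "EVENT_NODE_MEM_LOWMARK", "resolves": ["EVENT_NODE_MEM_HIGHMARK"]},
]


def resolvers_by_event(types):
    """Map each event type name to the names of the event types that resolve it."""
    resolved_by = defaultdict(list)
    for event_type in types:
        for target in event_type["resolves"]:
            resolved_by[target].append(event_type["name"])
    return dict(resolved_by)


for name, resolvers in resolvers_by_event(event_types).items():
    print(f"{name} is resolved by {', '.join(resolvers)}")
```

Event types that never appear as a key in the result have no resolving counterpart in the input.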
Returns the list of predetermined events that are raised whenever the appropriate condition is met.
End Point: /api/v3/robin_server/events/
Method: GET
URL Parameters:
name=<event_type_name>
: Utilizing this parameter results in only events that match the specified name being returned.
status=[0|1|2|3]
: Utilizing this parameter results in only events that match the specified status being returned. In this case 0 maps to ALL, 1 maps to ACTIVE, 2 maps to INACTIVE, and 3 maps to OBSOLETE.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
X-Event-Port: <event_server_port>
: Port on which the Event Server is listening; by default this is 29449. Note that the value of this field should be a string.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
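Because the `status` parameter takes a numeric code rather than a name, a small lookup table helps avoid mistakes when building the query string. The Python sketch below is illustrative: the helper `event_type_query` is hypothetical, and it only produces the query string, which would then be appended to the endpoint given above.

```python
from urllib.parse import urlencode

# Status codes as documented above.
STATUS_CODES = {"ALL": 0, "ACTIVE": 1, "INACTIVE": 2, "OBSOLETE": 3}


def event_type_query(name=None, status=None):
    """Build the query string for the event type listing.

    `status` is given by name (for example "ACTIVE") and translated
    to the numeric code the API expects.
    """
    params = {}
    if name is not None:
        params["name"] = name
    if status is not None:
        # Raises KeyError for a status name that is not documented.
        params["status"] = STATUS_CODES[status.upper()]
    return urlencode(params)


print(event_type_query(name="EVENT_NODE_DOWN", status="active"))
```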
Example Response:
{
"object_type":"EventType",
"start":1,
"items":[
{
"id":11001,
"name":"EVENT_VOLUME_FAULTED",
"level":2,
"msg":"volume \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is faulted",
"status":1,
"resolves":[
]
},
{
"id":11002,
"name":"EVENT_VOLUME_DEGRADED",
"level":1,
"msg":"volume \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is degraded",
"status":1,
"resolves":[
"EVENT_VOLUME_FAULTED"
]
},
{
"id":11003,
"name":"EVENT_VOLUME_OK",
"level":0,
"msg":"volume \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is healthy",
"status":1,
"resolves":[
"EVENT_VOLUME_FAULTED"
]
},
{
"id":11004,
"name":"EVENT_VOLUME_DELETED",
"level":0,
"msg":"volume \u003cobject_name\u003e for application \u003cappname\u003e was deleted",
"status":1,
"resolves":[
"EVENT_VOLUME_HIGHMARK",
"EVENT_VOLUME_FAULTED"
]
},
{
"id":11005,
"name":"EVENT_VOLUME_HIGHMARK",
"level":1,
"msg":"Mount \u003cobject_name\u003e for vnode \u003cvnodename\u003e has reached highmark",
"status":1,
"resolves":[
]
},
{
"id":11006,
"name":"EVENT_VOLUME_LOWMARK",
"level":0,
"msg":"volume \u003cobject_name\u003e for vnode \u003cvnodename\u003e usage is in normal range",
"status":1,
"resolves":[
"EVENT_VOLUME_HIGHMARK"
]
},
{
"id":3004,
"name":"EVENT_PROC_UNHEALTHY",
"level":1,
"msg":"Health check failed for service {object_name} on node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
]
},
{
"id":3005,
"name":"EVENT_PROC_HEALTHY",
"level":0,
"msg":"Health check passed for service {object_name} on node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_PROC_UNHEALTHY"
]
},
{
"id":3008,
"name":"EVENT_PROC_MEM_HIGHMARK",
"level":1,
"msg":"process high memory high-watermark",
"status":1,
"resolves":[
]
},
{
"id":3009,
"name":"EVENT_PROC_MEM_LOWMARK",
"level":0,
"msg":"process memory that was previously at high-watermark dropped into safe zone",
"status":1,
"resolves":[
"EVENT_PROC_MEM_HIGHMARK"
]
},
{
"id":3012,
"name":"EVENT_PROC_CPU_HIGHMARK",
"level":1,
"msg":"process has hit CPU high-watermark",
"status":1,
"resolves":[
]
},
{
"id":3013,
"name":"EVENT_PROC_CPU_LOWMARK",
"level":0,
"msg":"process that has previously at high-watermark dropped into safe zone",
"status":1,
"resolves":[
"EVENT_PROC_CPU_HIGHMARK"
]
},
{
"id":10001,
"name":"EVENT_APP_CREATED",
"level":0,
"msg":"Application \u003cappname\u003e was created",
"status":1,
"resolves":[
]
},
{
"id":10002,
"name":"EVENT_APP_DELETED",
"level":0,
"msg":"Application {appname} was deleted",
"status":1,
"resolves":[
"EVENT_APP_ADMIN_WAIT"
]
},
{
"id":10003,
"name":"EVENT_APP_STARTED",
"level":0,
"msg":"Application \u003cappname\u003e was started",
"status":1,
"resolves":[
]
},
{
"id":10004,
"name":"EVENT_APP_STOPPED",
"level":0,
"msg":"Application \u003cappname\u003e was stopped",
"status":1,
"resolves":[
]
},
{
"id":10005,
"name":"EVENT_APP_FROZEN",
"level":0,
"msg":"Application \u003cappname\u003e was frozen",
"status":1,
"resolves":[
]
},
{
"id":10006,
"name":"EVENT_APP_THAWED",
"level":0,
"msg":"Application \u003cappname\u003e was thawed",
"status":1,
"resolves":[
]
},
{
"id":10008,
"name":"EVENT_APP_ADMIN_WAIT",
"level":1,
"msg":"Application \u003cappname\u003e is waiting for admin's attention",
"status":1,
"resolves":[
]
},
{
"id":10010,
"name":"EVENT_APP_SNAPSHOTTED",
"level":0,
"msg":"Application \u003cappname\u003e was snapshotted",
"status":1,
"resolves":[
]
},
{
"id":10011,
"name":"EVENT_APP_ROLLEDBACK",
"level":0,
"msg":"Application \u003cappname\u003e was rolled back",
"status":1,
"resolves":[
]
},
{
"id":10012,
"name":"EVENT_APP_CLONED",
"level":0,
"msg":"Application \u003cappname\u003e was cloned",
"status":1,
"resolves":[
]
},
{
"id":10013,
"name":"EVENT_APP_SCALED",
"level":0,
"msg":"Application \u003cappname\u003e was scaled",
"status":1,
"resolves":[
]
},
{
"id":10014,
"name":"EVENT_APP_EVACUATED",
"level":0,
"msg":"Application \u003cappname\u003e was evacuated",
"status":1,
"resolves":[
]
},
{
"id":10015,
"name":"EVENT_APP_DEPLOYED",
"level":0,
"msg":"Application \u003cappname\u003e was deployed",
"status":1,
"resolves":[
]
},
{
"id":10016,
"name":"EVENT_APP_PROBED",
"level":0,
"msg":"Application \u003cappname\u003e was probed",
"status":1,
"resolves":[
"EVENT_APP_ADMIN_WAIT",
"EVENT_APP_FAULTED"
]
},
{
"id":10017,
"name":"EVENT_APP_UPGRADED",
"level":0,
"msg":"Application \u003cappname\u003e was upgraded",
"status":1,
"resolves":[
]
},
{
"id":10018,
"name":"EVENT_APP_BACKED_UP",
"level":0,
"msg":"Application \u003cappname\u003e was backed up",
"status":1,
"resolves":[
"EVENT_APP_BACKUP_FAILED"
]
},
{
"id":10019,
"name":"EVENT_APP_BACKUP_FAILED",
"level":1,
"msg":"Application \u003cappname\u003e failed to be backed up.",
"status":1,
"resolves":[
]
},
{
"id":10020,
"name":"EVENT_APP_RESTORED",
"level":0,
"msg":"Application \u003cappname\u003e was restored",
"status":1,
"resolves":[
]
},
{
"id":14001,
"name":"EVENT_LICENSE_VIOLATION",
"level":1,
"msg":"Cluster is in VIOLATION of license limits, please see license info for more details.",
"status":1,
"resolves":[
]
},
{
"id":14002,
"name":"EVENT_LICENSE_OK",
"level":0,
"msg":"Cluster license is healthy.",
"status":1,
"resolves":[
"EVENT_LICENSE_VIOLATION"
]
},
{
"id":16001,
"name":"EVENT_POD_STARTED",
"level":0,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was started",
"status":1,
"resolves":[
"EVENT_POD_ERROR",
"EVENT_POD_CRASHED",
"EVENT_POD_PLAN_FAILED",
"EVENT_POD_RELOCATE_FAILED",
"EVENT_POD_DEPLOY_FAILED",
"EVENT_POD_FAULTED"
]
},
{
"id":16002,
"name":"EVENT_POD_STOPPED",
"level":0,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was stopped",
"status":1,
"resolves":[
"EVENT_POD_STOP_FAILED"
]
},
{
"id":16003,
"name":"EVENT_POD_RESTARTED",
"level":0,
"msg":"POD \u003cobject_name\u003e was restarted on {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_POD_ERROR",
"EVENT_POD_CRASHED",
"EVENT_POD_RELOCATE_FAILED",
"EVENT_POD_DEPLOY_FAILED",
"EVENT_POD_FAULTED",
"EVENT_POD_STOP_FAILED"
]
},
{
"id":16004,
"name":"EVENT_POD_DELETED",
"level":0,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was deleted",
"status":1,
"resolves":[
"EVENT_POD_ERROR",
"EVENT_POD_SWAP_HIGHMARK",
"EVENT_POD_CRASHED",
"EVENT_POD_PLAN_FAILED",
"EVENT_POD_RELOCATE_FAILED",
"EVENT_POD_DEPLOY_FAILED",
"EVENT_POD_FAULTED"
]
},
{
"id":16005,
"name":"EVENT_POD_CRASHED",
"level":2,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e crashed",
"status":1,
"resolves":[
]
},
{
"id":16006,
"name":"EVENT_POD_FAULTED",
"level":2,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e repeatedly FAULTED",
"status":1,
"resolves":[
]
},
{
"id":16007,
"name":"EVENT_POD_PLAN_FAILED",
"level":0,
"msg":"Deployment plan generation for POD \u003cobject_name\u003e failed",
"status":1,
"resolves":[
]
},
{
"id":16008,
"name":"EVENT_POD_RELOCATED",
"level":0,
"msg":"POD \u003cobject_name\u003e was relocated to {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_POD_ERROR",
"EVENT_POD_CRASHED",
"EVENT_POD_PLAN_FAILED",
"EVENT_POD_RELOCATE_FAILED",
"EVENT_POD_DEPLOY_FAILED",
"EVENT_POD_FAULTED",
"EVENT_POD_STOP_FAILED"
]
},
{
"id":16009,
"name":"EVENT_POD_RELOCATE_FAILED",
"level":2,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e could not be relocated",
"status":1,
"resolves":[
]
},
{
"id":16010,
"name":"EVENT_POD_DEPLOY_FAILED",
"level":2,
"msg":"POD \u003cobject_name\u003e could not be deployed on node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
]
},
{
"id":16011,
"name":"EVENT_POD_ERROR",
"level":2,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e experienced an error",
"status":1,
"resolves":[
]
},
{
"id":16012,
"name":"EVENT_POD_RESOLVED",
"level":0,
"msg":"POD \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e error has been resolved",
"status":1,
"resolves":[
"EVENT_POD_ERROR"
]
},
{
"id":16013,
"name":"EVENT_POD_STOP_FAILED",
"level":2,
"msg":"POD \u003cobject_name\u003e could not be stopped on node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
]
},
{
"id":16014,
"name":"EVENT_POD_SWAP_HIGHMARK",
"level":1,
"msg":"POD \u003cobject_name\u003e has reached swap space high watermark",
"status":1,
"resolves":[
]
},
{
"id":16015,
"name":"EVENT_POD_SWAP_LOWMARK",
"level":0,
"msg":"POD \u003cobject_name\u003e has swap space usage is in normal range",
"status":1,
"resolves":[
"EVENT_POD_SWAP_HIGHMARK"
]
},
{
"id":1005,
"name":"EVENT_NODE_UNREACHABLE",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e is unreachable",
"status":1,
"resolves":[
]
},
{
"id":1006,
"name":"EVENT_NODE_REACHABLE",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e is now reachable",
"status":1,
"resolves":[
"EVENT_NODE_UNREACHABLE"
]
},
{
"id":1007,
"name":"EVENT_NODE_DOWN",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has been marked as down",
"status":1,
"resolves":[
]
},
{
"id":1008,
"name":"EVENT_NODE_UP",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e is up after being marked as down",
"status":1,
"resolves":[
"EVENT_NODE_DOWN"
]
},
{
"id":1011,
"name":"EVENT_NODE_MEM_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has reached memory high-watermark",
"status":1,
"resolves":[
]
},
{
"id":1012,
"name":"EVENT_NODE_MEM_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e has dropped below memory high-watermark to safe zone",
"status":1,
"resolves":[
"EVENT_NODE_MEM_HIGHMARK"
]
},
{
"id":1015,
"name":"EVENT_NODE_ROOTFS_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e root filesystem usage has hit high watermark",
"status":1,
"resolves":[
]
},
{
"id":1016,
"name":"EVENT_NODE_ROOTFS_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e root filesystem usage is in safe zone",
"status":1,
"resolves":[
"EVENT_NODE_ROOTFS_HIGHMARK"
]
},
{
"id":1023,
"name":"EVENT_NODE_SWAP_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e swap space has reached highmark",
"status":1,
"resolves":[
]
},
{
"id":1024,
"name":"EVENT_NODE_SWAP_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e swap space usage is in normal range",
"status":1,
"resolves":[
"EVENT_NODE_SWAP_HIGHMARK"
]
},
{
"id":1027,
"name":"EVENT_NODE_TEMP_HIGHMARK",
"level":1,
"msg":"",
"status":1,
"resolves":[
]
},
{
"id":1028,
"name":"EVENT_NODE_TEMP_LOWMARK",
"level":0,
"msg":"",
"status":1,
"resolves":[
"EVENT_NODE_TEMP_HIGHMARK"
]
},
{
"id":1029,
"name":"EVENT_NODE_NET_HIGHMARK",
"level":1,
"msg":"",
"status":1,
"resolves":[
]
},
{
"id":1030,
"name":"EVENT_NODE_NET_LOWMARK",
"level":0,
"msg":"",
"status":1,
"resolves":[
"EVENT_NODE_NET_HIGHMARK"
]
},
{
"id":1033,
"name":"EVENT_NODE_REMOVED",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e has been removed",
"status":1,
"resolves":[
]
},
{
"id":1034,
"name":"EVENT_NODE_VARFS_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has reached /var high watermark",
"status":1,
"resolves":[
]
},
{
"id":1035,
"name":"EVENT_NODE_VARFS_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e /var usage is in normal range",
"status":1,
"resolves":[
"EVENT_NODE_VARFS_HIGHMARK"
]
},
{
"id":1036,
"name":"EVENT_NODE_VARLOG_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has reached /var/log high watermark",
"status":1,
"resolves":[
]
},
{
"id":1037,
"name":"EVENT_NODE_VARLOG_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e /var/log usage is in normal range",
"status":1,
"resolves":[
"EVENT_NODE_VARLOG_HIGHMARK"
]
},
{
"id":1038,
"name":"EVENT_NODE_VARPGSQL_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has reached /var/lib/pgsql high watermark",
"status":1,
"resolves":[
]
},
{
"id":1039,
"name":"EVENT_NODE_VARPGSQL_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e /var/lib/pgsql usage is in normal range",
"status":1,
"resolves":[
"EVENT_NODE_VARPGSQL_HIGHMARK"
]
},
{
"id":1040,
"name":"EVENT_NODE_VARCRASH_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has reached /var/crash high watermark",
"status":1,
"resolves":[
]
},
{
"id":1041,
"name":"EVENT_NODE_VARCRASH_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e /var/crash usage is in normal range",
"status":1,
"resolves":[
"EVENT_NODE_VARCRASH_HIGHMARK"
]
},
{
"id":1042,
"name":"EVENT_NODE_VARROBIN_HIGHMARK",
"level":1,
"msg":"Node {zonename}:\u003cnodename\u003e has reached /var/lib/robin high watermark",
"status":1,
"resolves":[
]
},
{
"id":1043,
"name":"EVENT_NODE_VARROBIN_LOWMARK",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e /var/lib/robin usage is in normal range",
"status":1,
"resolves":[
"EVENT_NODE_VARROBIN_HIGHMARK"
]
},
{
"id":1044,
"name":"EVENT_NODE_ADDED",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e has been added",
"status":1,
"resolves":[
]
},
{
"id":4001,
"name":"EVENT_CONT_STARTED",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was started",
"status":1,
"resolves":[
"EVENT_CONT_ERROR",
"EVENT_CONT_CRASHED",
"EVENT_CONT_PLAN_FAILED",
"EVENT_CONT_RELOCATE_FAILED",
"EVENT_CONT_DEPLOY_FAILED"
]
},
{
"id":4002,
"name":"EVENT_CONT_STOPPED",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was stopped",
"status":1,
"resolves":[
]
},
{
"id":4003,
"name":"EVENT_CONT_RESTARTED",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was restarted",
"status":1,
"resolves":[
"EVENT_CONT_ERROR",
"EVENT_CONT_CRASHED",
"EVENT_CONT_RELOCATE_FAILED",
"EVENT_CONT_DEPLOY_FAILED"
]
},
{
"id":4004,
"name":"EVENT_CONT_DELETED",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was deleted",
"status":1,
"resolves":[
"EVENT_CONT_SWAP_HIGHMARK",
"EVENT_CONT_CRASHED",
"EVENT_CONT_PLAN_FAILED",
"EVENT_CONT_RELOCATE_FAILED",
"EVENT_CONT_DEPLOY_FAILED"
]
},
{
"id":4007,
"name":"EVENT_CONT_CRASHED",
"level":2,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e crashed",
"status":1,
"resolves":[
]
},
{
"id":4011,
"name":"EVENT_CONT_MEM_HIGHMARK",
"level":1,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e reached memory high-watermark",
"status":1,
"resolves":[
]
},
{
"id":4012,
"name":"EVENT_CONT_MEM_LOWMARK",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was previously at memory high-watermark, but dropped to safe zone now",
"status":1,
"resolves":[
"EVENT_CONT_MEM_HIGHMARK"
]
},
{
"id":4014,
"name":"EVENT_CONT_CPU_LOWMARK",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was previously at CPU utilization high-watermark, but has dropped to safe zone now",
"status":1,
"resolves":[
"EVENT_CONT_CPU_HIGHMARK"
]
},
{
"id":4016,
"name":"EVENT_CONT_BLKIO_LOWMARK",
"level":0,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e was previously at Block IO high-watermark, but has dropped to safe zone now",
"status":1,
"resolves":[
"EVENT_CONT_BLKIO_HIGHMARK"
]
},
{
"id":4023,
"name":"EVENT_CONT_PLAN_FAILED",
"level":0,
"msg":"Deployment plan generation for container \u003cobject_name\u003e failed",
"status":1,
"resolves":[
]
},
{
"id":4024,
"name":"EVENT_CONT_RELOCATED",
"level":0,
"msg":"container \u003cobject_name\u003e was relocated to {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_CONT_ERROR",
"EVENT_CONT_CRASHED",
"EVENT_CONT_PLAN_FAILED",
"EVENT_CONT_RELOCATE_FAILED",
"EVENT_CONT_DEPLOY_FAILED"
]
},
{
"id":4025,
"name":"EVENT_CONT_RELOCATE_FAILED",
"level":2,
"msg":"container \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e could not be relocated",
"status":1,
"resolves":[
]
},
{
"id":4026,
"name":"EVENT_CONT_DEPLOY_FAILED",
"level":2,
"msg":"container \u003cobject_name\u003e could not be deployed on node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
]
},
{
"id":4027,
"name":"EVENT_CONT_SWAP_HIGHMARK",
"level":1,
"msg":"container \u003cobject_name\u003e has reached swap space high watermark",
"status":1,
"resolves":[
]
},
{
"id":4028,
"name":"EVENT_CONT_SWAP_LOWMARK",
"level":0,
"msg":"container \u003cobject_name\u003e has swap space usage is in normal range",
"status":1,
"resolves":[
"EVENT_CONT_SWAP_HIGHMARK"
]
},
{
"id":15001,
"name":"EVENT_COLLECTION_ERROR",
"level":2,
"msg":"File Collection {object_name} on node {zonename}:\u003cnodename\u003e experienced an error.",
"status":1,
"resolves":[
]
},
{
"id":15002,
"name":"EVENT_COLLECTION_OFFLINE",
"level":0,
"msg":"File Collection {object_name} on node {zonename}:\u003cnodename\u003e is offline.",
"status":1,
"resolves":[
"EVENT_COLLECTION_ERROR",
"EVENT_COLLECTION_OFFLINE_FAILED",
"EVENT_COLLECTION_ONLINE_FAILED"
]
},
{
"id":15003,
"name":"EVENT_COLLECTION_OFFLINE_FAILED",
"level":2,
"msg":"Failed to take File Collection {object_name} on node {zonename}:\u003cnodename\u003e offline.",
"status":1,
"resolves":[
]
},
{
"id":15004,
"name":"EVENT_COLLECTION_ONLINE",
"level":0,
"msg":"File Collection {object_name} on node {zonename}:\u003cnodename\u003e is online.",
"status":1,
"resolves":[
"EVENT_COLLECTION_ERROR",
"EVENT_COLLECTION_OFFLINE_FAILED",
"EVENT_COLLECTION_ONLINE_FAILED"
]
},
{
"id":15005,
"name":"EVENT_COLLECTION_ONLINE_FAILED",
"level":2,
"msg":"Failed to take File Collection {object_name} on node {zonename}:\u003cnodename\u003e online.",
"status":1,
"resolves":[
]
},
{
"id":12001,
"name":"EVENT_MGT_MASTER_FAILOVER",
"level":0,
"msg":"Node {zonename}:\u003cnodename\u003e is now the MASTER Manager (\u003cdescription\u003e)",
"status":1,
"resolves":[
]
},
{
"id":12002,
"name":"EVENT_YUM_USAGE",
"level":1,
"msg":"User on node {zonename}:\u003cnodename\u003e is utilizing the YUM package manager with the command: yum \u003cdescription\u003e",
"status":1,
"resolves":[
]
},
{
"id":12003,
"name":"EVENT_NOTIFICATION",
"level":0,
"msg":"",
"status":1,
"resolves":[
]
},
{
"id":13001,
"name":"EVENT_SYSCONFIG_PRECHECK_WARNING",
"level":1,
"msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following System Configuration precheck warning was ignored: \u003cdescription\u003e",
"status":1,
"resolves":[
]
},
{
"id":13002,
"name":"EVENT_PACKAGES_PRECHECK_WARNING",
"level":1,
"msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following Package Available precheck warning was ignored: \u003cdescription\u003e",
"status":1,
"resolves":[
]
},
{
"id":13003,
"name":"EVENT_PHYPROP_PRECHECK_WARNING",
"level":1,
"msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following Physical System Properties precheck warning was ignored: \u003cdescription\u003e",
"status":1,
"resolves":[
]
},
{
"id":13004,
"name":"EVENT_NETWORK_PRECHECK_WARNING",
"level":1,
"msg":" When installing ROBIN on node {zonename}:\u003cnodename\u003e, the following Networking precheck warning was ignored: \u003cdescription\u003e",
"status":1,
"resolves":[
]
},
{
"id":2,
"name":"EVENT_RESOLVER",
"level":0,
"msg":"Active alerts for object '{object_name}' on node {zonename}:\u003cnodename\u003e have been resolved.",
"status":1,
"resolves":[
]
},
{
"id":3,
"name":"EVENT_KUBELET_CERT_ERROR",
"level":1,
"msg":"Kubelet certificate check failed. Please check the kubelet certificate in node on node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
]
},
{
"id":4,
"name":"EVENT_KUBELET_CERT_EXPIRY",
"level":1,
"msg":"Kubelet certificate on node {zonename}:\u003cnodename\u003e will expire soon.",
"status":1,
"resolves":[
]
},
{
"id":5,
"name":"EVENT_KUBELET_CERT_EXPIRED",
"level":2,
"msg":"Kubelet certificate on node {zonename}:\u003cnodename\u003e expired.",
"status":1,
"resolves":[
]
},
{
"id":6,
"name":"EVENT_KUBELET_CERT_OK",
"level":0,
"msg":"Kubelet certificate on {zonename}:\u003cnodename\u003e is not expiring soon.",
"status":1,
"resolves":[
"EVENT_KUBELET_CERT_ERROR",
"EVENT_KUBELET_CERT_EXPIRY",
"EVENT_KUBELET_CERT_EXPIRED"
]
},
{
"id":7,
"name":"EVENT_K8S_CERT_CHECK_ERROR",
"level":1,
"msg":"Certificate check failed. Please check 'kubeadm alpha certs check-expiration' is responding correctly in Robin master node.",
"status":1,
"resolves":[
]
},
{
"id":8,
"name":"EVENT_K8S_CERT_EXPIRY",
"level":1,
"msg":"One or more K8S certificates will expire soon.",
"status":1,
"resolves":[
]
},
{
"id":9,
"name":"EVENT_K8S_CERT_EXPIRED",
"level":2,
"msg":"One or more K8S certificates expired.",
"status":1,
"resolves":[
]
},
{
"id":10,
"name":"EVENT_K8S_CERT_OK",
"level":0,
"msg":"K8S certificates are not expiring soon.",
"status":1,
"resolves":[
"EVENT_K8S_CERT_CHECK_ERROR",
"EVENT_K8S_CERT_EXPIRY",
"EVENT_K8S_CERT_EXPIRED"
]
},
{
"id":5001,
"name":"EVENT_DISK_USAGE_HIGHMARK",
"level":1,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e high-watermark",
"status":1,
"resolves":[
]
},
{
"id":5002,
"name":"EVENT_DISK_USAGE_LOWMARK",
"level":0,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e has dropped below disk high-watermark to safe zone",
"status":1,
"resolves":[
"EVENT_DISK_USAGE_HIGHMARK"
]
},
{
"id":5003,
"name":"EVENT_DISK_TEMP_HIGHMARK",
"level":1,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e temperature high-watermark",
"status":1,
"resolves":[
]
},
{
"id":5004,
"name":"EVENT_DISK_TEMP_LOWMARK",
"level":0,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e has dropped below temperature high-watermark to safe zone",
"status":1,
"resolves":[
"EVENT_DISK_TEMP_HIGHMARK"
]
},
{
"id":5005,
"name":"EVENT_DISK_FAULTED",
"level":2,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is faulted",
"status":1,
"resolves":[
]
},
{
"id":5006,
"name":"EVENT_DISK_OFFLINE",
"level":2,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is offline",
"status":1,
"resolves":[
]
},
{
"id":5007,
"name":"EVENT_DISK_DEGRADED",
"level":2,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is degraded",
"status":1,
"resolves":[
]
},
{
"id":5008,
"name":"EVENT_DISK_OK",
"level":0,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e is healthy",
"status":1,
"resolves":[
"EVENT_DISK_FAULTED",
"EVENT_DISK_DEGRADED",
"EVENT_DISK_OFFLINE"
]
},
{
"id":5009,
"name":"EVENT_DISK_DETACH_FAILED",
"level":2,
"msg":"disk \u003cobject_name\u003e failed to detach from node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
]
},
{
"id":5010,
"name":"EVENT_DISK_ATTACH_FAILED",
"level":2,
"msg":"disk \u003cobject_name\u003e failed to attach to node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_DISK_DETACH_FAILED",
"EVENT_DISK_ATTACH_FAILED"
]
},
{
"id":5011,
"name":"EVENT_DISK_DETACHED",
"level":2,
"msg":"disk \u003cobject_name\u003e detached from node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_DISK_DETACH_FAILED"
]
},
{
"id":5012,
"name":"EVENT_DISK_ATTACHED",
"level":0,
"msg":"disk \u003cobject_name\u003e is attached to node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_DISK_DETACHED",
"EVENT_DISK_DETACH_FAILED",
"EVENT_DISK_ATTACH_FAILED"
]
},
{
"id":5013,
"name":"EVENT_DISK_REMOVE_FAILED",
"level":2,
"msg":"disk \u003cobject_name\u003e removal from node {zonename}:\u003cnodename\u003e failed",
"status":1,
"resolves":[
]
},
{
"id":5014,
"name":"EVENT_DISK_REMOVED",
"level":0,
"msg":"disk \u003cobject_name\u003e removed from node {zonename}:\u003cnodename\u003e",
"status":1,
"resolves":[
"EVENT_DISK_DETACHED",
"EVENT_DISK_DETACH_FAILED",
"EVENT_DISK_ATTACH_FAILED",
"EVENT_DISK_FAULTED",
"EVENT_DISK_DEGRADED",
"EVENT_DISK_OFFLINE",
"EVENT_DISK_REMOVE_FAILED",
"EVENT_DISK_SPACE_LIMIT_EXCEED"
]
},
{
"id":5015,
"name":"EVENT_DISK_SPACE_LIMIT_EXCEED",
"level":1,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e exceeds used space threshold",
"status":1,
"resolves":[
]
},
{
"id":5016,
"name":"EVENT_DISK_SPACE_LIMIT_OK",
"level":0,
"msg":"disk \u003cobject_name\u003e on node {zonename}:\u003cnodename\u003e used space threshold ok",
"status":1,
"resolves":[
"EVENT_DISK_SPACE_LIMIT_EXCEED"
]
}
],
"total":178,
"count":125,
"page_num":0
}
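The resolves lists in the output above can be inverted to answer the question "which events clear a given alert?". The following is a minimal sketch, assuming the event-type records have already been decoded from the JSON output into a list of dictionaries. The EVENT_TYPES excerpt is abridged from the listing above (some resolves entries are omitted), and the function name is hypothetical.

```python
from collections import defaultdict

# Abridged excerpt of the event-type records shown above
# (the resolves lists are shortened for brevity).
EVENT_TYPES = [
    {"id": 5005, "name": "EVENT_DISK_FAULTED", "level": 2, "resolves": []},
    {"id": 5007, "name": "EVENT_DISK_DEGRADED", "level": 2, "resolves": []},
    {"id": 5008, "name": "EVENT_DISK_OK", "level": 0,
     "resolves": ["EVENT_DISK_FAULTED", "EVENT_DISK_DEGRADED", "EVENT_DISK_OFFLINE"]},
    {"id": 5014, "name": "EVENT_DISK_REMOVED", "level": 0,
     "resolves": ["EVENT_DISK_FAULTED", "EVENT_DISK_DEGRADED"]},
]


def resolvers_by_event(event_types):
    """Map each event name to the sorted names of the events that resolve it."""
    resolved_by = defaultdict(set)
    for event_type in event_types:
        for target in event_type.get("resolves", []):
            resolved_by[target].add(event_type["name"])
    return {name: sorted(names) for name, names in resolved_by.items()}


if __name__ == "__main__":
    for name, resolvers in sorted(resolvers_by_event(EVENT_TYPES).items()):
        print(f"{name} is resolved by: {', '.join(resolvers)}")
```

Event types that never appear in any resolves list (for example the resolving events themselves) are simply absent from the returned mapping.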
15.1.3. Notification Response Format¶
Detailed below are the various response structures of the event notifications received by each type of external subscriber. More details on subscribers and their linked subscriptions can be found here.
15.1.3.1. Kafka subscribers¶
The JSON payload received by Kafka based subscribers for event notifications is given below:
{
"$schema":"http://json-schema.org/draft-07/schema",
"type":"object",
"title":"The event schema",
"description":"The schema of the event.",
"required":[
"event_id",
"event_level",
"event_type",
"event_time",
"object_id",
"description",
"zone",
"object_uri",
"master_node_name",
"cluster_uuid"
],
"properties":{
"event_id":{
"$id":"#/properties/event_id",
"type":"integer",
"title":"event id",
"description":"The event id"
},
"event_level":{
"$id":"#/properties/event_level",
"type":"string",
"title":"event level",
"description":"The event level"
},
"event_type":{
"$id":"#/properties/event_type",
"type":"string",
"title":"event type",
"description":"The event type"
},
"event_time":{
"$id":"#/properties/event_time",
"type":"string",
"title":"event time",
"description":"The time of the event occured"
},
"object_id":{
"$id":"#/properties/object_id",
"type":"string",
"title":"object_id",
"description":"The object id on which the event has occured"
},
"description":{
"$id":"#/properties/description",
"type":"string",
"title":"The description of the event",
"description":"The description of the event"
},
"zone":{
"$id":"#/properties/zone",
"type":"string",
"title":"Robin cluster zone",
"description":"Robin cluster zone"
},
"object_uri":{
"$id":"#/properties/object_uri",
"type":"string",
"title":"Object URI to get more information",
"description":"Object URI to get more information"
},
"master_node_name":{
"$id":"#/properties/master_node_name",
"type":"string",
"title":"Current master node name",
"description":"Current master node name"
},
"cluster_uuid":{
"$id":"#/properties/cluster_uuid",
"type":"string",
"title":"Cluster UUID",
"description":"Cluster UUID"
},
"tags":{
"$id":"#/properties/tags",
"type":"string",
"title":"Tags which carries additional information",
"description":"Tags which carries additional information"
}
}
}
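A consumer can use the required list of the schema above to sanity-check each message before processing it. The following is a minimal sketch using only the Python standard library; it checks presence and basic JSON type of the required fields and is not a full JSON Schema validator. The function name and every value in SAMPLE_MESSAGE are hypothetical and only illustrate the shape of a payload.

```python
import json

# Required fields and their JSON types, taken from the schema above.
REQUIRED_FIELDS = {
    "event_id": int,
    "event_level": str,
    "event_type": str,
    "event_time": str,
    "object_id": str,
    "description": str,
    "zone": str,
    "object_uri": str,
    "master_node_name": str,
    "cluster_uuid": str,
}


def check_event(raw_message):
    """Decode one message value and return a list of schema problems."""
    event = json.loads(raw_message)
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing required field: {field}")
            continue
        value = event[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{field} should be of type {expected_type.__name__}")
    return problems


# Hypothetical payload; all values are made up for illustration.
SAMPLE_MESSAGE = json.dumps({
    "event_id": 42,
    "event_level": "WARN",
    "event_type": "EVENT_DISK_USAGE_HIGHMARK",
    "event_time": "2020-11-13T11:05:00-08:00",
    "object_id": "example-disk",
    "description": "example description",
    "zone": "default",
    "object_uri": "example-uri",
    "master_node_name": "node-1.example.com",
    "cluster_uuid": "00000000-0000-0000-0000-000000000000",
})

if __name__ == "__main__":
    print(check_event(SAMPLE_MESSAGE) or "payload looks well-formed")
```

The optional tags property is deliberately not checked, since the schema does not list it as required.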
15.2. Alerts¶
Robin alerts are generated to notify the logged-in user that a negative event has occurred in the cluster. They are only generated when events at level WARN or ERROR are created. This is because these events might require immediate attention. Once an alert is raised, its state is set to ACTIVE. The alert will only be resolved if a resolving event is created or if a user manually resolves the alert.
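The lifecycle described above can be pictured with a small toy model. This is only an illustrative sketch and not Robin's implementation: the class names, the keying of alerts by event type and object, the incrementing of a counter on repeated events, and the state name RESOLVED are all assumptions made for the example. Only the ACTIVE state and the WARN/ERROR rule come from the text.

```python
from dataclasses import dataclass

# Abridged resolver map: resolving event -> event types it clears.
RESOLVES = {
    "EVENT_DISK_OK": {"EVENT_DISK_FAULTED", "EVENT_DISK_DEGRADED", "EVENT_DISK_OFFLINE"},
}


@dataclass
class Alert:
    event_type: str
    object_id: str
    level: str
    state: str = "ACTIVE"
    count: int = 1


class AlertBook:
    """Toy model of the alert lifecycle; all behaviour here is assumed."""

    def __init__(self):
        self.alerts = {}  # (event_type, object_id) -> Alert

    def on_event(self, event_type, object_id, level):
        # Only WARN and ERROR events raise (or refresh) an alert.
        if level in ("WARN", "ERROR"):
            key = (event_type, object_id)
            alert = self.alerts.get(key)
            if alert is not None and alert.state == "ACTIVE":
                alert.count += 1
                alert.level = level
            else:
                self.alerts[key] = Alert(event_type, object_id, level)
        # A resolving event clears matching alerts on the same object.
        for resolved_type in RESOLVES.get(event_type, ()):
            alert = self.alerts.get((resolved_type, object_id))
            if alert is not None:
                alert.state = "RESOLVED"  # hypothetical state name

    def resolve_manually(self, event_type, object_id):
        self.alerts[(event_type, object_id)].state = "RESOLVED"


if __name__ == "__main__":
    book = AlertBook()
    book.on_event("EVENT_DISK_FAULTED", "example-disk", "ERROR")
    book.on_event("EVENT_DISK_OK", "example-disk", "INFO")
    print(book.alerts)
```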
The following commands are described in this section:
| robin alert list | List all alerts |
15.2.1. Listing all alerts¶
Events of level ERROR or WARN are considered to be alerts as they need to be resolved before they can be dismissed. Robin stores all alerts that have occurred during the lifespan of a cluster. To view these alerts, run this command:
# robin alert list <id>
--page-size <page_size>
--page <page>
--hostname <hostname>
--nodeid <node_id>
--type <type>
--type-id <type_id>
--level <level>
--object <object>
--all
--ascending
--total
--json
| <id> | ID of alert to inspect. Note: This is an optional parameter. |
| --page-size <page_size> | Maximum number of alert records to include in a single output page |
| --page <page> | Starting page number (relative to total number of pages of PAGE_SIZE) |
| --hostname <hostname> | Filter alerts to include only those originating from a particular host |
| --nodeid <node_id> | Filter alerts to include only those with a particular node ID |
| --type <type> | Filter alerts to include only those of a particular event type |
| --type-id <type_id> | Filter alerts to include only those with a particular event type-id |
| --level <level> | Filter alerts to include only those of a specific LEVEL |
| --object <object> | Filter alerts to include only those which concern a particular object |
| --all | Return all alerts, even those that were resolved |
| --ascending | Return alerts in ascending order of ID |
| --total | Return the total number of alerts |
| --json | Display output in JSON format |
Example
# robin alert list
ID | START_TIME | CUR_TIME | EVENT_TYPE | CUR_LEVEL | STATE | NODE_ID | OBJECT | EVENT_COUNT
-----+----------------------+----------------------+----------------------------------+-----------+--------+--------------+--------------------------------------------+-------------
1520 | 06 Feb 2020 15:04:53 | 06 Feb 2020 15:12:14 | EVENT_CONT_DEPLOY_FAILED | ERROR | ACTIVE | 1580198912:0 | my-mysql-01.t001-u000003.svc.cluster.local | 15
1519 | 06 Feb 2020 15:04:46 | 06 Feb 2020 15:04:46 | EVENT_CONT_ERROR | ERROR | ACTIVE | 1580198912:0 | my-mysql-01.t001-u000003.svc.cluster.local | 1
1518 | 31 Jan 2020 22:27:08 | 31 Jan 2020 22:27:08 | EVENT_APP_ADMIN_WAIT | WARN | ACTIVE | 1580198912:0 | my | 1
1360 | 29 Jan 2020 00:01:50 | 29 Jan 2020 00:01:50 | EVENT_CONT_DEPLOY_FAILED | ERROR | ACTIVE | 1580198912:0 | vnode-ipv6-55 | 1
846 | 28 Jan 2020 22:43:07 | 28 Jan 2020 22:43:07 | EVENT_CONT_ERROR | ERROR | ACTIVE | 1580198912:0 | vnode-ipv6-55 | 1
3 | 27 Jan 2020 16:27:08 | 27 Jan 2020 16:27:10 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN | ACTIVE | 1580198912:3 | cscale-82-38.robinsystems.com | 30
2 | 27 Jan 2020 16:16:55 | 27 Jan 2020 16:16:55 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN | ACTIVE | 1580198912:2 | cscale-82-37.robinsystems.com | 15
1 | 27 Jan 2020 16:09:09 | 27 Jan 2020 16:09:11 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN | ACTIVE | 1580198912:1 | cscale-82-36.robinsystems.com | 30
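For quick scripting, the tabular output above can be split into records. The following is a minimal sketch that assumes the column layout shown in the example (pipe-separated cells with a dashed separator row); the function name is hypothetical and SAMPLE_OUTPUT is an abridged copy of the example. Where available, the --json option is likely the more robust choice, but its exact output for this command is not shown here.

```python
from collections import Counter

# Abridged copy of the example output above.
SAMPLE_OUTPUT = """\
ID | START_TIME | CUR_TIME | EVENT_TYPE | CUR_LEVEL | STATE | NODE_ID | OBJECT | EVENT_COUNT
-----+----------------------+----------------------+-----------+--------+------
1520 | 06 Feb 2020 15:04:53 | 06 Feb 2020 15:12:14 | EVENT_CONT_DEPLOY_FAILED | ERROR | ACTIVE | 1580198912:0 | my-mysql-01.t001-u000003.svc.cluster.local | 15
1518 | 31 Jan 2020 22:27:08 | 31 Jan 2020 22:27:08 | EVENT_APP_ADMIN_WAIT | WARN | ACTIVE | 1580198912:0 | my | 1
3 | 27 Jan 2020 16:27:08 | 27 Jan 2020 16:27:10 | EVENT_SYSCONFIG_PRECHECK_WARNING | WARN | ACTIVE | 1580198912:3 | cscale-82-38.robinsystems.com | 30
"""


def parse_alert_table(text):
    """Turn the pipe-separated table into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [column.strip() for column in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):
            continue  # skip the dashed separator row
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


if __name__ == "__main__":
    alerts = parse_alert_table(SAMPLE_OUTPUT)
    print(Counter(alert["CUR_LEVEL"] for alert in alerts))
```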
Returns all events that have occurred during a cluster’s lifespan.
End Point: /api/v3/robin_server/events/
Method: GET
URL Parameters:
sort=[id|-id]
: Sorts the returned list of events by ID.
total=[true|false]
: Returns the total number of events.
state="ACTIVE"
: Returns only events that have not been resolved.
physical_node=<physical_nodename>
: Returns only events that occurred on the specified host.
nodeid=<physical_node_id>
: Returns only events that occurred on the host with the specified ID.
type=<event_type>
: Returns only events that match the specified type.
type_id=<event_type_id>
: Returns only events that match the specified type ID.
object_id=<object_id>
: Returns only events that are associated with the specified object ID.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
X-Event-Port: <event_server_port>
: Port on which the Event Server is listening; by default this is 29449. Note that the value of this field should be a string.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
Example Response:
Output
{
"start":1,
"items":[
{
"id":1,
"zoneid":1597147518,
"nodeid":1,
"object_id":"cscale-82-140.robinsystems.com",
"type_id":13001,
"state":2,
"start_level":1,
"cur_level":1,
"count":11,
"tenant_id":0,
"user_id":0,
"start_time":1597147561.87808,
"cur_time":1597147561.987619,
"event_instances":[
{
"id":9,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.8780801,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/var/lib/docker not a partition (Folder not present)",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":10,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.8871074,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/var/lib/docker folder not present. Required space: 40G",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":11,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.8955083,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"PCI device passthrough not enabled. Set iommu=pt and intel_iommu=on in GRUB if planning to deploy KVM+SRIOV apps",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":12,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9149632,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds/var/log not a partition (Folder not present)",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":13,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9292195,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds/var/log folder not present. Required space: 60G",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":14,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9378107,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds/var/crash not a partition (Folder not present)",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":15,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9461741,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds/var/crash folder not present. Required space: 100G",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":16,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9559772,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds not a partition (Folder not present)",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":17,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9660244,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds folder not present. Required space: 40G",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":18,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9781108,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds/var/lib/pgsql not a partition (Folder not present)",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
},
{
"id":19,
"zoneid":1597147518,
"type_id":13001,
"object_id":"cscale-82-140.robinsystems.com",
"nodeid":1,
"level":1,
"parent_ref":0,
"tenant_id":0,
"user_id":0,
"timestamp":1597147561.9876194,
"create_time":"August 11, 2020 05:06:01",
"payload":{
"description":"/home/robinds/var/lib/pgsql folder not present. Required space: 50G",
"hostname":"cscale-82-140.robinsystems.com",
"nodename":"cscale-82-140.robinsystems.com",
"zonename":"default"
}
}
]
}
],
"total":1,
"count":1,
"page_num":1,
"page_size":20
}
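The request and response described above can be handled with a few lines of standard-library Python. The following is a minimal sketch that builds the request without sending it and condenses a decoded response. The https scheme, the host name robin.example.com, the token placeholder, and the mapping of numeric levels to INFO/WARN/ERROR are assumptions; it is also assumed that the state filter takes the bare value ACTIVE. SAMPLE is abridged from the example response.

```python
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumption: numeric levels follow the order INFO, WARN, ERROR.
LEVEL_NAMES = {0: "INFO", 1: "WARN", 2: "ERROR"}


def build_events_request(host, auth_token, event_port=29449, **filters):
    """Build (but do not send) a GET request for the events end point."""
    url = f"https://{host}:29442/api/v3/robin_server/events/"  # scheme is assumed
    if filters:
        url += "?" + urllib.parse.urlencode(filters)
    headers = {
        "Authorization": auth_token,
        "X-Event-Port": str(event_port),  # the header value must be a string
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def summarize(response):
    """Reduce each item of a decoded response to a few readable fields."""
    summary = []
    for item in response["items"]:
        started = datetime.fromtimestamp(item["start_time"], tz=timezone.utc)
        summary.append({
            "id": item["id"],
            "object": item["object_id"],
            "level": LEVEL_NAMES.get(item["cur_level"], str(item["cur_level"])),
            "started_utc": started.isoformat(timespec="seconds"),
            "instances": len(item["event_instances"]),
        })
    return summary


# Abridged copy of the example response (event instances trimmed to one).
SAMPLE = {
    "items": [{
        "id": 1,
        "object_id": "cscale-82-140.robinsystems.com",
        "type_id": 13001,
        "cur_level": 1,
        "count": 11,
        "start_time": 1597147561.87808,
        "event_instances": [{
            "id": 9,
            "payload": {"description": "/var/lib/docker not a partition (Folder not present)"},
        }],
    }],
    "total": 1,
}

if __name__ == "__main__":
    request = build_events_request("robin.example.com", "<auth_token>", sort="-id", state="ACTIVE")
    print(request.get_method(), request.full_url)
    print(summarize(SAMPLE))
```

The start_time and cur_time fields are epoch seconds, so converting them with an explicit time zone avoids ambiguity with the preformatted create_time strings.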
15.2.2. Notification Response Format¶
Detailed below are the various response structures of the alert notifications received by each type of external subscriber. More details on subscribers and their linked subscriptions can be found here.
15.2.2.1. Kafka subscribers¶
The JSON payload received by Kafka based subscribers for alert notifications is shown below:
{
"$schema":"http://json-schema.org/draft-07/schema",
"type":"object",
"title":"Alert Object definition",
"description":"The alert schema definition",
"examples":[
{
"alert_id":4,
"alert_type":"EVENT_PROC_UNHEALTHY",
"object_id":"stormgr-server",
"description":"Health check failed for service stormgr-server on node default:telx101-compute04.robinsystems.com",
"zone":"default",
"node":"telx101-compute04.robinsystems.com",
"state":"ACTIVE",
"start_level":"WARN",
"cur_level":"WARN",
"count":1,
"start_time":"2020-11-13T11:05:00.626852296-08:00",
"cur_time":"2020-11-13T11:05:00.626852296-08:00"
}
],
"required":[
"alert_id",
"alert_type",
"object_id",
"description",
"zone",
"node",
"state",
"start_level",
"cur_level",
"count",
"start_time",
"cur_time",
"object_uri",
"master_node_name",
"cluster_uuid"
],
"properties":{
"alert_id":{
"$id":"#/properties/alert_id",
"type":"integer",
"title":"alert_id",
"description":"The alert id"
},
"alert_type":{
"$id":"#/properties/alert_type",
"type":"string",
"title":"alert_type",
"description":"The type of the alert",
"examples":[
"EVENT_PROC_UNHEALTHY"
]
},
"object_id":{
"$id":"#/properties/object_id",
"type":"string",
"title":"object_id",
"description":"The object id on which the alert is generated"
},
"description":{
"$id":"#/properties/description",
"type":"string",
"title":"Alert description",
"description":"Alert description",
"default":"",
"examples":[
"Health check failed for service stormgr-server on node default:telx101-compute04.robinsystems.com"
]
},
"zone":{
"$id":"#/properties/zone",
"type":"string",
"title":"zone id",
"description":"The cluster zone id"
},
"node":{
"$id":"#/properties/node",
"type":"string",
"title":"node id",
"description":"The node id of the object, if applicable",
"default":""
},
"state":{
"$id":"#/properties/state",
"type":"string",
"title":"Alert state",
"description":"Alert state"
},
"start_level":{
"$id":"#/properties/start_level",
"type":"string",
"title":"Alert start_level",
"description":"Alert start level"
},
"cur_level":{
"$id":"#/properties/cur_level",
"type":"string",
"title":"Alert current level",
"description":"Alert current level"
},
"count":{
"$id":"#/properties/count",
"type":"integer",
"title":"Number of events for this event",
"description":"The number of events that have raised for this alert"
},
"start_time":{
"$id":"#/properties/start_time",
"type":"string",
"title":"Alert start time",
"description":"Alert start time",
"examples":[
"2020-11-13T11:05:00.626852296-08:00"
]
},
"cur_time":{
"$id":"#/properties/cur_time",
"type":"string",
"title":"Alert current time",
"description":"Alert current time",
"examples":[
"2020-11-13T11:05:00.626852296-08:00"
]
},
"object_uri":{
"$id":"#/properties/object_uri",
"type":"string",
"title":"Object URI to get more information",
"description":"Object URI to get more information"
},
"master_node_name":{
"$id":"#/properties/master_node_name",
"type":"string",
"title":"Current master node name",
"description":"Current master node name"
},
"cluster_uuid":{
"$id":"#/properties/cluster_uuid",
"type":"string",
"title":"Cluster UUID",
"description":"Cluster UUID"
},
"tags":{
"$id":"#/properties/tags",
"type":"string",
"title":"Tags which carries additional information",
"description":"Tags which carries additional information"
}
}
}
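The start_time and cur_time examples in the schema above carry nanosecond precision, which datetime.fromisoformat rejects on Python versions before 3.11. The following is a minimal sketch of one way to parse them by truncating the fraction to microseconds first. The helper names are hypothetical, and ALERT is taken from the schema's own examples entry.

```python
import re
from datetime import datetime

_FRACTION = re.compile(r"\.(\d+)")


def parse_alert_time(value):
    """Parse an alert timestamp, truncating the fraction to microseconds."""
    trimmed = _FRACTION.sub(
        lambda match: "." + match.group(1)[:6].ljust(6, "0"), value, count=1
    )
    return datetime.fromisoformat(trimmed)


def alert_duration_seconds(alert):
    """Seconds between the first and the most recent event of an alert."""
    delta = parse_alert_time(alert["cur_time"]) - parse_alert_time(alert["start_time"])
    return delta.total_seconds()


# Example alert from the schema above.
ALERT = {
    "alert_id": 4,
    "alert_type": "EVENT_PROC_UNHEALTHY",
    "object_id": "stormgr-server",
    "zone": "default",
    "node": "telx101-compute04.robinsystems.com",
    "state": "ACTIVE",
    "start_level": "WARN",
    "cur_level": "WARN",
    "count": 1,
    "start_time": "2020-11-13T11:05:00.626852296-08:00",
    "cur_time": "2020-11-13T11:05:00.626852296-08:00",
}

if __name__ == "__main__":
    print(parse_alert_time(ALERT["start_time"]), alert_duration_seconds(ALERT))
```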
15.3. Notification of events¶
Robin provides a native mechanism to instantly notify parties of any events that may concern them. This feature is useful as it enables quick responses to failures of cluster-wide resources and infrastructure.
The parties are modeled as subscribers in Robin and will need to be added manually, along with the method used to notify them. Given the large number of events that are detected by Robin, a subscription to a specific event is needed to indicate that a particular event is of interest. Each subscriber is then notified when the event that is tied to a subscription occurs.
To summarize, receiving a notification for an event is a two-step process:
1. Add a subscriber and their contact details
2. Add a subscription to an event that is of interest
Described below are the commands used to manage subscribers and subscriptions.
15.3.1. Managing subscribers¶
Subscribers are the intended recipients of notifications and should be configured before adding a subscription.
The following commands are described in this section:
|
Add a subscriber |
|
List all subscribers |
|
Update a subscriber’s attributes |
|
Remove a subscriber |
15.3.1.1. Registering a Robin subscriber¶
To add a new subscriber to the system, run this command:
# robin subscriber add <name> <subscriber_type>
--email-address <email_address>
--full-name <full_name>
--host <host>
--port <port>
--community <community>
--brokers <brokers>
--topic <topic>
--partition <partition>
--sasl-mechanism <sasl_mechanism>
--username <username>
--password <password>
--enable-ssl
Note
For email subscribers, the email-address and full-name parameters are mandatory. For SNMP subscribers, the host, port and community parameters are mandatory. For Kafka subscribers, the brokers and topic parameters are mandatory.
| <name> | Name to assign to subscriber |
| <subscriber_type> | Type of subscriber. Options include: snmp, email, kafka |
| --email-address <email_address> | Email address of subscriber |
| --full-name <full_name> | Full name of subscriber |
| --host <host> | SNMP Host for subscriber |
| --port <port> | SNMP Port of subscriber |
| --community <community> | SNMP Community of subscriber |
| --brokers <brokers> | The host / IP address of Kafka Brokers. This parameter should be specified in one of the following formats: <hostname>:<port>, <ipv4-addr>:<port>, or [<ipv6-addr>]:<port> |
| --topic <topic> | The Kafka topic where the events/alerts will be sent |
| --partition <partition> | The Kafka partition where the events/alerts will be sent |
| --sasl-mechanism <sasl_mechanism> | The Kafka broker SASL mechanism. Valid choices include: ‘PLAIN’ |
| --username <username> | The Kafka broker username. |
| --password <password> | The Kafka broker password |
| --enable-ssl | Enable SSL for communication with Kafka brokers |
Example 1 (Add an email subscriber):
# robin subscriber add demo_user email --email-address demo@robin.io --full-name Robin
Successfully added subscriber 'demo_user' with 'email' notification
Example 2 (Add a Kafka subscriber with no auth/encryption):
# robin subscriber add k_ssl2 kafka --brokers "10.9.82.42:9092" --topic robin_events
Successfully added subscriber 'k_ssl2' with 'kafka' notification
Example 3 (Add a Kafka subscriber with SASL_PLAINTEXT auth and no encryption):
# robin subscriber add kafka_sasl_plain kafka --brokers "10.9.82.42:9093" --topic robin_events --username admin --password 12345 --sasl-mechanism PLAIN
Successfully added subscriber 'kafka_sasl_plain' with 'kafka' notification
Example 4 (Add a Kafka subscriber with SASL_SSL auth and encryption):
# robin subscriber add kafka_sasl_ssl kafka --brokers "10.9.82.42:9094" --topic robin_events --username admin --password 12345 --sasl-mechanism PLAIN --enable-ssl
Successfully added subscriber 'kafka_sasl_ssl' with 'kafka' notification
Note
A single user name can be registered as an email, an SNMP, and a Kafka subscriber at the same time.
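When the CLI is driven from a script, the mandatory parameters for each subscriber type can be checked before the command is run. The following is a minimal sketch that only assembles the argument list for robin subscriber add using the options documented above; the function name and the MANDATORY table are hypothetical helpers and not part of Robin.

```python
# Mandatory options per subscriber type, as stated in the note above.
MANDATORY = {
    "email": ("email_address", "full_name"),
    "snmp": ("host", "port", "community"),
    "kafka": ("brokers", "topic"),
}


def subscriber_add_args(name, subscriber_type, **options):
    """Return the argv list for 'robin subscriber add', validating inputs first."""
    if subscriber_type not in MANDATORY:
        raise ValueError(f"unknown subscriber type: {subscriber_type}")
    missing = [opt for opt in MANDATORY[subscriber_type] if options.get(opt) is None]
    if missing:
        raise ValueError(f"missing mandatory options: {', '.join(missing)}")
    args = ["robin", "subscriber", "add", name, subscriber_type]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)  # boolean switch such as --enable-ssl
        elif value is not None and value is not False:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    print(" ".join(subscriber_add_args(
        "kafka_sasl_ssl", "kafka",
        brokers="10.9.82.42:9094", topic="robin_events",
        username="admin", password="12345",
        sasl_mechanism="PLAIN", enable_ssl=True,
    )))
```

The resulting list can be passed to subprocess.run on a host where the robin client is installed.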
Adds a subscriber for Robin events. Note that this is a two-step process, as outlined by the example API requests detailed below.
Retrieving the subscriber ID
End Point: /api/v3/robin_server/subscribers
Method: POST
URL Parameters: None
Data Parameters:
name: <name>
- This mandatory field within the payload specifies the name of the subscriber to be registered.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
Output
{ "message":"{\"subscriber_id\":\"2\"}" }
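In the response above, the message field is itself a JSON document encoded as a string, so the subscriber ID has to be decoded in two passes. The following is a minimal sketch; the function name is hypothetical.

```python
import json


def extract_subscriber_id(response_body):
    """Decode the response, then decode the JSON string held in 'message'."""
    outer = json.loads(response_body)
    inner = json.loads(outer["message"])
    return inner["subscriber_id"]


# The example response body shown above.
EXAMPLE_BODY = r'{ "message":"{\"subscriber_id\":\"2\"}" }'

if __name__ == "__main__":
    print(extract_subscriber_id(EXAMPLE_BODY))
```

Note that the ID is returned as a string rather than an integer in this example.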
Note
The subscriber_id specified in the above response will be used when registering the subscriber.
Registering the subscriber
End Point: /api/v3/robin_server/subscribers/<sub_id>/notification/<notif_type>
Method: POST
URL Parameters: None
Data Parameters:
email_address: <email>
- String containing the email address to link to the subscriber. Valid only for ‘email’ based subscribers, for which it is mandatory.
full_name: <name>
- String containing the full name to link to the subscriber. Valid only for ‘email’ based subscribers, for which it is mandatory.
host: <hostname>
- String containing the hostname of the SNMP host to link to the subscriber. Valid only for ‘SNMP’ based subscribers, for which it is mandatory.
port: <port>
- String containing the SNMP port to link to the subscriber. Valid only for ‘SNMP’ based subscribers, for which it is mandatory.
community: <community>
- String containing the SNMP community to link to the subscriber. Valid only for ‘SNMP’ based subscribers, for which it is mandatory.
brokers: <brokers>
- String containing the host/IP address of the Kafka brokers to link to the subscriber, in one of the following formats: <hostname>:<port>, <ipv4-addr>:<port>, or [<ipv6-addr>]:<port>. Valid only for ‘Kafka’ based subscribers, for which it is mandatory.
topic: <topic>
- String containing the Kafka topic to which event/alert notifications are sent. Valid only for ‘Kafka’ based subscribers, for which it is mandatory.
partition: <partition>
- Integer identifying the partition to which notifications are sent. Valid only for ‘Kafka’ based subscribers.
sasl_mechanism: <sasl_mechanism>
- String naming the SASL mechanism used for authentication. Valid only for ‘Kafka’ based subscribers; valid choices include: ‘PLAIN’.
username: <username>
- String containing the username for the Kafka broker. Valid only for ‘Kafka’ based subscribers.
password: <password>
- String containing the password for the Kafka broker. Valid only for ‘Kafka’ based subscribers.
enable_ssl: true
- Enables SSL communication with the Kafka broker. Valid only for ‘Kafka’ based subscribers; the default is false.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error), 404 (Not Found Error)
Example Response: On success the response is empty.
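The two-step flow above can be sketched in Python as follows. This is a minimal sketch, not Robin's own client code: the helper names (`extract_subscriber_id`, `notification_path`, `build_notification_payload`) and the `MANDATORY_FIELDS` table are hypothetical. It assumes the first response wraps a JSON document inside the `message` string, as in the example response, and it checks a single broker address only, since the text above does not say whether several brokers can be passed in one string.

```python
import json
import re

# Mandatory payload fields per notification type, as listed above.
MANDATORY_FIELDS = {
    "email": ("email_address", "full_name"),
    "snmp": ("host", "port", "community"),
    "kafka": ("brokers", "topic"),
}

# Loose check for <hostname>:<port>, <ipv4-addr>:<port>, or [<ipv6-addr>]:<port>.
BROKER_RE = re.compile(r"^(\[[0-9A-Fa-f:.]+\]|[A-Za-z0-9.-]+):\d{1,5}$")


def extract_subscriber_id(response_body: str) -> str:
    """Step 1: read subscriber_id from the JSON string nested in 'message'."""
    outer = json.loads(response_body)
    inner = json.loads(outer["message"])
    return inner["subscriber_id"]


def notification_path(sub_id: str, notif_type: str) -> str:
    """Step 2: path of the registration end point."""
    return f"/api/v3/robin_server/subscribers/{sub_id}/notification/{notif_type}"


def build_notification_payload(notif_type: str, **fields) -> dict:
    """Check the mandatory fields for the type and return the payload."""
    missing = [f for f in MANDATORY_FIELDS[notif_type] if f not in fields]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if notif_type == "kafka" and not BROKER_RE.match(fields["brokers"]):
        raise ValueError(f"unexpected broker format: {fields['brokers']}")
    return dict(fields)


if __name__ == "__main__":
    body = '{ "message":"{\\"subscriber_id\\":\\"2\\"}" }'
    sub_id = extract_subscriber_id(body)
    print(notification_path(sub_id, "kafka"))
    print(build_notification_payload("kafka", brokers="[::1]:9092", topic="robin-events"))
```

The topic name `robin-events` is a placeholder. Validating the payload before sending it avoids a round trip that would otherwise end in a 400 (Invalid API Usage Error).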
15.3.1.2. Listing all subscribers¶
To list all of the subscribers currently registered with Robin alongside details such as the subscription type and ID, issue the following command:
# robin subscriber list <id>
--name <name>
--type <type>
--json
<id> | ID of subscriber to inspect (optional)
--name <name> | Name of subscriber to inspect
--type <type> | Filter subscribers to include only a particular type of subscriber
--json | Display output in JSON format
Example
# robin subscriber list
ID | Name | Type | Details
---+-------+-------+--------------------------------------------------------
56 | demo | email | Full name: demo user, Email Address: demo@robin.io
56 | demo | snmp | Host: cscale-82-45, Port: 162, Community: public
Returns all of the subscribers currently registered with Robin alongside details such as the subscription type and ID.
End Point: /api/v3/robin_server/subscribers/
Method: GET
URL Parameters:
name=<name_of_subscriber>
: Utilizing this parameter results in only details for the specified subscriber being present in the response payload.
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
Example Response:
Output
{
"items":[
{
"id":56,
"name":"demo_user",
"email":{
"id":1,
"full_name":"Robin",
"email_address":"demo@robin.io"
},
"snmp":{
"id":1,
"host":"cscale-82-45",
"port":162,
"community":"public"
}
}
]
}
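As a rough illustration, the response above can be flattened into one row per notification type, similar to the CLI table. The function name `subscriber_rows` is hypothetical, and the `kafka` key is an assumption, since the example response only shows `email` and `snmp` entries.

```python
import json

# Keys looked up on each subscriber; "kafka" is assumed, not shown above.
NOTIFICATION_TYPES = ("email", "snmp", "kafka")


def subscriber_rows(response_body: str) -> list:
    """Return one (id, name, type, settings) tuple per notification type."""
    rows = []
    for item in json.loads(response_body)["items"]:
        for notif_type in NOTIFICATION_TYPES:
            details = item.get(notif_type)
            if not details:
                continue
            # Drop the per-notification "id" and keep the remaining settings.
            settings = {k: v for k, v in details.items() if k != "id"}
            rows.append((item["id"], item["name"], notif_type, settings))
    return rows


if __name__ == "__main__":
    sample = json.dumps({"items": [{
        "id": 56, "name": "demo_user",
        "email": {"id": 1, "full_name": "Robin", "email_address": "demo@robin.io"},
        "snmp": {"id": 1, "host": "cscale-82-45", "port": 162, "community": "public"},
    }]})
    for row in subscriber_rows(sample):
        print(row)
```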
15.3.1.3. Testing a subscriber¶
In order to test if a registered subscriber can actually receive notifications, run this command:
# robin subscriber test <id>
--type <type>
<id> | ID of subscriber to test
--type <type> | Type of subscriber. Valid choices are ‘email’ or ‘snmp’
Example
# robin subscriber test 1 --type email
Successfully attempted test for subscriber '1' with 'email' notification. Ensure this notification was recieved in order to confirm the subscriber is viable.
Tests whether a registered subscriber is able to receive a notification.
End Point: /api/v3/robin_server/subscribers/<sub_id>/notification/<notif_type>/test
Method: POST
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 404 (Not Found Error)
Example Response: On success the response is empty.
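A request for this end point could be assembled with Python's standard library as sketched below. The function name `build_test_request` and the host name are hypothetical, and the `https` scheme is an assumption because the text above gives only the port. The sketch builds the request without sending it.

```python
import urllib.request

RCM_PORT = 29442  # default RCM port quoted above


def build_test_request(host, sub_id, notif_type, auth_token, port=RCM_PORT):
    """Build (but do not send) the POST request that tests a subscriber."""
    url = (
        f"https://{host}:{port}/api/v3/robin_server/subscribers/"
        f"{sub_id}/notification/{notif_type}/test"
    )
    # The token comes from the login API and goes in the Authorization header.
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": auth_token}
    )


if __name__ == "__main__":
    req = build_test_request("robin-master.example.com", 1, "email", "<auth_token>")
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` sends it; that call raises `urllib.error.HTTPError` for the 4xx and 5xx codes listed above.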
15.3.1.4. Updating a subscriber¶
To update one or many attributes of a subscriber, run this command:
# robin subscriber update <id>
--email-address <email_address>
--full-name <full_name>
--host <host>
--port <port>
--community <community>
<id> | ID of subscriber to update
--email-address <email_address> | Updated value for the subscriber's email address
--full-name <full_name> | Updated value for the subscriber's full name
--host <host> | Updated value for the subscriber's SNMP host
--port <port> | Updated value for the subscriber's SNMP port
--community <community> | Updated value for the subscriber's SNMP community
Example
# robin subscriber list
ID | Name | Type | Details
---+------------+-------+--------------------------------------------------------
1 | demo_user | email | Full name: demo user, Email Address: demo@robin.io
2 | demo_two | snmp | Host: cscale-82-45, Port: 162, Community: public
# robin subscriber update 1 --email-address change_demo@robin.io
Successfully updated subscriber '1' with 'email' notification
# robin subscriber list
ID | Name | Type | Details
---+------------+-------+--------------------------------------------------------
1 | demo_user | email | Full name: demo user, Email Address: change_demo@robin.io
2 | demo_two | snmp | Host: cscale-82-45, Port: 162, Community: public
Updates one or more attributes of a subscriber.
End Point: /api/v3/robin_server/subscribers/<sub_id>/notification/<notif_type>
Method: PUT
URL Parameters: None
Data Parameters:
email_address: <email>
- String containing the updated email address to link to the subscriber. Valid only for ‘email’ based subscribers.
full_name: <name>
- String containing the updated full name to link to the subscriber. Valid only for ‘email’ based subscribers.
host: <hostname>
- String containing the updated hostname of the SNMP host to link to the subscriber. Valid only for ‘SNMP’ based subscribers.
port: <port>
- String containing the updated SNMP port to link to the subscriber. Valid only for ‘SNMP’ based subscribers.
community: <community>
- String containing the updated SNMP community to link to the subscriber. Valid only for ‘SNMP’ based subscribers.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 404 (Not Found Error)
Example Response: On success the response is empty.
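Because each data parameter above is valid only for one notification type, a client may want to reject mismatched fields before issuing the PUT. The following is a minimal sketch under stated assumptions: `build_update_request` and `UPDATABLE_FIELDS` are hypothetical names, and sending the payload as a JSON body is an assumption, since the request encoding is not spelled out above.

```python
import json

# Fields that may be updated for each notification type, per the list above.
UPDATABLE_FIELDS = {
    "email": {"email_address", "full_name"},
    "snmp": {"host", "port", "community"},
}


def build_update_request(sub_id, notif_type, **changes):
    """Return (method, path, body) for a subscriber update."""
    allowed = UPDATABLE_FIELDS[notif_type]
    invalid = sorted(set(changes) - allowed)
    if invalid:
        raise ValueError(f"{invalid} not valid for '{notif_type}' subscribers")
    if not changes:
        raise ValueError("no attributes to update")
    path = f"/api/v3/robin_server/subscribers/{sub_id}/notification/{notif_type}"
    return "PUT", path, json.dumps(changes, sort_keys=True)


if __name__ == "__main__":
    print(build_update_request(1, "email", email_address="change_demo@robin.io"))
```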
15.3.1.5. Removing a subscriber¶
To remove a subscriber currently registered with Robin, run this command:
# robin subscriber remove <id>
--type <type>
--yes
<id> | ID of subscriber to remove
--type <type> | Type of the specified subscriber to remove
--yes | Confirm deletion without prompting
Example
# robin subscriber list
ID | Name | Type | Details
---+-------+-------+--------------------------------------------------------
56 | demo | email | Full name: demo user, Email Address: demo@robin.io
56 | demo | snmp | Host: cscale-82-45, Port: 162, Community: public
# robin subscriber remove 56 --type snmp
Successfully removed 'snmp' notification for subscriber '56'
# robin subscriber list
ID | Name | Type | Details
---+-------+-------+--------------------------------------------------------
56 | demo | email | Full name: demo user, Email Address: demo@robin.io
Note
If a subscriber has multiple types and the --type parameter is not specified during the delete operation, all entries for this subscriber are removed.
Removes a subscriber such that no notifications are sent to the specified user.
End Point: /api/v3/robin_server/subscribers/<subscriber_id>
Method: DELETE
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response: On success the response is empty.
15.3.2. Managing subscriptions¶
Subscriptions indicate to Robin which events/alerts a user is interested in. Each subscription has an associated list of event types and subscribers. As a result, when an event that is part of a subscription occurs, all the subscribers linked with that subscription are notified.
The following commands are described in this section:
robin subscription add | Add a subscription
robin subscription list | List subscriptions
robin subscription update | Update a subscription
robin subscription remove | Remove a subscription
15.3.2.1. Registering a Robin subscription¶
To add a new subscription, run this command with the appropriate options:
# robin subscription add <subscriber_id> <subscription_type> <event_types>
--subscription-file <subscription_file>
--object-id <object_id>
--nodeid <node_id>
--zoneid <zone_id>
--disable
--enable
--threshold <threshold>
--elapsed-ticks <elapsed_ticks>
--throttle <throttle>
<subscriber_id> | ID of subscriber to associate with the subscription
<subscription_type> | Type of subscription. Valid choices are alert, event, and file.
<event_types> | Event types to associate with the subscription. Note: This can be provided via the subscription file
--subscription-file <subscription_file> | JSON formatted file that contains lists of events and alerts to subscribe to
--object-id <object_id> | ID of objects to match
--nodeid <node_id> | ID of nodes to match
--zoneid <zone_id> | ID of zone to match
--disable | Disable this subscription when it is first added
--enable | Enable this subscription when it is first added
--threshold <threshold> | Number of instances of event/alert before launching notification
--elapsed-ticks <elapsed_ticks> | Number of seconds to allow for the threshold to be met
--throttle <throttle> | Number of seconds before a repeat notification will be sent
Example
# robin subscription add 1 event 5006,4002 --enable
Successfully added event subscription for subscriber 1
Adds one or more subscriptions to particular event types.
End Point: /api/v3/robin_server/subscribers/<sbr_id>/subscriptions/<sub_type>
Note
Valid values for subscription type include: ‘events’ and ‘alerts’.
Method: POST
URL Parameters: None
Data Parameters:
subscriptions: <list_of_dicts>
- This mandatory parameter is a comma-separated list of dictionaries, each with the keys given below, and adds subscriptions to the specified event types.
type_name: <name>
- This mandatory field within each dictionary specifies the name of the event type to subscribe to.
enabled: true
- Activates the subscription. The default is false.
threshold: <threshold>
- Integer setting the subscription's threshold: the number of instances of the given event/alert that must be generated before the notification is launched. The default value is 1.
elapsed_ticks: <elapsed_ticks>
- Integer setting the number of seconds allowed for the threshold to be met. By default there is no elapsed time limit.
throttle: <throttle>
- Integer setting the number of seconds before a repeat notification is sent. By default there is no throttling.
object_id: <object_id>
- Integer ID of an object. A notification is sent only if the given event/alert occurs for that object.
nodeid: <nodeid>
- Integer ID of a host. A notification is sent only if the given event/alert occurs on that host.
zoneid: <zoneid>
- Integer ID of a zone. A notification is sent only if the given event/alert occurs within that zone.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response: On success the response is empty.
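The payload described above, a `subscriptions` list with one dictionary per event type, might be assembled as in this sketch. The helper name `build_subscriptions_payload` is hypothetical, and applying the same settings to every listed event type is a simplification chosen for the example rather than a requirement of the API.

```python
import json

# Optional per-subscription keys listed above, besides enabled and threshold.
OPTIONAL_KEYS = {"elapsed_ticks", "throttle", "object_id", "nodeid", "zoneid"}


def build_subscriptions_payload(type_names, enabled=False, threshold=1, **options):
    """Return the payload with one dictionary per event type name."""
    unknown = sorted(set(options) - OPTIONAL_KEYS)
    if unknown:
        raise ValueError(f"unknown subscription keys: {unknown}")
    entries = []
    for name in type_names:
        # type_name is mandatory; the defaults mirror those documented above.
        entry = {"type_name": name, "enabled": enabled, "threshold": threshold}
        entry.update(options)
        entries.append(entry)
    return {"subscriptions": entries}


if __name__ == "__main__":
    payload = build_subscriptions_payload(
        ["EVENT_DISK_OFFLINE", "EVENT_CONT_STOPPED"], enabled=True, throttle=86400
    )
    print(json.dumps(payload, indent=2))
```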
15.3.2.2. Listing all subscriptions¶
To list all subscriptions currently registered with Robin for a particular subscriber alongside details such as the associated events, issue the following command:
# robin subscription list <id>
--full
--json
<id> | ID of subscriber whose subscriptions to inspect
--full | Display additional information about the subscriptions
--json | Display output in JSON format
Example
# robin subscription list 2
ID | Type | Subscriber ID | Event Type | Enabled | Threshold | Elapsed Ticks | Throttle
---+--------------+---------------+--------------------+---------+-----------+---------------+----------
20 | SYSTEM_EVENT | 2 | EVENT_DISK_OFFLINE | True | 1 | 0 | 0
21 | SYSTEM_EVENT | 2 | EVENT_CONT_STOPPED | True | 1 | 0 | 0
Returns all subscriptions currently registered with Robin for a particular subscriber alongside details such as the associated events.
End Point: /api/v3/robin_server/subscribers/<subscriber_id>/subscriptions
Method: GET
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error)
Example Response:
Output
{
"items":[
{
"id":2,
"name":"demo_user",
"email":{
"id":1,
"full_name":"Robin",
"email_address":"demo@robin.io"
},
"snmp":{
"id":1,
"host":"cscale-82-45",
"port":162,
"community":"public"
},
"event_subscriptions":[
{
"id":20,
"type_name":"EVENT_DISK_OFFLINE",
"object_id:omitempty":"",
"enabled":true,
"threshold":1,
"throttle":0,
"elapsed_ticks":0
},
{
"id":21,
"type_name":"EVENT_CONT_STOPPED",
"object_id:omitempty":"",
"enabled":true,
"threshold":1,
"throttle":0,
"elapsed_ticks":0
}
]
}
]
}
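A short sketch of how a client might read this response, collecting the enabled event types for each subscriber. The function name `enabled_event_types` is hypothetical, and only the `event_subscriptions` key from the example is handled.

```python
import json


def enabled_event_types(response_body: str) -> dict:
    """Map each subscriber ID to the event types it has enabled."""
    result = {}
    for item in json.loads(response_body)["items"]:
        subscriptions = item.get("event_subscriptions", [])
        result[item["id"]] = [
            s["type_name"] for s in subscriptions if s.get("enabled")
        ]
    return result


if __name__ == "__main__":
    sample = json.dumps({"items": [{
        "id": 2, "name": "demo_user",
        "event_subscriptions": [
            {"id": 20, "type_name": "EVENT_DISK_OFFLINE", "enabled": True},
            {"id": 21, "type_name": "EVENT_CONT_STOPPED", "enabled": True},
        ],
    }]})
    print(enabled_event_types(sample))
```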
15.3.2.3. Updating a Subscription¶
To update one or more attributes of a subscription, run this command:
# robin subscription update <subscriber_id> <subscription_type> <subscription_id>
--disable
--enable
--threshold <threshold>
--elapsed-ticks <elapsed_ticks>
--throttle <throttle>
<subscriber_id> | ID of subscriber associated with the subscription to update
<subscription_type> | Type of subscription. Valid choices are alert and event
<subscription_id> | ID of subscription to update
--disable | Disable this subscription
--enable | Enable this subscription
--threshold <threshold> | Updated value of the number of instances of event/alert before launching notification
--elapsed-ticks <elapsed_ticks> | Updated value of the number of seconds to allow for the threshold to be met
--throttle <throttle> | Updated value of the number of seconds before a repeat notification will be sent
Example
# robin subscription list 2
ID | Type | Subscriber ID | Event Type | Enabled | Threshold | Elapsed Ticks | Throttle
---+--------------+---------------+--------------------+---------+-----------+---------------+----------
1 | SYSTEM_EVENT | 2 | EVENT_DISK_OFFLINE | True | 1 | 0 | 86400
2 | SYSTEM_EVENT | 2 | EVENT_CONT_STOPPED | True | 1 | 0 | 86400
# robin subscription update 2 event 1 --threshold 3 --disable
Successfully updated event subscription with id 1 for subscriber 2
# robin subscription list 2
ID | Type | Subscriber ID | Event Type | Enabled | Threshold | Elapsed Ticks | Throttle
---+--------------+---------------+--------------------+---------+-----------+---------------+----------
1 | SYSTEM_EVENT | 2 | EVENT_DISK_OFFLINE | False | 3 | 0 | 86400
2 | SYSTEM_EVENT | 2 | EVENT_CONT_STOPPED | True | 1 | 0 | 86400
Updates a pre-existing subscription.
End Point: /api/v3/robin_server/subscribers/<sbr_id>/subscriptions/<sub_type>/<sbp_id>
Note
Valid values for subscription type include: ‘event’ and ‘alert’.
Method: PUT
URL Parameters: None
Data Parameters:
enabled: [true | false]
- Boolean that determines whether or not the subscription is activated.
threshold: <threshold>
- Integer giving the updated threshold: the number of instances of the given event/alert that must be generated before the notification is launched.
elapsed_ticks: <elapsed_ticks>
- Integer giving the updated elapsed time: the number of seconds allowed for the threshold to be met.
throttle: <throttle>
- Integer giving the updated throttle value: the number of seconds before a repeat notification is sent.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response: On success the response is empty.
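One way to read how `threshold`, `elapsed_ticks`, and `throttle` interact is modelled below. This is an interpretation of the parameter descriptions, not Robin's actual implementation: the class `SubscriptionGate` is hypothetical, a value of 0 is taken to mean "no limit" (matching the 0 values in the example tables), and resetting the counter after each notification is an assumption.

```python
from collections import deque


class SubscriptionGate:
    """Toy model: notify once `threshold` occurrences fall within
    `elapsed_ticks` seconds, at most once every `throttle` seconds."""

    def __init__(self, threshold=1, elapsed_ticks=0, throttle=0):
        self.threshold = threshold
        self.elapsed_ticks = elapsed_ticks
        self.throttle = throttle
        self._seen = deque()    # timestamps of occurrences not yet notified
        self._last_sent = None  # timestamp of the last notification

    def should_notify(self, now):
        """Record an occurrence at time `now` (seconds) and decide."""
        self._seen.append(now)
        if self.elapsed_ticks > 0:
            # Forget occurrences that fell out of the time window.
            while self._seen and now - self._seen[0] > self.elapsed_ticks:
                self._seen.popleft()
        if len(self._seen) < self.threshold:
            return False
        throttled = (
            self.throttle > 0
            and self._last_sent is not None
            and now - self._last_sent < self.throttle
        )
        if throttled:
            return False
        self._seen.clear()  # assumption: the count restarts after notifying
        self._last_sent = now
        return True


if __name__ == "__main__":
    gate = SubscriptionGate(threshold=3, elapsed_ticks=60)
    print([gate.should_notify(t) for t in (0, 10, 20)])
```

Under this reading, a subscription with threshold 3 and an elapsed window of 60 seconds stays silent for the first two occurrences and notifies on the third, provided all three arrive within the window.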
15.3.2.4. Removing a subscription¶
To remove a subscription currently registered with Robin, run this command:
# robin subscription remove <subscriber_id> <subscription_type> <subscription_id>
--yes
<subscriber_id> | ID of subscriber associated with subscription to remove
<subscription_type> | Type of subscription. Valid choices are alert and event
<subscription_id> | ID of subscription to remove
--yes | Confirm deletion without prompting
Example
# robin subscription list
ID | Type | Subscriber ID | Event Type | Enabled | Threshold | Elapsed Ticks | Throttle
---+--------------+---------------+--------------------+---------+-----------+---------------+----------
1 | SYSTEM_EVENT | 2 | EVENT_DISK_OFFLINE | False | 3 | 0 | 86400
2 | SYSTEM_EVENT | 2 | EVENT_CONT_STOPPED | True | 1 | 0 | 86400
# robin subscription remove 2 event 1
Successfully deleted event subscription with id 1 for subscriber 2
# robin subscription list
ID | Type | Subscriber ID | Event Type | Enabled | Threshold | Elapsed Ticks | Throttle
---+--------------+---------------+--------------------+---------+-----------+---------------+----------
2 | SYSTEM_EVENT | 2 | EVENT_CONT_STOPPED | True | 1 | 0 | 86400
Removes a subscription that a subscriber currently holds.
End Point: /api/v3/robin_server/subscribers/<subscriber_id>/subscriptions/<type>/<subscription_id>
Method: DELETE
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response: On success the response is empty.