15. High Availability

15.1. High Availability of the Robin control plane

The Robin CNS management plane comprises services that manage the physical resources of a cluster and the applications deployed to it. These management services do not run directly on the host; they run in containers deployed on each node in the cluster. A number of the services are active on all nodes in the cluster. These services handle the deployment of application resources on the node and monitor the health of the node.

Starting with Robin CNS v5.4.4, the Robin control plane and data plane are separated. In the architecture before Robin CNS v5.4.4, both the control plane and the data plane ran in a single Robin Pod.

The following illustration explains the Robin CNS Architecture:

_images/CNS_architectural_change.png

The following points explain the architecture:

  • The Robin Master Pod is deployed as a Deployment. The Robin Master Pod acts as a Manager. It runs only the following master services:

    • Consul server

    • Robin Server

    • Robin event server

    • Robin Authentication server

    • Sherlock

    • Node monitor

    • PGSQL

    • HTTP server

  • Unlike previous Robin CNS releases, only a single Master Pod is deployed in the cluster.

  • Robin Agent Pods are deployed as a DaemonSet and the Agent Pods run only the agent services, such as agent and monitor.

  • Robin IOMGR Pods are deployed as a DaemonSet and are responsible for handling I/O requests from the application.

  • In Robin CNS v5.4.4 and later, Patroni runs as an independent PostgreSQL HA cluster and is deployed as part of the Robin CNS installation. Because PostgreSQL HA is managed outside the Robin Master Pod, its failover is not tied to the failover of Robin control-plane services.

    • A maximum of three Patroni instances (Pods) are present in a cluster. The Patroni HA cluster has one leader, one synchronous replica, and one asynchronous replica.

    • When there is a Google Anthos user cluster node pool of less than 3 nodes, Robin CNS installs a single replica Patroni cluster and brings up Robin CNS.

    • The Patroni cluster scales up automatically when additional nodes are added to the node pool later.

      For example, when you install Robin CNS on an Anthos user cluster whose node pool has two nodes, Robin CNS installs one Patroni instance, one Robin Master Pod, and two Robin Agent Pods. When additional nodes are added to the cluster, the Patroni instances scale up automatically.

  • The /dev bind mount from the host into Robin Pod is only needed for agent/iomgr/nodeplugin services.

  • With control plane and data plane separation, master services will not have access to /dev.

  • The Robin master services provide CNS users with access to the user interfaces (UI/CLI/API). Because these services run in a Pod without access to /dev, no privileged Pod serves user-facing APIs, which significantly reduces the attack surface.
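The sizing rules in the list above can be sketched as a small Python function. This is only an illustration under stated assumptions, not Robin CNS code: the name expected_pods and the ExpectedPods type are hypothetical, each DaemonSet is assumed to schedule exactly one Pod per node, and a pool of three or more nodes is assumed to run exactly three Patroni instances (the text only states that three is the maximum).

```python
from dataclasses import dataclass


@dataclass
class ExpectedPods:
    """Hypothetical container for the Pod counts described in the text."""
    master: int
    agent: int
    iomgr: int
    patroni: int


def expected_pods(node_count: int) -> ExpectedPods:
    """Estimate Robin CNS Pod counts for a node pool of the given size."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Fewer than three nodes: a single-replica Patroni cluster (from the text).
    # Three or more nodes: assumed to reach the documented maximum of three.
    patroni = 1 if node_count < 3 else 3
    return ExpectedPods(
        master=1,          # a single Master Pod per cluster
        agent=node_count,  # DaemonSet: assumed one Agent Pod per node
        iomgr=node_count,  # DaemonSet: assumed one IOMGR Pod per node
        patroni=patroni,
    )


# The two-node example from the text: one Patroni instance, one Master Pod,
# and two Agent Pods.
print(expected_pods(2))
```

Node selectors or taints can reduce the number of DaemonSet Pods on a real cluster, so treat the agent and IOMGR counts as upper bounds.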

15.2. Robin Manager

The node running the Master Pod acts as a Manager.

The following commands are described in this section:

robin manager list

View the Robin manager list and services

15.2.1. Viewing Manager Details

You can view the manager details by running the following command:

# robin manager {history | list}

history

Displays all master failover events

list

Displays list of managers

15.2.1.1. View Manager History

Run the following command to view Robin manager history.

# robin manager history

Example:

root@eqx04-flash06:~# robin manager history
+---------------------------------+----------------------------+
|             Hostname            |         Start Time         |
+---------------------------------+----------------------------+
| hypervvm-72-42.robinsystems.com | 2023-09-06 20:44:58.523284 |
+---------------------------------+----------------------------+
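For scripted checks, the table printed by robin manager history can be turned into structured records. The following is a minimal Python sketch that works on captured output text only. The function name parse_manager_history is hypothetical, and the row layout (hostname, then a timestamp with microseconds) is assumed from the example above rather than from a documented output format.

```python
import re
from datetime import datetime

# One data row: "| <hostname> | YYYY-MM-DD HH:MM:SS.ffffff |"
ROW = re.compile(
    r"^\|\s*(\S+)\s*\|\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\s*\|$"
)


def parse_manager_history(text):
    """Return (hostname, start_time) tuples from captured history output."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # border and header lines do not match and are skipped
            started = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S.%f")
            events.append((match.group(1), started))
    return events


sample = """\
+---------------------------------+----------------------------+
|             Hostname            |         Start Time         |
+---------------------------------+----------------------------+
| hypervvm-72-42.robinsystems.com | 2023-09-06 20:44:58.523284 |
+---------------------------------+----------------------------+
"""

events = parse_manager_history(sample)
for host, started in events:
    print(host, started.isoformat())
```

A history with more than one row would indicate that the Manager role has moved between hosts at least once.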

15.2.1.2. View Manager List

Run the following command to view the Robin manager list and the status of its services.

# robin manager list --services
                     --all

--services

Displays service status information for the manager

--all

Displays all Pods and their status

Example:

root@eqx04-flash06:~# robin manager list --services
+-----------------------------------+-------+------+------+------+----------+------+---------+-------+-------+
|              Hostname             | ConSr | RSer | REvt | RAer | Sherlock | NMon | Stormgr | PGSQL | Httpd |
+-----------------------------------+-------+------+------+------+----------+------+---------+-------+-------+
| master.robin-server.service.robin |   UP  |  UP  |  UP  |  UP  |    UP    |  UP  |    UP   |   UP  |   UP  |
+-----------------------------------+-------+------+------+------+----------+------+---------+-------+-------+

UP: Running
PARTIAL: Partially Running
UNKNOWN: Not Managed
CRIT: Critical and Down
DOWN: Not Running

root@eqx04-flash06:~# robin manager list --all
+-----------------------------------+--------------+---------------+---------------+--------------+---------+
|              Hostname             |  Service IP  |     POD IP    |      Node     |   Node IP    |  Status |
+-----------------------------------+--------------+---------------+---------------+--------------+---------+
| master.robin-server.service.robin | 10.107.24.37 | 192.180.2.250 | eqx03-flash06 | 10.9.140.106 | Running |
+-----------------------------------+--------------+---------------+---------------+--------------+---------+
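A monitoring script might flag any manager service that is not UP. The Python sketch below parses captured robin manager list --services output. The helper names parse_table and services_not_up are hypothetical, the table layout is assumed from the example above, and the degraded table is invented for illustration, not real output.

```python
def parse_table(text):
    """Parse a '|'-delimited ASCII table into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip borders and legend lines
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def services_not_up(text):
    """Map each hostname to the services whose status is not 'UP'."""
    problems = {}
    for row in parse_table(text):
        host = row.pop("Hostname")
        bad = {name: state for name, state in row.items() if state != "UP"}
        if bad:
            problems[host] = bad
    return problems


# Output copied from the example above, including two legend lines.
sample = """\
+-----------------------------------+-------+------+------+------+----------+------+---------+-------+-------+
|              Hostname             | ConSr | RSer | REvt | RAer | Sherlock | NMon | Stormgr | PGSQL | Httpd |
+-----------------------------------+-------+------+------+------+----------+------+---------+-------+-------+
| master.robin-server.service.robin |   UP  |  UP  |  UP  |  UP  |    UP    |  UP  |    UP   |   UP  |   UP  |
+-----------------------------------+-------+------+------+------+----------+------+---------+-------+-------+

UP: Running
DOWN: Not Running
"""

# Hypothetical degraded table, shortened to two service columns.
degraded = """\
| Hostname | ConSr | PGSQL |
| master.robin-server.service.robin | UP | DOWN |
"""

print(services_not_up(sample))
print(services_not_up(degraded))
```

The check treats every status other than UP (PARTIAL, UNKNOWN, CRIT, DOWN) as a problem; a real script might choose to tolerate UNKNOWN for services that are not managed.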