17. High Availability

17.1. High availability of the Kubernetes control plane

A Kubernetes cluster with a single master node is vulnerable should that node ever fail. The master node manages the etcd database containing control data for the cluster. Key Kubernetes services, such as the API server, controller manager, and scheduler, also run on the master node. Without the master node, it’s not possible to access information about currently deployed applications, allocated resources, etc. It’s also not possible to create new resources such as Pods, Services, ConfigMaps, etc.

In a high availability (HA) configuration, the Kubernetes cluster is configured with a minimum of three master nodes. The etcd database containing cluster control data is replicated across each of the master nodes. Key Kubernetes services also run on each of the master nodes. Should one of the master nodes fail, the remaining two will ensure the integrity of cluster control data and that the cluster is kept up and running.

Note that a highly available Kubernetes cluster requires at least three master nodes. This has to do with the consensus algorithm employed by etcd. To have a quorum, a minimum of half the masters plus one must be present. This prevents a “split brain” scenario if a network partition occurs: only the partition with a quorum of master nodes will continue on.
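
The arithmetic behind this requirement can be illustrated with a short Python sketch (an illustration only, not part of Robin Platform):

    def quorum(masters):
        # A quorum is half of the masters plus one (integer division).
        return masters // 2 + 1

    def tolerable_failures(masters):
        # The cluster keeps running as long as a quorum of masters remains.
        return masters - quorum(masters)

    for n in (1, 2, 3, 5):
        print(f"{n} master(s): quorum={quorum(n)}, tolerable failures={tolerable_failures(n)}")

With one or two masters, no failures can be tolerated; three masters is the smallest cluster that can lose a node and still retain a quorum.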

17.1.1. High availability of the Kubernetes API server

Robin Platform uses keepalived and HAProxy services to provide high availability access to the Kubernetes API server.

Keepalived

The keepalived service is responsible for managing a Virtual IP address (VIP) where all requests to the Kubernetes API server are sent.

The keepalived service, which runs on each of the master nodes, implements the Virtual Router Redundancy Protocol (VRRP), a multicast protocol. All nodes in a VRRP cluster (“physical routers”) must be on the same subnet. Together, they form a single “virtual router” for managing a given VIP.

It’s possible for multiple virtual routers to exist on the same subnet. For this reason, each VRRP cluster is configured with a unique (for a given subnet) “Virtual Router ID” (VRID). The use of a VRID ensures that all packets for a given VRRP cluster are only seen by the physical routers from that cluster. A VRID is an arbitrary numeric value from 1-255.

A VIP and a VRID need to be selected before installing the first master node in a Robin Platform cluster (they are supplied on the command line). Care must be taken that the selected VIP is not already in use and that the VRID has not been used with another VRRP cluster in the same subnet (keepalived is not the only utility that implements the VRRP protocol).
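
As an illustration, the snippet below sketches the kind of pre-flight checks an administrator might script before installation (this is not part of the Robin installer; the vip and vrid values are placeholders). It confirms that the candidate VRID is within the valid range and that the candidate VIP does not already respond on the network.

    import subprocess

    def vrid_is_valid(vrid):
        # VRRP Virtual Router IDs must fall in the range 1-255.
        return 1 <= vrid <= 255

    def vip_appears_unused(vip):
        # An address that answers a ping is already in use and must not be
        # chosen as the VIP. (No answer is not a guarantee the address is
        # free, only a quick sanity check.)
        result = subprocess.run(["ping", "-c", "1", "-W", "1", vip],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode != 0

    vip, vrid = "10.9.82.200", 51    # placeholder values
    print("VRID valid:", vrid_is_valid(vrid))
    print("VIP appears unused:", vip_appears_unused(vip))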

The keepalived and HAProxy services are configured during the installation of Robin Platform. With the installation of each subsequent master node (up to a total of 3), the node will be added to the keepalived VRRP cluster, in addition to the Kubernetes and Robin Platform clusters.

When installation of the first master node is complete, the VIP will be active on that node. If that node goes down, or gets partitioned from the rest of the cluster, the VIP will be brought up on another master node.

HAProxy

The HAProxy service is responsible for redirecting API server requests to instances of the API server running on each of the master nodes. Since the API server is stateless, requests are redirected in a round-robin manner.
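
Conceptually, round-robin redirection simply cycles through the API server instances on the master nodes. The Python sketch below illustrates the idea (it is not HAProxy’s implementation, and the addresses are placeholders):

    from itertools import cycle

    # Placeholder addresses of the API server instances on the three master nodes.
    backends = cycle(["10.9.82.11:6443", "10.9.82.12:6443", "10.9.82.13:6443"])

    # Each incoming request is handed to the next backend in the rotation.
    for request_id in range(5):
        print(f"request {request_id} -> {next(backends)}")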

The HAProxy service also runs on each of the master nodes, but only the instance on the node where the VIP is active receives traffic. The other instances remain idle until the VIP fails over to their respective nodes.

17.2. High Availability of the Robin control plane

The Robin Platform management plane comprises services that manage the physical resources of a cluster and the applications deployed to it. These management services don’t run directly on the host. Rather, they run in containers deployed on each of the nodes in the cluster. A number of these services are active on all nodes in the cluster; they are responsible for handling the deployment of application resources on the node and for monitoring the health of the node. These Agent services include:

  • robin-agent

  • iomgr-server

  • monitor-server

Each master node in a Robin Platform cluster (to a maximum of three) is automatically assigned the role of Manager, with the first Manager becoming the MASTER and each additional Manager becoming a SLAVE. There are two management services running on all three Manager nodes:

  • The robin-watchdog service is responsible for maintaining the integrity of the Robin Platform management plane. On the MASTER Manager, robin-watchdog makes sure that all management services that should be running are up and healthy. On SLAVE Managers, robin-watchdog monitors the state of the MASTER and stands ready to take over the MASTER role should it fail.

  • The postgresql-9.6 service acts as a datastore for all Robin Platform control data. The instance of postgresql-9.6 running on the MASTER Manager is responsible for committing all database transactions. The instances running on the other two Managers are read-only SLAVEs (one synchronous and the other asynchronous). Before a database transaction can be committed, the affected data blocks must be flushed to disk on the MASTER node and on the node where the synchronous SLAVE is running. This ensures that there will be no data loss should the MASTER node (the MASTER instance of postgresql-9.6) go down.
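
The commit rule for the synchronous SLAVE can be pictured with the following conceptual sketch (this is not PostgreSQL’s implementation; the class and function names are illustrative):

    class Replica:
        def __init__(self, name):
            self.name = name
            self.on_disk = []

        def flush_to_disk(self, txn):
            # Simulate a durable write on this node.
            self.on_disk.append(txn)
            print(f"{self.name}: flushed {txn} to disk")

        def replicate_later(self, txn):
            # Asynchronous replication: applied eventually, not awaited.
            print(f"{self.name}: will apply {txn} in the background")

    def commit(txn, master, sync_slave, async_slave):
        # A commit is acknowledged only after the data is durable on the
        # MASTER *and* on the synchronous SLAVE, so no committed transaction
        # is lost if the MASTER node goes down.
        master.flush_to_disk(txn)
        sync_slave.flush_to_disk(txn)
        # The asynchronous SLAVE is not waited on.
        async_slave.replicate_later(txn)
        return "committed"

    print(commit("txn-42", Replica("MASTER"), Replica("sync SLAVE"), Replica("async SLAVE")))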

A number of Robin Platform management services only run on the MASTER Manager node. These services, which handle the management of all cluster resources, deployed applications, etc., include:

  • stormgr-server

  • robin-server

  • robin-file-server

  • robin-event-server

Disaster recovery

In the event that the MASTER Manager fails or becomes unhealthy (catastrophic hardware failure, network isolation, failure of a key management service, etc.), one of the SLAVE Manager nodes will take over as MASTER. This way, data integrity is maintained (for key metadata related to the management of storage for the Robin cluster and for deployed applications) and a mechanism is provided for recovering from hard failures. When a failover does occur, the central management plane will be unavailable for a short time, since one of the standby instances of the postgresql database must be promoted to master before normal operation resumes.
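
The failover sequence on a SLAVE Manager can be pictured with the following conceptual sketch (robin-watchdog’s internals are not shown here; the function names are illustrative):

    import time

    def master_is_healthy():
        # Placeholder health check; in practice the watchdog probes the
        # MASTER's management services over the network.
        return False

    def promote_local_postgres():
        print("promoting the local postgresql standby to master")

    def start_master_only_services():
        # Services such as robin-server and stormgr-server run only on the MASTER.
        print("starting the master-only management services")

    def watchdog_loop():
        # A SLAVE Manager monitors the MASTER and takes over when it is
        # no longer healthy.
        while master_is_healthy():
            time.sleep(5)
        promote_local_postgres()
        start_master_only_services()
        print("this node is now the MASTER Manager")

    watchdog_loop()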

Note

A highly available Robin Platform cluster can tolerate the loss of one Manager node (MASTER or SLAVE). If a second node fails, it will no longer be possible to commit database transactions, as there would not be a synchronous postgresql instance.