17. Asynchronous Disaster Recovery
Starting from Robin CNP 5.4.1, Robin.io provides the snapshot-based Asynchronous Disaster Recovery (DR) feature.
The feature enables you to replicate your Kubernetes-based stateful applications along with their constructs (PVCs, StatefulSets, ConfigMaps, Secrets, Services, etc.) to a remote secondary peer cluster (site). You can manually fail over to the secondary cluster in the event of a disaster or during maintenance activities.
The DR feature provides an encryption option that you can enable when transmitting data over the wire to a peer cluster. For more information, see Over-Wire Data Transfer Encryption.
The Robin Asynchronous Disaster Recovery feature allows you to bring your applications back online quickly, with minimal application downtime, by failing over to the secondary cluster (site) when a disaster occurs. It also allows you to fail back later.
You can configure the required Recovery Point Objective (RPO) for your applications.
As Robin CNP runs both in the cloud and on-premises, you can use the Robin Replication feature on both platforms. For more information, see Set Up Robin Asynchronous Disaster Recovery.
17.1. Concepts
The ideas and constructs described below are fundamental to the Asynchronous Disaster Recovery feature. It is highly recommended that you understand all of these concepts thoroughly before using the replication process, because each section summarizes a crucial component of the feature that must be taken into account when planning the recovery procedure in case of a failure.
17.1.1. Peer Clusters
A Peer Cluster in the context of the Disaster Recovery process represents a Robin cluster. As such, a DR pair requires two Peer Clusters that can communicate with each other and be synced. Each Peer Cluster hosts a Protection Group of its own and thus receives snapshots of applications depending on the role of the Protection Group it hosts.
In order for the replication process to begin, the two designated Robin clusters need to be paired together and become Peer Clusters to one another. Details on this pairing process can be found here.
17.1.1.1. Endpoint
An Endpoint is the IP address that a Peer Cluster hosting a Secondary Protection Group uses to contact the Robin cluster where the Primary Protection Group resides. As a result, it helps secure a connection between Peer Clusters.
By default, if no Endpoint is manually specified, the VIP of a highly available cluster is used. When this is not available, the IP address of the master node is used in its place. Information on the Endpoint to use is transferred to the Peer Cluster via the encoded blob generated on the pairing initiating cluster.
Note
For HA environments, it is recommended to use the VIP for the Endpoint. Details on how to update an Endpoint that is currently in use can be found here.
17.1.2. Protection Groups
A Protection Group in the Robin Asynchronous DR feature is a logical construct that plays a key role in setting up disaster recovery and managing the DR configuration on a Robin cluster. For more information, see Manage Protection Group.
You can have multiple Protection Groups in a Robin cluster as per your requirements.
The Protection Group comprises the following entities:
Applications
Peer cluster name
Asynchronous DR Role (Primary or Secondary)
Replication Policy
17.1.2.1. Roles
A role is assigned to each Protection Group when it is created. The following are the supported roles and their implications for applications:
Primary - Applications in a Protection Group with this role will be actively replicated and their data will be transferred to the associated Peer Cluster(s).
Secondary - Applications in a Protection Group with this role will receive updates from the cluster where the Primary Protection Group resides but their respective Pods will not be running. Protection Group(s) with this role are in place to take over the Primary role in the event of a failover.
For more information on role changes, see Failover or Failback a Cluster.
Note
You cannot have two active Protection Groups with the Primary role at the same time in a DR setup.
17.1.3. Replication Policy
A Replication Policy defines the frequency of data transfer between Peer Clusters. It needs to be attached to a primary Protection Group in order to take effect. This frequency applies to all the applications in the Protection Group. The replication of application metadata and data is primarily done through the snapshot mechanism, with snapshots essentially being transferred between clusters. For more information on how to manage Replication Policies, see Manage Replication Policy.
17.1.3.1. Replication Snapshots
Replication Snapshots are snapshots that a Replication Policy automatically creates of applications attached to a Protection Group, at the cadence defined in its configuration. As they are system generated, they are transferred as part of the replication process by default. Moreover, all Replication Snapshots use the naming convention <appname>_<namespace>_<repl-policy-name>_<zoneid>_<creation-time> and are associated with the standard system_replicate:yes label.
Note
The creation time is displayed in the standard UNIX timestamp format within the snapshot name.
More details on how to configure the creation of these snapshots when first defining a Replication Policy can be found here. In addition, the Replication Snapshot feature can be enabled after the initial configuration of a Replication Policy using the robin replication-policy enable-repl-snaps command, details of which can be found here.
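As a rough illustration of the naming convention above, the following Python sketch extracts the zone ID and creation time from a Replication Snapshot name. It is not part of the Robin CLI; the function name parse_replication_snapshot_name and the returned field names are hypothetical. Because application, namespace, and policy names may themselves contain underscores, the sketch only splits off the two trailing numeric fields. It also accepts hyphens as separators, on the assumption that real names may look like the one in the sample output of the Example Role Change section, where the trailing fields are joined with hyphens.

```python
import re
from datetime import datetime, timezone

# Matches "<anything><sep><zoneid><sep><creation-time>", where <sep> is "_" or "-".
# Accepting "-" is an assumption based on the sample output in this chapter.
_SNAPSHOT_NAME = re.compile(r"(?P<prefix>.+)[_-](?P<zoneid>\d+)[_-](?P<ctime>\d+)")


def parse_replication_snapshot_name(name):
    """Split a snapshot name into its leading part, zone ID, and creation time."""
    match = _SNAPSHOT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a replication snapshot name: {name!r}")
    return {
        # App, namespace, and policy names are left unsplit: they may contain "_".
        "prefix": match.group("prefix"),
        "zone_id": int(match.group("zoneid")),
        # The creation time is a UNIX timestamp; convert it to an aware UTC datetime.
        "created_at": datetime.fromtimestamp(int(match.group("ctime")), tz=timezone.utc),
    }


info = parse_replication_snapshot_name("mysqldb_minute_repl-1639611848-1639692409")
print(info["prefix"], info["zone_id"], info["created_at"].isoformat())
```

The conversion uses UTC explicitly, so the printed time may differ from a creation time that the CLI renders in the cluster's local time zone.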
17.1.3.2. Utilizing Labels
Robin enables users to label snapshots in order to help categorize and identify them. These labels can also be specified within a Replication Policy, at the time of creation or afterwards, so that snapshots with the associated labels are replicated. This enables snapshots that are created manually or outside the replication process, via a snapshot schedule for instance, to be replicated to secondary Protection Groups as well.
For example, if a Replication Policy is associated with the label REPDEMO:Yes, application snapshots with the same label will be replicated at the frequency set for the Replication Policy, alongside any replication snapshots if they are enabled. In addition, the number of snapshots with this label to retain is also configurable in case a given number needs to be maintained. Details on how to associate labels with a Replication Policy can be found here.
Note
More than one label can be associated with a Replication Policy so as to allow a variety of normal snapshots to be replicated.
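The label-matching behavior described in this section can be modeled with a short Python sketch. This is only a conceptual model under stated assumptions, not Robin's implementation: the function name select_snapshots_for_replication, the dictionary layout of a snapshot, and the sample snapshot names are all hypothetical. A snapshot is selected when it carries at least one label attached to the Replication Policy, or the system_replicate:yes label when Replication Snapshots are enabled.

```python
SYSTEM_LABEL = "system_replicate:yes"  # label carried by Replication Snapshots


def select_snapshots_for_replication(snapshots, policy_labels, repl_snaps_enabled=True):
    """Return the names of snapshots that match the policy's labels."""
    wanted = set(policy_labels)
    if repl_snaps_enabled:
        # System-generated Replication Snapshots are replicated by default.
        wanted.add(SYSTEM_LABEL)
    # A snapshot qualifies if it shares at least one label with the wanted set.
    return [snap["name"] for snap in snapshots if wanted & set(snap["labels"])]


# Hypothetical snapshots: one manual, one from a schedule, one system generated.
snapshots = [
    {"name": "manual-before-upgrade", "labels": ["REPDEMO:Yes"]},
    {"name": "nightly-schedule", "labels": ["team:db"]},
    {"name": "auto-repl-snap", "labels": ["system_replicate:yes"]},
]
print(select_snapshots_for_replication(snapshots, ["REPDEMO:Yes"]))
```

In this model, attaching a second label such as team:db to the policy would also select the scheduled snapshot, which mirrors the note above about associating more than one label.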
17.1.4. Sync States
The robin protection-group info command, details for which can be found here, can be used to assess the sync state of applications associated with a Protection Group. The following are all the valid states an application can be in:
SYNC_SUCCESSFUL - The synchronization was successful; the latest synced snapshot is also displayed. This is a mirrored state, so depending on the role of the Protection Group on the cluster where the state is checked, either PRIMARY_SYNC_SUCCESSFUL or SECONDARY_SYNC_SUCCESSFUL is displayed.
IN_PROGRESS - A synchronization is currently in progress. This is a mirrored state, so depending on the role of the Protection Group on the cluster where the state is checked, either PRIMARY_SYNC_IN_PROGRESS or SECONDARY_SYNC_IN_PROGRESS is displayed.
SYNC_FAILED - A synchronization failed. This is a mirrored state, so depending on the role of the Protection Group on the cluster where the state is checked, either PRIMARY_SYNC_FAILED or SECONDARY_SYNC_FAILED is displayed.
INITIAL_SYNC_PENDING - The very first synchronization has not yet completed. This is a mirrored state, so depending on the role of the Protection Group on the cluster where the state is checked, either PRIMARY_INITIAL_SYNC_PENDING or SECONDARY_INITIAL_SYNC_PENDING is displayed.
INITIAL_SYNC_IN_PROGRESS - The very first synchronization is currently running. This is a mirrored state, so depending on the role of the Protection Group on the cluster where the state is checked, either PRIMARY_INITIAL_SYNC_IN_PROGRESS or SECONDARY_INITIAL_SYNC_IN_PROGRESS is displayed.
INITIAL_SYNC_FAILED - The very first synchronization has failed. This is a mirrored state, so depending on the role of the Protection Group on the cluster where the state is checked, either PRIMARY_INITIAL_SYNC_FAILED or SECONDARY_INITIAL_SYNC_FAILED is displayed.
PRIMARY_NO_APP - A failover event has happened but the initial synchronization of the application within the primary Protection Group has not completed. This state is displayed only when the command is run on the cluster where the new primary Protection Group is hosted.
SECONDARY_APP_WITH_NO_PRIMARY - A failover event has happened but the initial synchronization of the application on the new primary Protection Group did not complete. This state is displayed only when the command is run on the cluster where the new secondary Protection Group is hosted.
PENDING_USER_ASSIGNMENT_AT_PEER - The application needs to be assigned to a user present on the cluster hosting the secondary Protection Group. This state is displayed only when the command is run on the cluster where the primary Protection Group is hosted.
PENDING_USER_ASSIGNMENT - The application needs to be assigned to a user present on the cluster hosting the secondary Protection Group. This state is displayed only when the command is run on the cluster where the secondary Protection Group is hosted.
UNDEFINED - The application sync state does not match any of the above states. This state appears only in rare conditions.
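When post-processing the state column, it can help to separate the role prefix of a mirrored state from the state itself. The Python sketch below shows one way to do that, using only the state names listed above. The function name split_sync_state and the constant MIRRORED_STATES are hypothetical helpers, not part of any Robin tooling.

```python
# Suffixes of the mirrored states, as they appear after the PRIMARY_/SECONDARY_ prefix.
MIRRORED_STATES = {
    "SYNC_SUCCESSFUL",
    "SYNC_IN_PROGRESS",
    "SYNC_FAILED",
    "INITIAL_SYNC_PENDING",
    "INITIAL_SYNC_IN_PROGRESS",
    "INITIAL_SYNC_FAILED",
}


def split_sync_state(displayed):
    """Return (role, state) for mirrored states, or (None, displayed) otherwise."""
    for role in ("PRIMARY", "SECONDARY"):
        prefix = role + "_"
        suffix = displayed[len(prefix):]
        # Only strip the prefix when the remainder is a known mirrored state, so
        # that PRIMARY_NO_APP and SECONDARY_APP_WITH_NO_PRIMARY stay intact.
        if displayed.startswith(prefix) and suffix in MIRRORED_STATES:
            return role, suffix
    return None, displayed


print(split_sync_state("SECONDARY_SYNC_SUCCESSFUL"))
print(split_sync_state("PRIMARY_NO_APP"))
```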
17.1.5. Role-based Access Control
Role-based Access Control (RBAC) for objects created as part of the replication process follows a similar model to that of the Robin platform; namely, it is based on tenants and the users contained within them. Specifically, Protection Groups must be associated with a tenant, and only applications within the same tenant can be assigned to them. Consequently, users within the same tenant must own the applications that are to be associated with the given Protection Group.
This model holds true for secondary Protection Groups on Peer Clusters as well. However, there is some flexibility with regard to which tenants the secondary Protection Groups are assigned to, as well as which users own the respective attached applications on Peer Clusters. The determining factor for these associations is the Auto-Assignment Mode that is set on the Peer Cluster. Details on the different modes available and what each offers can be found here.
17.1.6. Auto-Assignment Modes
The Auto-Assignment Mode configured on the Peer Cluster hosting a secondary Protection Group determines which tenant that Protection Group is associated with. Furthermore, it also determines which of the tenant users should be assigned the applications attached to the Protection Group.
The following are the different modes available and how they influence assignments on the Peer Cluster:
MIRROR |
This is the default mode. As its name suggests, it assigns the secondary Protection Group to the tenant on the Peer Cluster with the same name as the one associated with the primary Protection Group. The same applies to the tenant users to which applications are assigned. As a result, when utilizing this mode, the configuration of tenants and users on the Peer Cluster must mirror that of the Robin cluster hosting the primary Protection Group. If this is not the case, the Peer Cluster cannot be registered. |
DEFAULT |
With this mode, the Protection Group is assigned to the default |
MANUAL |
With this mode, the |
As its name suggests, the auto_assign_mode config attribute controls the current setting for the Auto-Assignment Mode. More information on the variable can be found here. To update the mode in use, the robin config update command, detailed here, must be used.
Important
The desired Auto-Assignment Mode must be set on the Peer Cluster hosting the secondary Protection Group in order to take effect.
17.1.7. Metrics
Metrics for objects that are part of the replication process can be viewed using a provided Grafana dashboard. The following details are displayed as part of the dashboard for each registered Protection Group:
Protection Group role
Data transfer rate
Sync and pause states for associated applications
Creation time and ID of the last synced snapshot for associated applications
More information about enabling metrics can be found here.
17.1.8. Limitations
The following are the limitations of the asynchronous DR feature:
Snapshot restore to the original location from the backup is not supported.
Disaster recovery configuration between clusters with different media types is not supported.
Multisite asynchronous DR is not supported.
Applications which are associated with a Protection Group cannot be scaled or have volumes added to them. In order to run these operations on an application linked to a Protection Group, it must be removed from the Protection Group first, then updated and finally re-associated.
Clones of applications cannot be associated with Protection Groups and thus are not valid for replication. This also holds true for applications with ephemeral volumes (AEVs).
Robin Bundle application with PDV and AEV is not supported for disaster recovery.
17.1.9. Failover
A failover is the term used to describe an event wherein the Robin cluster hosting the primary Protection Group goes offline. This shutdown might be due to planned maintenance or an unexpected disaster. In this scenario, the secondary Protection Group hosted on a Peer Cluster needs to become the primary Protection Group in order to continue serving applications and their data. In addition, the previous primary Protection Group has to assume the secondary role. These role changes have to be performed manually using the robin protection-group change-role command, detailed here. An example can also be seen here.
Note
If the Robin cluster hosting the primary Protection Group(s) is accessible, the respective Protection Group(s) must have their roles changed to secondary before the original secondary Protection Group(s) can change their roles to primary. If the cluster hosting the primary Protection Group(s) cannot be reached, the original secondary Protection Group(s) will be allowed to assume the primary role. However, as soon as the original cluster is accessible again, its Protection Group(s) must be changed to the secondary role, as there cannot be two active primary Protection Groups in a DR pair.
17.1.10. Failback
A failback is the term used to describe an event wherein the original primary Protection Group, which assumed the role of a secondary Protection Group due to a failover event, returns to being a primary Protection Group once again. The end goal of a failback operation is to make the original primary Protection Group the leader again, thus restoring the original configuration of the DR pair. However, before the operation is performed, it is highly recommended that users confirm that the delta changes from the current primary Protection Group have been replicated to the original one in order to minimize data loss. As with the failover operation, the role change needs to be performed manually using the robin protection-group change-role command, detailed here.
Important
Performing a failback operation is optional; it is solely up to the user's discretion and preference.
17.2. Disaster Recovery process
17.2.1. Recommended procedure
Detailed below is the recommended procedure for when an unexpected disaster occurs, including both failover and failback operations:
When an outage affects the Robin cluster hosting the primary Protection Group, the first step is to transfer the primary role. If the cluster is accessible, change the role of the Protection Group it hosts from primary to secondary. If it is not accessible at that moment, this step can be skipped; however, the role change must be done as soon as the cluster is back online. This is important because there can be only one primary Protection Group in a DR pair at any point in time.
After the primary Protection Group has completed the role change, change the role of a secondary Protection Group on a Peer Cluster so that it becomes the new primary Protection Group. This should restore the availability of all replicated applications. Additional details on what happens during this role change can be found below.
After the original cluster where the disaster occurred recovers and its Protection Group is successfully marked as secondary, wait for the delta changes from the new primary Protection Group (original secondary) to replicate to the original primary Protection Group. This step is especially important for the failback operation and if the DR configuration is to be returned to its original state.
Now that all Protection Groups are synced, the original primary and secondary Protection Groups can be returned to their initial respective roles.
Important
Whenever a role swap operation occurs between Protection Groups, the primary Protection Group must always have its role changed to secondary successfully before the secondary Protection Group can assume the primary role.
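The ordering rule in the procedure above can be sketched as a small Python planner that lists the role-change commands in the order they should be run. This is only an illustration: the function name plan_role_swap and the cluster descriptions are hypothetical, and passing Secondary as the role argument is an assumption made by analogy with the Primary argument shown in the Example Role Change section. The sketch only builds the command lists; it does not run them.

```python
def plan_role_swap(pg_name, primary_reachable):
    """Return (cluster, command) steps in the order described above."""

    def change_role(role):
        # "Secondary" as a role value is assumed by analogy with "Primary".
        return ["robin", "protection-group", "change-role", pg_name, role]

    steps = []
    if primary_reachable:
        # Demote the current primary first so two primaries never coexist.
        steps.append(("current primary", change_role("Secondary")))
    steps.append(("current secondary", change_role("Primary")))
    if not primary_reachable:
        # Deferred step: run it as soon as the failed cluster is back online.
        steps.append(("original primary, once reachable", change_role("Secondary")))
    return steps


for cluster, command in plan_role_swap("prod-pg", primary_reachable=False):
    print(f"[{cluster}] {' '.join(command)}")
```

Each command must be run after logging into the cluster named in the step, since role changes are performed manually per cluster.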
17.2.2. Changing roles for a Protection Group
During any disaster recovery process, whether as part of a failover or failback operation, the roles of the respective Protection Groups within the replication pair must be changed. This role change has to be performed manually, so it is mandatory that the robin protection-group change-role command, details for which can be found here, be run after logging into the respective cluster. Every role change operation aborts ongoing replication tasks and waits until all pending replication tasks are complete before it begins.
Detailed below are the two directions in which roles can be changed, and what occurs with each respective transition:
Primary to Secondary: When a Protection Group stops assuming the primary role, the applications associated with it are stopped and thus go offline. Any ongoing data transfers from the cluster hosting the primary Protection Group to the peer clusters are also halted.
Secondary to Primary: When a Protection Group assumes the primary role, all data transfer processes from the previous primary Protection Group are aborted and all associated applications are started. Each application assumes its state and configuration from its most recent healthy snapshot.
Note
This transition occurs only when the cluster is in a healthy state (all hosts and volumes are in the Ready state) or the cluster was brought back after a certain time (e.g., reboot, power off, etc.).
17.2.3. Example Role Change
The following example shows how to change a Protection Group's role from Secondary to Primary on a Peer Cluster.
# robin protection-group change-role prod-pg Primary
Job: 12695 Name: ProtectionGroupRoleChange State: VALIDATED Error: 0
Job: 12695 Name: ProtectionGroupRoleChange State: WAITING Error: 0
Job: 12695 Name: ProtectionGroupRoleChange State: COMPLETED Error: 0
# robin protection-group info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
**Peer Cluster Name: Sanfrancisco**
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl
+----------+---------------------------+-------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+---------------------------+-------------------------------------------+--------------------------+
| mysqldb | SECONDARY_SYNC_SUCCESSFUL | mysqldb_minute_repl-1639611848-1639692409 | 2021-12-16 14:06:54 |
+----------+---------------------------+-------------------------------------------+--------------------------+
Note
Details on the robin protection-group change-role command can be found here.
17.3. Managing Peer Clusters
Topics covered in this section:
Create a Peer Cluster
Pair a Peer Cluster
Add description for a Peer Cluster
List all Peer Clusters
Enable or disable encryption for a Peer Cluster
Unpair a Peer Cluster
Update endpoint of a Peer Cluster
17.3.1. Create a Peer Cluster
As described previously, two Robin CNP clusters are needed as part of a DR pair. One cluster is designated as the pairing initiating cluster whilst the other is referred to as the peer. The command described below enables the pairing process to begin by registering the peer cluster. When this is successful, an encoded blob is generated on the pairing initiating cluster. The encoded blob contains the required metadata for pairing. You must copy the blob from where it was generated and securely share it with the administrator at your peer cluster location. The blob must then be passed to the robin peer-cluster pair command, details of which can be found here, in order to complete the pairing process.
Important
The encoded blob expires after ten minutes. You must use it within this time period on the peer cluster in order to complete the pairing process.
Note
For HA environments, it is recommended to use the VIP for the Endpoint.
To register a peer and generate an encoded blob for pairing, run the following command:
# robin peer-cluster create <name>
--description <description>
--endpoint <endpoint>
<name> | Name of the peer cluster
--description <description> | Description of the Peer Cluster
--endpoint <endpoint> | The IP address at which the peer cluster will contact the pairing initiating cluster. If not provided, the cluster VIP or master node IP address will be used
Example
# robin peer-cluster create test_peer_secondary
Provide this token to the secondary cluster:
eyJlbmRwb2ludCI6ICIxMC45LjYxLjQ3IiwgInBlZXJfYXV0aF90b2tlbiI6ICJleUowZVhBaU9pSktWMVFpTENKaGJHY2lPaUpJVXpJMU5pSjkuZXlKMWMyVnlYMmxrSWpveUxDSjBaVzVoYm5SZmFXUWlPakVzSW1WNGNDSTZNVFk1T0RRd01UY3pObjAuZkFveUM4S3FfQUdfZ2U5cTZKLXVpdC1mRjJtdUVsMzUtNFYzVklFeS1lWSIsICJwZWVyX3Rva2VuIjogIjNDUWVJNG1oQVRoV0VKb0FENTZVUFI1VSIsICJwZWVyX2NsdXN0ZXJfaWQiOiAiMWNkZmNiY2YtMTliYS00YThjLWEzMDMtNGNiMGFjZmU3N2YxIiwgInBlZXJfem9uZWlkIjogMTY0NjQ0NjI3OX0=
17.3.2. Pair a Peer Cluster
As described previously, two Robin CNP clusters are needed as part of a DR pair. One cluster is designated as the pairing initiating cluster whilst the other is referred to as the peer. The command described below enables the pairing process to be completed on the peer cluster. It enables users to submit the encoded blob generated on the pairing initiating cluster and consequently associates the two clusters by transmitting details of the peer back to the pairing initiating cluster. The blob can be generated on the pairing initiating cluster via the robin peer-cluster create command, details of which can be found here.
Important
The encoded blob expires after ten minutes. You must use it within this time period on the peer cluster in order to complete the pairing process.
To complete pairing on your peer cluster, run the following command:
# robin peer-cluster pair <name> <blob>
--description <description>
<name> | Peer cluster name
<blob> | Encoded blob created on the pairing initiating cluster
--description <description> | Description of the Peer Cluster
Example
# robin peer-cluster pair test_peer_primary eyJlbmRwb2ludCI6ICIxMC45LjYxLjQ3IiwgInBlZXJfYXV0aF90b2tlbiI6ICJleUowZVhBaU9pSktWMVFpTENKaGJHY2lPaUpJVXpJMU5pSjkuZXlKMWMyVnlYMmxrSWpveUxDSjBaVzVoYm5SZmFXUWlPakVzSW1WNGNDSTZNVFk1T0RRd01UY3pObjAuZkFveUM4S3FfQUdfZ2U5cTZKLXVpdC1mRjJtdUVsMzUtNFYzVklFeS1lWSIsICJwZWVyX3Rva2VuIjogIjNDUWVJNG1oQVRoV0VKb0FENTZVUFI1VSIsICJwZWVyX2NsdXN0ZXJfaWQiOiAiMWNkZmNiY2YtMTliYS00YThjLWEzMDMtNGNiMGFjZmU3N2YxIiwgInBlZXJfem9uZWlkIjogMTY0NjQ0NjI3OX0= --wait
Job: 52 Name: PeerClusterPair State: VALIDATED Error: 0
Job: 52 Name: PeerClusterPair State: COMPLETED Error: 0
17.3.3. Add Description to a Peer Cluster
You can add or update the description for a peer cluster when required. In order to do so, run the following command:
# robin peer-cluster add-description <name> <description>
<name> | Name of the peer cluster
<description> | Description for the peer cluster
17.3.4. List all Peer Clusters
To list all peer clusters currently registered, run the following command:
# robin peer-cluster list --verbose
--verbose | Include additional information in the output
Example
# robin peer-cluster list
+----+---------+--------+------------+--------------+------------+
| ID | Name | State | IP-Address | Description | Zoneid |
+----+---------+--------+------------+--------------+------------+
| 2 | NewYork | PAIRED | 10.7.82.20 | Primary Site | 1639611100 |
+----+---------+--------+------------+--------------+------------+
17.3.5. Manage Over-Wire Data Transfer Encryption
You can enable encryption for a particular peer cluster in order to ensure that the data transmitted to the peer is secure. By default, over-the-wire data transfer encryption is disabled and so must be enabled individually for each replication pair. Multiple encryption algorithms are supported, namely: AES128, AES256, and CHACHA20.
To enable, disable, or change the encryption type for a peer cluster at any point in time, run the following command:
# robin peer-cluster encryption <name>
--algorithm <algorithm>
--enable
--disable
<name> | Name of the Peer Cluster whose encryption status is to be updated
--algorithm <algorithm> | Encryption algorithm for the peer cluster. Options include: AES128, AES256, CHACHA20. Default is AES256
--enable | Enables on-wire encryption for the specified peer cluster
--disable | Disables on-wire encryption for the specified peer cluster
Note
Exactly one of the --enable or --disable parameters must be given; the two options cannot be specified at the same time.
Example
# robin peer-cluster encryption --enable --algorithm aes128 hyperv1
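For scripted use, the option rules above can be checked before the command is ever invoked. The following Python sketch builds the argument list for robin peer-cluster encryption and rejects invalid combinations. The function name build_encryption_command is hypothetical, the case-insensitive algorithm check is an assumption based on the lowercase aes128 in the example above, and the sketch only constructs the command without running it.

```python
SUPPORTED_ALGORITHMS = ("AES128", "AES256", "CHACHA20")


def build_encryption_command(peer_name, enable=False, disable=False, algorithm=None):
    """Build the CLI argument list, enforcing the enable/disable rule."""
    if enable == disable:
        # Covers both "neither given" and "both given".
        raise ValueError("specify exactly one of enable or disable")
    if algorithm is not None and algorithm.upper() not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    command = ["robin", "peer-cluster", "encryption", peer_name]
    command.append("--enable" if enable else "--disable")
    if algorithm is not None:
        # When omitted, the CLI default algorithm applies.
        command += ["--algorithm", algorithm]
    return command


print(" ".join(build_encryption_command("hyperv1", enable=True, algorithm="aes128")))
```

The resulting list can be passed to a process launcher such as subprocess.run on a host where the Robin client is installed.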
17.3.6. Unpair a Cluster
In order to remove the relationship to a peer cluster and consequently unpair the clusters, run the following command:
Note
The peer cluster to be disassociated must be removed from all Protection Groups before it can be unpaired.
# robin peer-cluster unpair <peer>
--inform
<peer> | Name or zone ID of the Peer Cluster
--inform | Inform Peer Cluster to unpair as well
Note
If the --inform parameter is not used, the unpair command must be run on both clusters in the replication pair.
Example
# robin peer-cluster unpair NewYork
Job: 53 Name: PeerClusterUnpair State: VALIDATED Error: 0
Job: 53 Name: PeerClusterUnpair State: COMPLETED Error: 0
17.3.7. Update endpoint of a Peer
The endpoint being used to contact a peer can be updated when there is a change to the associated IP address. You must update the endpoint only when the network connection to the peer is available and the peer is not part of any Protection Groups. More information on the concept of endpoints can be found here.
Note
Contact the Robin Support team before updating the endpoint.
To update the endpoint of a peer, run the following command:
# robin peer-cluster update-endpoint <name> <endpoint>
--yes
<name> | Name of the peer cluster whose endpoint is being updated
<endpoint> | New endpoint of the peer cluster
--yes | Do not prompt the user for confirmation
Example
# robin peer-cluster update-endpoint hyperv1 10.9.61.74
Please contact Robin before using this command as it will result in
serious repercussions. Do you wish to continue [y/n]?
17.4. Managing Protection Groups
Topics covered in this section:
Create a Protection Group
Add peer to Protection Group
Remove peer from Protection Group
Assign tenant to Protection Group
Assign application in the Protection Group to a user
Delete a Protection Group
Add an application to Protection Group
Remove an application from a Protection Group
Change role of Protection Group
Attach a replication policy to a Protection Group
Detach a replication policy from a Protection Group
Run replication policy for a specific peer
Print info of protection group
List all Protection Groups
Pause replication for one or more applications in a Protection Group
Resume replication for one or more applications in a Protection Group
17.4.1. Create a Protection Group
In order to replicate applications across peer clusters, a Protection Group must be created on both clusters within the peer pair. The cluster on which the initial creation command for the Protection Group is run is set as the primary by default. In addition, a Protection Group of the same name is created on each specified peer cluster, if any, and is marked with the secondary role.
Note
Although any number of Protection Groups can be created, currently only one peer per Protection Group is supported.
To create a protection group, run the following command:
# robin protection-group create <name>
--peers <peers>
<name> | Name of the protection group
--peers <peers> | A comma-separated list of Peers to use with this Protection Group
Note
You can also add peers after the fact using the robin protection-group add-peer command.
Example
# robin protection-group create prod-pg --peers NewYork
Job: 999 Name: ProtectionGroupCreateMultiPeer State: PROCESSED Error: 0
Job: 999 Name: ProtectionGroupCreateMultiPeer State: WAITING Error: 0
Job: 999 Name: ProtectionGroupCreateMultiPeer State: COMPLETED Error: 0
# robin protection-group list
+----+---------+---------+----------------+-------------+
| ID | Name | Role | Tenant | Peers |
+----+---------+---------+----------------+-------------+
| 3 | prod-pg | PRIMARY | Administrators | ['NewYork'] |
+----+---------+---------+----------------+-------------+
Creates a primary Protection Group on the cluster from which it was initiated and replicates the configuration to any Peer Cluster(s) specified, such that a secondary Protection Group with the same name is spawned on each of those Peer Cluster(s).
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: POST
URL Parameters: None
Data Parameters:
action: pg-create - This mandatory field within the payload specifies that the Protection Group create operation is to be performed.
name: <pg_name> - This mandatory field within the payload specifies the name of the Protection Group to be created.
peers: <list_of_peers> - Specifying a comma-separated list of Peers with this parameter results in a secondary Protection Group with a mirrored configuration being created on each of the Peer Clusters.
add-peer: true - This boolean parameter within the payload determines whether or not a Peer is added to the Protection Group to be created. This attribute is only valid when a list of Peers is specified via the peers attribute.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token> - Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
Output
{
  "jobid":275,
  "plan":{
    "kind":"robin",
    "quiesce":[
      "fs"
    ],
    "original_app_tenant":1,
    "snapname":"snap1",
    "action":"snapshot",
    "original_app_owner":3,
    "opcode":8,
    "current_user":{
      "session_expires":"2020-09-10T10:40:27",
      "tenant_id":1,
      "tenant":"Administrators",
      "user_capabilities":[
        "AllSuperAdminCapabilities"
      ],
      "user_contexts":[
        "robin"
      ],
      "username":"robin",
      "user_permissions":{
      },
      "tenant_role":"superadmin",
      "user_context":"robin",
      "tenants":[
        "Administrators"
      ],
      "user_id":3,
      "namespace":"t001-u000003",
      "ip_addr":"172.17.0.1"
    },
    "namespace":"t001-u000003",
    "name":"custom-labels-app"
  }
}
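The REST details for creating a Protection Group can be assembled into a request with the Python standard library, as in the sketch below. Several points are assumptions rather than documented behavior: the https scheme, the JSON content type, sending peers as a single comma-separated string, and the hypothetical names build_pg_create_request and robin-master.example.com. The endpoint path, method, port, payload field names, and Authorization header come from the description above. The sketch only builds the request object and does not send it.

```python
import json
import urllib.request

PG_ENDPOINT = "/api/v3/robin_server/protection_groups?version=1&max_version=1"


def build_pg_create_request(host, auth_token, pg_name, peers=(), port=29442, scheme="https"):
    """Build (but do not send) a POST request for the pg-create action."""
    payload = {"action": "pg-create", "name": pg_name}
    if peers:
        # Assumption: peers are sent as one comma-separated string.
        payload["peers"] = ",".join(peers)
    return urllib.request.Request(
        f"{scheme}://{host}:{port}{PG_ENDPOINT}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": auth_token,  # token obtained from the login API
            "Content-Type": "application/json",  # assumed content type
        },
    )


request = build_pg_create_request("robin-master.example.com", "<auth_token>", "prod-pg", ["NewYork"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request with urllib.request.urlopen would additionally require a valid token and whatever TLS settings the cluster uses; a 202 response indicates that the job was accepted.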
17.4.2. Add a Peer Cluster to a Protection Group
You can add a Peer Cluster to an existing Protection Group in order to increase the number of clusters an application is replicated to. As part of this process, the configuration of the Protection Group is replicated to the Peer Cluster, resulting in an identical secondary Protection Group being spawned on that cluster. To add a Peer Cluster to an existing Protection Group, run the following command:
# robin protection-group add-peer <name> <peer>
| <name> | Name of the Protection Group |
| <peer> | Name of the Peer Cluster |
Example
# robin protection-group add-peer prod-pg NewYork
Submitted job '47'. Use 'robin job wait 47' to track the progress
# robin protection-group list
+----+---------+---------+----------------+-------------+
| ID | Name | Role | Tenant | Peers |
+----+---------+---------+----------------+-------------+
| 3 | prod-pg | PRIMARY | Administrators | ['NewYork'] |
+----+---------+---------+----------------+-------------+
Associates a Peer Cluster with a pre-existing Protection Group and replicates the Protection Group configuration to that cluster so that a secondary Protection Group with the same name is created on it.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: POST
URL Parameters: None
Data Parameters:
action: pg-create
- This mandatory field within the payload specifies that the Protection Group create operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to add the specified Peer Cluster to.
add-peer: true
- This mandatory boolean field within the payload indicates that a Peer is to be added to the Protection Group.
peer_name: <peer_name>
- This mandatory field within the payload specifies the name of the Peer to add to the specified Protection Group.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
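The request side of this endpoint can be sketched as follows. This is a minimal illustration rather than official client code: it prepares the POST request with the documented fields but does not send it. The host name, the use of HTTPS, and the JSON encoding of the payload are assumptions that this section does not state, and build_add_peer_request is a hypothetical helper name.

```python
import json
import urllib.request

RCM_PORT = 29442  # default RCM port stated in this section
ENDPOINT = "/api/v3/robin_server/protection_groups?version=1&max_version=1"


def build_add_peer_request(host, auth_token, pg_name, peer_name):
    """Prepare (but do not send) the request that adds a Peer to a Protection Group."""
    payload = {
        "action": "pg-create",   # documented action for this operation
        "name": pg_name,
        "add-peer": True,
        "peer_name": peer_name,
    }
    # Assumption: HTTPS transport and a JSON-encoded request body.
    url = f"https://{host}:{RCM_PORT}{ENDPOINT}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": auth_token, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; the token placeholder comes from the login API.
req = build_add_peer_request("robin-master.example.com", "<auth_token>", "prod-pg", "NewYork")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the prepared object to urllib.request.urlopen would submit it; a 202 response indicates that a job was accepted, not that the Peer has already been added.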
17.4.3. List all Protection Groups¶
To view the list of all Protection Groups currently registered with a cluster, run the following command:
# robin protection-group list --json
| --json | Output in JSON |
Example
# robin protection-group list
+----+---------+---------+----------------+-------+
| ID | Name | Role | Tenant | Peers |
+----+---------+---------+----------------+-------+
| 2 | prod-pg | PRIMARY | Administrators | [] |
+----+---------+---------+----------------+-------+
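When the tabular output needs to be consumed by a script and the --json option is not used, the table can be parsed directly. The sketch below assumes the column layout shown in the examples in this section (ID, Name, Role, Tenant, Peers); parse_pg_list is a hypothetical helper, not part of the Robin CLI.

```python
import ast


def parse_pg_list(output):
    """Turn the ASCII table printed by 'robin protection-group list' into dicts."""
    rows = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders and the command line itself
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
            continue
        row = dict(zip(header, cells))
        # The Peers column is printed like a Python list, e.g. ['NewYork'] or [].
        row["Peers"] = ast.literal_eval(row["Peers"])
        rows.append(row)
    return rows


sample = """
+----+---------+---------+----------------+-------------+
| ID | Name    | Role    | Tenant         | Peers       |
+----+---------+---------+----------------+-------------+
| 3  | prod-pg | PRIMARY | Administrators | ['NewYork'] |
+----+---------+---------+----------------+-------------+
"""
groups = parse_pg_list(sample)
print(groups)
```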
Lists all Protection Groups.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: GET
URL Parameters: None
Data Parameters: None
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 200
Error Response Code: 500 (Internal Server Error)
Example Response:
17.4.4. View details of a Protection Group¶
Issue the following command to get detailed information such as the state, role, attached Replication Policy, associated Peer Cluster(s) and their state(s) with regard to a specific Protection Group:
# robin protection-group info <name>
--verbose
| <name> | Protection Group name |
| --verbose | Show verbose information about the specified Protection Group |
Example
# robin pg info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
**Peer Cluster Name: NewYork**
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: No Replication Policy has been attached yet
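For scripted health checks, the key and value pairs in this output can be collected into a dictionary. The following is a rough sketch that assumes the line format shown in the example above and a Protection Group with a single Peer (repeated keys for additional Peers would overwrite each other); parse_pg_info is a hypothetical helper.

```python
def parse_pg_info(output):
    """Collect 'Key: value' lines from 'robin protection-group info' output."""
    info = {}
    for line in output.splitlines():
        # Drop surrounding whitespace and any emphasis markers around a line.
        line = line.strip().strip("*")
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


sample = """
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
Peer Cluster Name: NewYork
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: No Replication Policy has been attached yet
"""
info = parse_pg_info(sample)
print(info["Role"], info["Peer Cluster State"])
```

Section titles such as "Peer Information:" carry no value and are skipped, so only populated fields end up in the result.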
Returns details about a specific Protection Group such as its state, role, attached Replication Policy, associated Peer Cluster(s) and their state(s).
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: GET
URL Parameters: None
Data Parameters:
action: get-pg-info
- This mandatory field within the payload specifies that the Protection Group information retrieval operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to fetch the details for.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error)
Example Response:
17.4.5. Assign a tenant to a secondary Protection Group¶
If the primary Protection Group was created with the auto-assignment mode set to MANUAL, a tenant must be assigned to the associated secondary Protection Group(s) on the Peer Cluster(s) where they are hosted in order to facilitate the replication process.
To assign a tenant to a Protection Group on the secondary cluster with RBAC, there must be one Tenant admin on the secondary cluster with all the permissions for Protection Groups, such as create, update, and delete.
However, the DeleteProtectioGroup capability is not part of AllTenantAdminCapabilities.
You can add the DeleteProtectioGroup capability in the following two scenarios:
- When adding a tenant user
- When updating tenant user capabilities
Use the following command when adding a user:
robin user add <username> <tenant> <tenant_role> [<user_capabilities>]
For more information, see Robin User Management.
Use the following command when updating the tenant user capabilities:
robin tenant update-user-capabilities <tenant_name> <username> <user_capabilities>
For more information, see Update user capabilities.
Once a Tenant admin with these permissions exists, run the following command to assign the tenant:
# robin protection-group assign-tenant <name> <tenant>
| <name> | Protection Group name |
| <tenant> | Name of the tenant to assign to the specified Protection Group |
Note
More information on Role-based Access Control (RBAC) in Robin’s Asynchronous Disaster Recovery feature can be found here.
Assigns a tenant to a secondary Protection Group.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: pg-assign-tenant
- This mandatory field within the payload specifies that the assign tenant operation for Protection Groups is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to assign the tenant to.
tenant_name: <tenant_name>
- This mandatory field within the payload specifies the name of the tenant to assign to the given Protection Group.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
17.4.6. Assign Application on Secondary Cluster¶
If the primary Protection Group was created with the auto-assignment mode set to MANUAL, any application(s) attached to the primary Protection Group must be assigned to a user within a tenant associated with the respective secondary Protection Group(s) on the Peer Cluster(s) where they are hosted in order to facilitate the replication process.
In order to meet this requirement, run the following command:
# robin protection-group assign-app <name> <app> <user> <namespace>
| <name> | Protection Group name |
| <app> | Name of the application to assign |
| <user> | Name of the user to assign the specified application to |
| <namespace> | Namespace the specified application will be deployed in |
Note
Before an application can be assigned, tenants must be assigned to the secondary Protection Group(s) on the Peer Cluster(s) using the robin protection-group assign-tenant command, details of which can be found here.
Assigns an application to a user within the tenant associated with a secondary Protection Group.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: pg-assign-app
- This mandatory field within the payload specifies that the assign application operation for Protection Groups is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to assign the application to.
app_name: <app_name>
- This mandatory field within the payload specifies the name of the application to assign to the given user within the aforementioned Protection Group.
user_name: <user_name>
- This mandatory field within the payload specifies the name of the user to whom the given application will be assigned. The user must be part of the tenant linked to the aforementioned Protection Group.
namespace: <namespace>
- This mandatory field within the payload specifies the name of the namespace in which the application will be created.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
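Because the tenant has to be assigned before the application, the two PUT payloads for the MANUAL auto-assignment mode are naturally submitted in a fixed order. The sketch below only builds the payloads in that order from the fields documented above; submitting them and waiting for each job to complete are left out, and manual_assignment_payloads is a hypothetical helper name.

```python
def manual_assignment_payloads(pg_name, tenant_name, app_name, user_name, namespace):
    """Return the assign-tenant and assign-app payloads in submission order."""
    assign_tenant = {
        "action": "pg-assign-tenant",
        "name": pg_name,
        "tenant_name": tenant_name,
    }
    assign_app = {
        "action": "pg-assign-app",
        "name": pg_name,
        "app_name": app_name,
        "user_name": user_name,
        "namespace": namespace,
    }
    # The tenant assignment comes first; the application assignment depends on it.
    return [assign_tenant, assign_app]


steps = manual_assignment_payloads(
    "prod-pg", "Administrators", "mysqldb", "robin", "t001-u000003"
)
for step in steps:
    print(step["action"])
```

Since both calls return 202, it is reasonable to wait for the first job to finish before sending the second payload.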
17.4.7. Add an application to a Protection Group¶
For an application to be replicated to a Peer Cluster via snapshots, you must add it to a Protection Group. Each application can only be attached to one Protection Group. When an application is selected for replication, all the application-dependent constructs, such as volumes and ConfigMaps, are replicated too.
Important
An application will only be replicated to a Peer if the Replication Policy associated with the Protection Group it is attached to has replication snapshots enabled.
All types of applications, including Robin Bundle applications, Helm applications and FlexApps (with PVCs), can be associated with a Protection Group. However, this only holds true for parent applications, as cloned applications cannot be linked to a Protection Group. In addition, replication for applications with ephemeral volumes (AEVs) is not supported.
Note
If you are adding a Robin Bundle app to the Protection Group, you must add the same Robin Bundle to both peers.
When adding an app to the Protection Group, the auto_assign_mode and the association of users with the namespace play an important role.
You must consider the following important points:
- Ensure that the same users exist on both the primary and the secondary clusters.
- Namespace and user mapping must be the same on both the primary and the secondary clusters.
Note
The process to add the app to the Protection Group fails if the above-mentioned points are not met. In this scenario, you must change the auto_assign_mode to manual on the secondary cluster and assign the app to a tenant user.
After the app addition succeeds with the auto_assign_mode set to manual on the secondary, the Protection Group shows the following states: PENDING_USER_ASSIGNMENT_AT_PEER on the primary and PENDING_USER_ASSIGNMENT on the secondary.
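The two prerequisites above (identical users and identical namespace-to-user mapping on both clusters) can be verified before attempting the add. The following sketch assumes you have already collected, by whatever means, a mapping of namespace to user names for each cluster; the input format and the function name are hypothetical and are not produced by any Robin command.

```python
def check_auto_assign_prerequisites(primary, secondary):
    """Compare namespace -> set-of-users mappings from both clusters.

    Returns a list of human-readable problems; an empty list means the
    prerequisites listed above appear to be met.
    """
    problems = []
    primary_users = set().union(*primary.values())
    secondary_users = set().union(*secondary.values())
    # Users must exist on both clusters.
    for user in sorted(primary_users ^ secondary_users):
        side = "secondary" if user in primary_users else "primary"
        problems.append(f"user '{user}' is missing on the {side} cluster")
    # Each namespace must map to the same users on both clusters.
    for namespace in sorted(set(primary) | set(secondary)):
        if primary.get(namespace) != secondary.get(namespace):
            problems.append(f"namespace '{namespace}' maps to different users")
    return problems


primary = {"t001-u000003": {"robin"}, "dev": {"alice"}}
secondary = {"t001-u000003": {"robin"}, "dev": {"bob"}}
for problem in check_auto_assign_prerequisites(primary, secondary):
    print(problem)
```

If the list is not empty, the manual assignment path described in the note above is the fallback.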
To add an application to the Protection Group, run the following command:
# robin protection-group add-app <name> <app_name>
--peer <peer_name>
--namespace <namespace>
| <name> | Name of the Protection Group |
| <app_name> | Name of the application |
| --peer <peer_name> | Specific peer to replicate the application to |
| --namespace <namespace> | Namespace in which the application is registered. This option only needs to be used when there are multiple applications with the same name |
Note
When an application is attached to a Protection Group, all regular operations can continue to be run on it except for horizontal scaling and volume addition.
Example
# robin protection-group add-app prod-pg mysqldb --wait
Job: 10245 Name: ProtectionGroupAddApps State: PROCESSED Error: 0
Job: 10245 Name: ProtectionGroupAddApps State: WAITING Error: 0
Job: 10245 Name: ProtectionGroupAddApps State: COMPLETED Error: 0
# robin pg info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
**Peer Cluster Name: NewYork**
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl
+----------+-------------------------+-------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+-------------------------+-------------------------------------------+--------------------------+
| mysqldb | PRIMARY_SYNC_SUCCESSFUL | mysqldb_minute_repl-1639611848-1639688268 | 2021-12-16 12:57:52 |
+----------+-------------------------+-------------------------------------------+--------------------------+
Attaches an application to a Protection Group such that it can be replicated to Peer Clusters via snapshots.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: pg-add-app
- This mandatory field within the payload specifies that the add application operation for Protection Groups is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to add an application to.
app_name: <app_name>
- This mandatory field within the payload specifies the name of the application to add to the aforementioned Protection Group.
peer_name: <peer_name>
- Specifying the name of a Peer in this parameter results in the application only being replicated to the specified Peer.
namespace: <namespace>
- Specifying the name of a namespace in this parameter results in the application within the given namespace being replicated. This option only needs to be used when there are multiple applications with the same name.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
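On the request side, the difference between the mandatory and the situational fields of this endpoint can be sketched as follows. The helper name is hypothetical, and leaving a field out of the payload when it is not needed is an assumption; the section only states what happens when peer_name and namespace are supplied.

```python
def build_add_app_payload(pg_name, app_name, peer_name=None, namespace=None):
    """Build the pg-add-app payload, including optional fields only when given."""
    payload = {
        "action": "pg-add-app",
        "name": pg_name,
        "app_name": app_name,
    }
    if peer_name is not None:
        # Restricts replication of the application to this Peer.
        payload["peer_name"] = peer_name
    if namespace is not None:
        # Only needed when several applications share the same name.
        payload["namespace"] = namespace
    return payload


print(build_add_app_payload("prod-pg", "mysqldb"))
print(build_add_app_payload("prod-pg", "mysqldb", peer_name="NewYork"))
```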
17.4.8. Remove an application from a Protection Group¶
In order to disable the replication of an application entirely, an application can be removed from its associated Protection Group. This operation should be only performed on the cluster linked to the primary Protection Group. Upon completion, the application will also be removed from the secondary Protection Group(s) and their respective clusters. However, the replication snapshots for the application will remain on all clusters linked to the Protection Group(s) regardless of their role.
Note
All replication of the application must be stopped before it can be removed from the Protection Group. To achieve this, either pause the replication via the robin protection-group pause-replication command, details of which can be found here, or ensure that no Replication Policy is associated with the Protection Group.
To remove the application from the Protection Group, run the following command:
# robin protection-group remove-app <name> <app_name>
--peer <peer>
--namespace <namespace>
| <name> | Name of the Protection Group |
| <app_name> | Name of the application |
| --peer <peer> | Only stop the application replicating to the specified peer |
| --namespace <namespace> | Namespace in which the application is registered. This option only needs to be used when there are multiple applications with the same name |
Example
# robin protection-group remove-app prod-pg mysqldb --peer NewYork --wait
Job: 12693 Name: ProtectionGroupRemoveApps State: VALIDATED Error: 0
Job: 12693 Name: ProtectionGroupRemoveApps State: WAITING Error: 0
Job: 12693 Name: ProtectionGroupRemoveApps State: COMPLETED Error: 0
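The progress lines printed with --wait follow a regular "Job / Name / State / Error" pattern, so a wrapper script can extract the final state. The sketch below is based only on the line format visible in the examples in this section; final_job_state is a hypothetical helper and the meaning of non-zero Error values is not covered here.

```python
import re

# Matches lines such as:
# Job: 12693 Name: ProtectionGroupRemoveApps State: COMPLETED Error: 0
JOB_LINE = re.compile(
    r"Job:\s*(?P<id>\d+)\s+Name:\s*(?P<name>\S+)\s+"
    r"State:\s*(?P<state>\S+)\s+Error:\s*(?P<error>\d+)"
)


def final_job_state(output):
    """Return the last reported job status as a dict, or None if none was found."""
    last = None
    for line in output.splitlines():
        match = JOB_LINE.search(line)
        if match:
            last = match
    if last is None:
        return None
    return {
        "id": int(last["id"]),
        "name": last["name"],
        "state": last["state"],
        "error": int(last["error"]),
    }


sample = """
Job: 12693 Name: ProtectionGroupRemoveApps State: VALIDATED Error: 0
Job: 12693 Name: ProtectionGroupRemoveApps State: WAITING Error: 0
Job: 12693 Name: ProtectionGroupRemoveApps State: COMPLETED Error: 0
"""
status = final_job_state(sample)
print(status)
```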
Detaches an application from a Protection Group such that it will no longer be replicated to Peer Clusters.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: pg-remove-app
- This mandatory field within the payload specifies that the remove application operation for Protection Groups is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to remove an application from.
app_name: <app_name>
- This mandatory field within the payload specifies the name of the application to remove from the aforementioned Protection Group.
peer_name: <peer_name>
- Specifying the name of a Peer in this parameter results in replication of the application only being stopped for the given Peer.
namespace: <namespace>
- Specifying the name of a namespace in this parameter results in the application within the given namespace being removed from the Protection Group. This option only needs to be used when there are multiple applications with the same name.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
17.4.9. Change role¶
The role of a Protection Group can be changed in order to influence the direction of replication for a set of applications as well as to aid the failover process. By default, it involves all current replication tasks being aborted and so requires all hosts and volumes within the cluster to be in a healthy state. To change the role of a Protection Group, run the following command:
Note
This command only alters the relevant Protection Group on the cluster it is run on.
# robin protection-group change-role <name> <role>
--force
--no-abort
| <name> | Name of the Protection Group to change the role for |
| <role> | Role of the Protection Group. Options include: ‘Primary’ or ‘Secondary’ |
| --force | Change the role of a Protection Group without checking Peers. Note this overrides checking if there already exists another Protection Group on a Peer with the Primary role |
| --no-abort | Do not abort ongoing replication tasks |
Note
If the --no-abort option is used, the operation waits for all replication tasks to complete before continuing, which could result in a severe time lag.
Important
When the role of a Protection Group is changed from primary to secondary, a new snapshot is taken for each application on the cluster where the role change occurred. These snapshots retain the data that was not replicated to the Peer cluster before the role change, so that no data is lost, and follow this naming convention: <appname>_snap_before_role_change_<timestamp>. These snapshots must be cleaned up manually if and when needed.
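Since these snapshots are cleaned up manually, it can help to pick them out of a longer list of snapshot names. The sketch below relies only on the naming convention quoted above; the exact timestamp format is not specified in this section, so the code treats everything after the marker as the timestamp, and the sample names are made up for illustration.

```python
ROLE_CHANGE_MARKER = "_snap_before_role_change_"


def split_role_change_snapshot(snapshot_name):
    """Return (app_name, timestamp) for a role-change snapshot, else None."""
    app_name, marker, timestamp = snapshot_name.rpartition(ROLE_CHANGE_MARKER)
    if not marker or not app_name or not timestamp:
        return None  # does not follow <appname>_snap_before_role_change_<timestamp>
    return app_name, timestamp


def role_change_snapshots(snapshot_names):
    """Filter a list of snapshot names down to role-change snapshots."""
    return [name for name in snapshot_names if split_role_change_snapshot(name)]


# Hypothetical names; the timestamp value is only a placeholder.
names = [
    "mysqldb_minute_repl-1639611848-1639692409",
    "mysqldb_snap_before_role_change_1639692500",
]
print(role_change_snapshots(names))
print(split_role_change_snapshot(names[1]))
```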
Example
# robin protection-group change-role prod-pg Primary
Job: 12695 Name: ProtectionGroupRoleChange State: VALIDATED Error: 0
Job: 12695 Name: ProtectionGroupRoleChange State: WAITING Error: 0
Job: 12695 Name: ProtectionGroupRoleChange State: COMPLETED Error: 0
# robin protection-group info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
**Peer Cluster Name: Sanfrancisco**
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl
+----------+---------------------------+-------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+---------------------------+-------------------------------------------+--------------------------+
| mysqldb | SECONDARY_SYNC_SUCCESSFUL | mysqldb_minute_repl-1639611848-1639692409 | 2021-12-16 14:06:54 |
+----------+---------------------------+-------------------------------------------+--------------------------+
Changes the role of a Protection Group in order to influence the direction of replication for a set of applications as well as to aid the failover process.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: change-role
- This mandatory field within the payload specifies that the change role operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group for which the role needs to be changed.
role: <role>
- This mandatory field within the payload specifies the new role to assign to the Protection Group. Valid choices for this field are ‘Primary’ and ‘Secondary’.
force: [true|false]
- Specifying a boolean value for this parameter determines whether Peers associated with the Protection Group are checked before the role is changed. If set to true, the associated Peers will not be checked.
abort: [true|false]
- Specifying a boolean value for this parameter determines whether ongoing replication tasks are aborted before the role change.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
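A client can reject invalid values before calling this endpoint. The following sketch validates the role and the two boolean flags using only the constraints documented above; build_change_role_payload is a hypothetical helper, and the reading that abort set to false corresponds to the --no-abort CLI option is an assumption.

```python
VALID_ROLES = ("Primary", "Secondary")


def build_change_role_payload(pg_name, role, force=None, abort=None):
    """Build the change-role payload, validating values before any request is made."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    payload = {"action": "change-role", "name": pg_name, "role": role}
    for key, value in (("force", force), ("abort", abort)):
        if value is None:
            continue  # flag not requested; leave it out of the payload
        if not isinstance(value, bool):
            raise TypeError(f"{key} must be a boolean")
        payload[key] = value
    return payload


# Assumption: abort=False mirrors the CLI's --no-abort option.
print(build_change_role_payload("prod-pg", "Primary", abort=False))
```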
17.4.10. Attach a Replication Policy to a Protection Group¶
For applications within a Protection Group to be replicated on a schedule, a Replication Policy needs to be associated with that Protection Group. The configuration of the Replication Policy determines the frequency at which data is transferred from the primary cluster to the secondary peer cluster. Details on how to create a Replication Policy can be found here.
To attach the Replication Policy to a Protection Group, run the following command:
# robin protection-group attach-repl-policy <name> <policy_name>
--peer <peer>
| <name> | Name of the Protection Group |
| <policy_name> | Name of the Replication Policy |
| --peer <peer> | Name of specific peer to replicate applications to based on specified Replication Policy configuration. If not specified, the Replication Policy will apply to all peers within the Protection Group |
Example
# robin protection-group attach-repl-policy prod-pg minute_repl
Job: 9998 Name: ProtectionGroupAttachReplicationPolicy State: VALIDATED Error: 0
Job: 9998 Name: ProtectionGroupAttachReplicationPolicy State: WAITING Error: 0
Job: 9998 Name: ProtectionGroupAttachReplicationPolicy State: COMPLETED Error: 0
# robin protection-group info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
**Peer Cluster Name: NewYork**
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl
Attaches a Replication Policy to a Protection Group such that applications within a Protection Group can be replicated in a scheduled manner.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: attach-replication-policy
- This mandatory field within the payload specifies that the attach Replication Policy operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to which the given Replication Policy needs to be attached.
policy_name: <policy_name>
- This mandatory field within the payload specifies the name of the Replication Policy that should be attached to the aforementioned Protection Group.
peer_name: <peer>
- Specifying the name of a Peer as a string in this parameter results in applications only being replicated to the given Peer. If not specified, the Replication Policy will apply to all peers within the Protection Group.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
17.4.11. Detach a Replication Policy from a Protection Group¶
In order to update an attached Replication Policy or add a new one, the current Replication Policy needs to be detached. When you detach a Replication Policy from a Protection Group, ongoing replications might fail and the replication schedule for the Protection Group stops permanently.
To detach a Replication Policy from a Protection Group, run the following command:
# robin protection-group detach-repl-policy <name> <policy_name>
--peer <peer>
| <name> | Name of the Protection Group |
| <policy_name> | Name of the Replication Policy |
| --peer <peer> | Name of the Peer Cluster to detach the Replication Policy for. If not specified, the policy is disassociated from all peers within the Protection Group |
Example
# robin protection-group detach-repl-policy prod-pg minute_repl --peer NewYork --wait
Job: 12695 Name: ProtectionGroupUpdate State: VALIDATED Error: 0
Job: 12695 Name: ProtectionGroupUpdate State: WAITING Error: 0
Job: 12695 Name: ProtectionGroupUpdate State: COMPLETED Error: 0
# robin pg info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
**Peer Cluster Name: NewYork**
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: No Replication Policy has been attached yet
Detaches a Replication Policy from a Protection Group such that the replication schedule for the Protection Group stops permanently and allows for the Replication Policy to be updated.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: detach-replication-policy
- This mandatory field within the payload specifies that the detach Replication Policy operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group from which the given Replication Policy needs to be detached.
policy_name: <policy_name>
- This mandatory field within the payload specifies the name of the Replication Policy that should be detached from the aforementioned Protection Group.
peer_name: <peer>
- Specifying the name of a Peer as a string in this parameter results in the Replication Policy only being detached for the given Peer. If not specified, the policy is disassociated from all peers within the Protection Group.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error), 409 (Duplicate Resource Error)
Example Response:
17.4.12. Start replication on demand¶
You can manually replicate applications in a Protection Group to a Peer site when required. In addition, Replication Policies that were previously created can be specified in order to create custom snapshots. To replicate applications within a Protection Group on demand, run the following command:
# robin protection-group run-rpol <name>
--rpol <rpol>
--peer <peer>
--app <app_name>
--namespace <namespace>
| <name> | Name of the Protection Group |
| --rpol <rpol> | Name of the Replication Policy to use. Note any Replication Policy specified must be attached to the Protection Group beforehand |
| --peer <peer> | Name of Peer Cluster |
| --app <app_name> | Name of the specific application that should be replicated |
| --namespace <namespace> | Namespace of the application to sync |
Note
This command should be run on the cluster linked to the primary Protection Group.
Example
# robin protection-group run-rpol test_pg --peer hyperv1 --wait
Job: 1463 Name: ProtectionGroupRunPolicy State: AGENT_WAIT Error: 0
Job: 1463 Name: ProtectionGroupRunPolicy State: FINALIZED Error: 0
Job: 1463 Name: ProtectionGroupRunPolicy State: COMPLETED Error: 0
Allows users to manually replicate application(s) in a Protection Group to a Peer site on demand.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: PUT
URL Parameters: None
Data Parameters:
action: force-rpol-run
- This mandatory field within the payload specifies that the run Replication Policy operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group that the replication should run for.
policy_name: <policy_name>
- Specifying the name of a Replication Policy as a string in this parameter results in the given Replication Policy being used to create and filter custom snapshots to be replicated. If not specified, the Replication Policy attached to the Protection Group is used.
peer_name: <peer>
- Specifying the name of a Peer as a string in this parameter results in application(s) only being replicated to the given Peer. If not specified, the application is replicated to all Peers within the Protection Group.
app_name: <app>
- Specifying the name of an application as a string in this parameter results in only the given application being replicated. If not specified, all applications associated with the aforementioned Protection Group will be replicated.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
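The CLI options of run-rpol and the payload fields of this endpoint line up closely, which the following sketch makes explicit. The pairing of --rpol, --peer and --app with policy_name, peer_name and app_name is inferred from the descriptions in this section rather than stated outright, and build_run_rpol_payload is a hypothetical helper. The --namespace option is deliberately not mapped because no corresponding payload field is listed above.

```python
# CLI option -> payload field, as inferred from the descriptions in this section.
OPTION_TO_FIELD = {
    "--rpol": "policy_name",
    "--peer": "peer_name",
    "--app": "app_name",
}


def build_run_rpol_payload(pg_name, options=None):
    """Translate run-rpol style options into a force-rpol-run payload."""
    payload = {"action": "force-rpol-run", "name": pg_name}
    for option, value in (options or {}).items():
        if option not in OPTION_TO_FIELD:
            raise ValueError(f"no documented payload field for {option}")
        payload[OPTION_TO_FIELD[option]] = value
    return payload


# Mirrors: robin protection-group run-rpol test_pg --peer hyperv1
print(build_run_rpol_payload("test_pg", {"--peer": "hyperv1"}))
```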
17.4.13. Remove a Peer from a Protection Group¶
In order to remove a peer cluster from a Protection Group, run the following command:
# robin protection-group remove-peer <name> <peer>
--force
| <name> | Name of the Protection Group |
| <peer> | Name of the Peer cluster |
| --force | Force delete the Peer even if the cluster cannot be informed or there is no connection to it |
Note
When a peer is removed from the primary Protection Group by running the above command on the associated cluster, the secondary protection group on the specified peer is also removed.
Example
# robin protection-group remove-peer prod-pg NewYork --wait
Job: 12693 Name: ProtectionGroupUpdate State: VALIDATED Error: 0
Job: 12693 Name: ProtectionGroupUpdate State: WAITING Error: 0
Job: 12693 Name: ProtectionGroupUpdate State: COMPLETED Error: 0
# robin protection-group list
+----+---------+---------+----------------+-------+
| ID | Name | Role | Tenant | Peers |
+----+---------+---------+----------------+-------+
| 3 | prod-pg | PRIMARY | Administrators | [] |
+----+---------+---------+----------------+-------+
Removes a Peer Cluster from a Protection Group. If the Peer is removed from a primary Protection Group, the secondary Protection Group on the specified Peer is also removed.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: POST
URL Parameters: None
Data Parameters:
action: pg-remove-peer
- This mandatory field within the payload specifies that the remove Peer operation for Protection Groups is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group from which to remove the given Peer Cluster.
peer_name: <peer_name>
- This mandatory field within the payload specifies the name of the Peer to remove from the specified Protection Group.
force: [true|false]
- Specifying a boolean value for this parameter determines whether a Peer should be removed even though it cannot be reached. If set to true, the given Peer will be removed regardless of whether a connection can be made.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
: Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
17.4.14. Pause replication¶
Robin allows the replication of one or more applications to be paused at any given time. This is particularly useful when an application needs to be removed from a Protection Group or if there are any infrastructure (network or storage) related issues.
Note
All ongoing replication jobs will still complete after the command is issued; however, no new replication jobs will be spawned.
To pause the replication of one or more applications, run the following command:
# robin protection-group pause-replication <name>
--app <app>
--peer <peer>
--namespace <namespace>
| <name> | Name of the Protection Group |
| --app <app> | Name of application for which replication should be paused. If not specified, replication for all applications in the Protection Group will be paused |
| --peer <peer> | Name of Peer Cluster to which replication should be paused. If not specified, replication to all peers within the Protection Group will be paused |
| --namespace <namespace> | Namespace of the application to be paused |
Note
The replication status for an application can be found using the robin protection-group info command.
Pauses replication for one or more applications within a Protection Group, allowing them to be removed or allowing any infrastructure related issues hindering the replication process to be addressed.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: POST
URL Parameters: None
Data Parameters:
action: pg-pause-replication
- This mandatory field within the payload specifies that the pause replication operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group for which replication should be paused.
apps: <list_of_applications>
- This optional parameter takes a comma separated list of application names and results in replication being paused only for the given application(s). If not specified, replication for all applications within the Protection Group will be paused.
peer_name: <peer_name>
- This optional parameter takes a string representing the name of a Peer and results in replication to the given Peer being paused. If not specified, replication to all Peers within the Protection Group will be paused.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
- Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
17.4.15. Resume replication¶
To resume the replication of one or more applications that was previously paused, run the following command:
# robin protection-group resume-replication <name>
--app <app>
--peer <peer>
--namespace <namespace>
| <name> | Name of the Protection Group |
| --app <app> | Name of application for which replication should be resumed. If not specified, replication for all applications in the Protection Group will be resumed |
| --peer <peer> | Name of Peer Cluster to which replication should be resumed. If not specified, replication to all peers within the Protection Group will be resumed |
| --namespace <namespace> | Namespace of the application to be resumed |
Resumes previously suspended replication for one or more applications within a Protection Group.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: POST
URL Parameters: None
Data Parameters:
action: pg-resume-replication
- This mandatory field within the payload specifies that the resume replication operation is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group for which replication should be resumed.
apps: <list_of_applications>
- This optional parameter takes a comma separated list of application names and results in replication being resumed only for the given application(s). If not specified, replication for all applications within the Protection Group will be resumed.
peer_name: <peer_name>
- This optional parameter takes a string representing the name of a Peer and results in replication to the given Peer being resumed. If not specified, replication to all Peers within the Protection Group will be resumed.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
- Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
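The pause and resume payloads differ only in their action value, so both can be produced by one helper. The sketch below is illustrative: the function name `replication_payload` is hypothetical, and it assumes that the comma separated `apps` value is sent as a single string.

```python
# Map a friendly operation name to the documented action value.
_ACTIONS = {
    "pause": "pg-pause-replication",
    "resume": "pg-resume-replication",
}


def replication_payload(operation, pg_name, apps=None, peer_name=None):
    """Return the payload dict for pausing or resuming replication."""
    if operation not in _ACTIONS:
        raise ValueError(f"operation must be one of {sorted(_ACTIONS)}")
    payload = {"action": _ACTIONS[operation], "name": pg_name}
    if apps:
        # Assumption: the comma separated list is passed as one string.
        payload["apps"] = ",".join(apps)
    if peer_name:
        payload["peer_name"] = peer_name
    return payload


print(replication_payload("pause", "prod-pg", apps=["mysqldb"], peer_name="NewYork"))
print(replication_payload("resume", "prod-pg"))
```

Omitting `apps` or `peer_name` leaves the corresponding key out of the payload, which matches the documented behavior of applying the operation to all applications or all Peers.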
Example Response:
17.4.16. Delete a Protection Group¶
Deleting a primary Protection Group breaks the replication relationship. Before it can be deleted, all Peers, applications, and Replication Policies must be disassociated from the Protection Group. To properly remove the primary Protection Group, the delete operation must be run on the cluster associated with it.
Note
Secondary Protection Group(s) are automatically deleted on their associated cluster(s) when the Peer is removed from the Protection Group.
To delete a Protection Group, run the following command:
# robin protection-group delete <name>
| <name> | Name of the Protection Group |
Example
# robin protection-group delete prod-pg
Job: 12699 Name: ProtectionGroupDelete State: VALIDATED Error: 0
Job: 12699 Name: ProtectionGroupDelete State: WAITING Error: 0
Job: 12699 Name: ProtectionGroupDelete State: COMPLETED Error: 0
# robin protection-group list
+----+------+------+--------+-------+
| ID | Name | Role | Tenant | Peers |
+----+------+------+--------+-------+
+----+------+------+--------+-------+
Deletes a primary Protection Group such that the replication relationship is broken. Before it can be deleted, however, all Peers, applications, and Replication Policies must be disassociated from the Protection Group.
End Point: /api/v3/robin_server/protection_groups?version=1&max_version=1
Method: DELETE
URL Parameters: None
Data Parameters:
action: pg-delete
- This mandatory field within the payload specifies that the delete operation for Protection Groups is to be performed.
name: <pg_name>
- This mandatory field within the payload specifies the name of the Protection Group to be deleted.
Port: RCM Port (default value is 29442)
Headers:
Authorization: <auth_token>
- Authorization token to identify which user is sending the request. The token can be acquired from the login API.
Success Response Code: 202
Error Response Code: 500 (Internal Server Error), 404 (Not Found Error), 401 (Unauthorized Error), 400 (Invalid API Usage Error)
Example Response:
17.5. Managing Replication Policies¶
Topics covered in this section:
Create a Replication Policy
List Replication Policies
Get information about a Replication Policy
Update replication schedule for a Replication Policy
Associate snapshot label(s) to a Replication Policy
Update retention count for snapshot label(s) associated to a Replication Policy
Disassociate snapshot label(s) from a Replication Policy
Enable application replication snapshots for a Replication Policy
Update retention count for application replication snapshots for a Replication Policy
Disable application replication snapshots for a Replication Policy
Delete a Replication Policy
17.5.1. Create a Replication Policy¶
A Replication Policy defines the frequency of data transfer between Peer Clusters. The policy must be attached to a Protection Group for it to become part of the replication relationship. In turn, the frequency defined in the Replication Policy applies to all the applications in the Protection Group. Application metadata and data are replicated primarily through the snapshot mechanism: snapshots are transferred from the primary Protection Group to the secondary. As a result, a Replication Policy can utilize snapshots created via a snapshot schedule (see apps.html#manage-application-snapshot-schedules) or replication snapshots created by the policy itself. The former is configured by specifying the labels associated with the snapshot schedule when configuring the policy, so that any snapshot with those labels is replicated, whilst the latter simply requires the --create-repl-snapshots option to be given.
Note
A Replication Policy can also replicate a combination of snapshots created manually or by a schedule and replication snapshots if both the --labels and --create-repl-snapshots parameters are utilized. This is because replication snapshots created by the policy are automatically transferred whilst the additional label filter enables externally created snapshots to be transferred too.
Important
Regardless of the type of snapshot that is chosen for replication, only snapshots created from the application(s) attached to the same Protection Group that the Replication Policy will eventually be associated with will be transferred. That is to say, for externally created snapshots, only those that meet the specified label filter and are based on attached applications will be replicated. On the other hand, if Replication Snapshots are enabled, the attached applications will be snapshotted at the cadence of the policy and the resulting snapshot will be replicated. The names of these snapshots will be in the following format: <appname>_<namespace>_<repl-policy-name>_<zoneid>_<creation-time>.
When creating a Replication Policy, you must define its frequency either in a JSON file or with a CRON expression. The same format used for setting up application snapshot schedules, detailed in apps.html#manage-application-snapshot-schedules, applies in this case too.
JSON file example:
{
"frequency": "minute",
"minute": 1
}
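A schedule file like the one above can be generated and checked programmatically before it is passed to the create command. The sketch below only reproduces the two keys shown in the example; the helper name `write_schedule_file` and the file name `minute.json` are illustrative choices, and no other schedule keys are assumed.

```python
import json
import tempfile
from pathlib import Path


def write_schedule_file(directory, filename="minute.json"):
    """Write the one-minute schedule shown above and return the file path."""
    schedule = {"frequency": "minute", "minute": 1}
    path = Path(directory) / filename
    path.write_text(json.dumps(schedule, indent=2) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_schedule_file(tmp)
    # Reading the file back confirms that it is well-formed JSON.
    loaded = json.loads(path.read_text())
    print(path.name, loaded)
```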
To create a Replication Policy, run the following command:
# robin replication-policy create <name>
--sched-json <sched_json>
--sched-cron <sched_cron>
--labels <labels>
--create-repl-snapshots
--retain-repl-snapshots
| <name> | Replication Policy name |
| --sched-json <sched_json> | JSON file containing schedule information |
| --sched-cron <sched_cron> | CRON string specifying schedule interval |
| --labels <labels> | Snapshot labels to filter which snapshots need to be transferred. Should be provided as a comma separated list, for example REPDEMO:Yes:5 as in Example 2 below |
| --create-repl-snapshots | Enable replication snapshots |
| --retain-repl-snapshots | Number of replication snapshots to maintain on peer clusters |
Example 1 (Creating a standard Replication Policy):
# robin replication-policy create --sched_json ~/minute.json first-replication-policy
Submitted job '9996'. Use 'robin job wait 9996' to track the progress
# robin replication-policy info first-replication-policy
Name: first-replication-policy
Policy Owner: admin
Policy Interval: 60
Create Snapshots: False
Snapshot Labels
+-----+-------+--------+
| Key | Value | Retain |
+-----+-------+--------+
+-----+-------+--------+
Example 2 (Creating a Replication Policy with custom labels):
# robin replication-policy create Rep-test --sched_json 5min_repl.json --labels REPDEMO:Yes:5
Submitted job '9998'. Use 'robin job wait 9998' to track the progress
# robin replication-policy info Rep-test
Name: Rep-test
Policy Owner: robin
Policy Interval: 300
Create Snapshots: True
Retain Snapshot Count: 9
Snapshot Labels
+---------+-------+--------+
| Key | Value | Retain |
+---------+-------+--------+
| REPDEMO | YES | 5 |
+---------+-------+--------+
17.5.2. List Replication Policies¶
To view a list of Replication Policies, run the following command:
# robin replication-policy list --json
| --json | Output in JSON |
Example
# robin replication-policy list
+-------------+-------+----------+
| Name | Owner | Interval |
+-------------+-------+----------+
| minute_repl | admin | 60 |
+-------------+-------+----------+
17.5.3. Show information about a specific Replication Policy¶
To view details of a Replication Policy, run the following command:
# robin replication-policy info <name>
| <name> | Replication Policy name |
Example
# robin replication-policy info minute_repl
Name: minute_repl
Policy Owner: admin
Policy Interval: 60
Create Snapshots: True
Retain Snapshot Count: 10
Snapshot Labels
+-----+-------+--------+
| Key | Value | Retain |
+-----+-------+--------+
+-----+-------+--------+
17.5.4. Update replication schedule of a Replication Policy¶
To update the schedule of a Replication Policy, and thereby change the cadence at which applications are replicated, run the following command:
Note
Any changes made to an existing Replication Policy are reflected on the mirror policy set on the associated Peer Clusters unless otherwise specified. This helps to maintain the desired RPO and RTO in case of Protection Group role change.
# robin replication-policy update-schedule <name>
--sched-json <sched_json>
--sched-cron <sched_cron>
--skip-mirror
| <name> | Replication Policy name |
| --sched-json <sched_json> | JSON file containing updated schedule information |
| --sched-cron <sched_cron> | Updated CRON string specifying schedule interval |
| --skip-mirror | Skip updating associated Peer Clusters hosting secondary Protection Groups |
Example
# robin replication-policy update-schedule --sched_json minute.json pri_rpol
Job: 12705 Name: ReplicationPolicyUpdateSchedule State: VALIDATED Error: 0
Job: 12705 Name: ReplicationPolicyUpdateSchedule State: WAITING Error: 0
Job: 12705 Name: ReplicationPolicyUpdateSchedule State: COMPLETED Error: 0
# robin replication-policy list
+----------+-------+----------+
| Name | Owner | Interval |
+----------+-------+----------+
| pri_rpol | admin | 60 |
+----------+-------+----------+
17.5.5. Add labels to a Replication Policy¶
After a Replication Policy is created, you can associate additional labels with it so that any snapshots carrying those labels are replicated to the attached Peer Clusters.
Note
You must provide the same labels in the pod and PVC of the application for successful replication.
To expand the list of snapshot label filters used by a Replication Policy, run the following command:
# robin replication-policy add-label <name> <labels>
--skip-mirror
| <name> | Replication Policy name |
| <labels> | Comma separated list of snapshot labels, for example SNAP:YES:5 as in the example below |
| --skip-mirror | Skip updating associated Peer Clusters hosting secondary Protection Groups |
Important
Only snapshots that match the label filter being added and that are created from application(s) attached to the Protection Group associated with the relevant Replication Policy will be transferred.
Example
# robin replication-policy add-label minute_repl SNAP:YES:5
Submitted job '31'. Use 'robin job wait 31' to track the progress
# robin replication-policy info minute_repl
Name: minute_repl
Policy Owner: admin
Policy Interval: 60
Create Snapshots: False
Snapshot Labels
+------+-------+--------+
| Key | Value | Retain |
+------+-------+--------+
| SNAP | YES | 5 |
+------+-------+--------+
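The label strings used in these examples (SNAP:YES:5, REPDEMO:Yes:5) line up with the Key, Value, and Retain columns printed by the info command. Under that inferred reading, which is an assumption drawn from the examples and not a documented grammar, a label list can be split as follows. `LabelFilter` and `parse_label_filters` are hypothetical names.

```python
from typing import NamedTuple


class LabelFilter(NamedTuple):
    key: str
    value: str
    retain: int


def parse_label_filters(spec):
    """Split a comma separated list of key:value:retain items."""
    filters = []
    for item in spec.split(","):
        # Assumption: exactly three colon separated fields per item.
        key, value, retain = item.strip().split(":")
        filters.append(LabelFilter(key, value, int(retain)))
    return filters


for label in parse_label_filters("SNAP:YES:5,REPDEMO:Yes:5"):
    print(label.key, label.value, label.retain)
```

A check like this can catch a malformed label string, such as a missing retain count, before the command is submitted as a job.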
17.5.6. Update labels associated with a Replication Policy¶
You can update the number of snapshots that you want to retain on attached Peer Clusters for a given set of labels in a Replication Policy by running the following command:
# robin replication-policy update-label <name> <labels> <retain>
--skip-mirror
| <name> | Replication Policy name |
| <labels> | Comma separated list of snapshot label filters to update, for example SNAP:YES as in the example below |
| <retain> | Number of snapshots with this label to be maintained on Peer Clusters, not including the cluster on which the snapshot was created |
| --skip-mirror | Skip updating associated Peer Clusters hosting secondary Protection Groups |
Example
# robin replication-policy update-label 5min_auto SNAP:YES 10
# robin replication-policy info 5min_auto
Name: 5min_auto
Policy Owner: robin
Policy Interval: 300
Create Snapshots: True
Retain Snapshot Count: 5
Snapshot Labels
+------+-------+--------+
| Key | Value | Retain |
+------+-------+--------+
| SNAP | YES | 10 |
+------+-------+--------+
17.5.7. Remove labels from a Replication Policy¶
To remove a snapshot label filter from a Replication Policy, such that it no longer replicates snapshots with the given set of labels, run the following command:
# robin replication-policy remove-label <name> <labels>
--skip-mirror
| <name> | Replication Policy name |
| <labels> | Comma separated list of snapshot label filters to disassociate from the Replication Policy, for example SNAP:YES as in the example below |
| --skip-mirror | Skip updating associated Peer Clusters hosting secondary Protection Groups |
Example
# robin replication-policy info 5min_auto
Name: 5min_auto
Policy Owner: robin
Policy Interval: 300
Create Snapshots: True
Retain Snapshot Count: 5
Snapshot Labels
+------+-------+--------+
| Key | Value | Retain |
+------+-------+--------+
| SNAP | YES | 10 |
+------+-------+--------+
# robin replication-policy remove-label 5min_auto SNAP:YES --wait
Job: 844 Name: ReplicationPolicyUpdatePeers State: VALIDATED Error: 0
Job: 844 Name: ReplicationPolicyUpdatePeers State: WAITING Error: 0
Job: 844 Name: ReplicationPolicyUpdatePeers State: COMPLETED Error: 0
# robin replication-policy info 5min_auto
Name: 5min_auto
Policy Owner: robin
Policy Interval: 300
Create Snapshots: True
Retain Snapshot Count: 5
Snapshot Labels
+-----+-------+--------+
| Key | Value | Retain |
+-----+-------+--------+
+-----+-------+--------+
17.5.8. Enable Replication Snapshots for a Replication Policy¶
Replication snapshots are system-generated snapshots of applications attached to the Protection Group with which the Replication Policy is associated. They utilize the default system label of system_label:Yes and are created in cadence with the Replication Policy schedule. The names of the snapshots to be created will be in the following format: <appname>_<namespace>_<repl-policy-name>_<zoneid>_<creation-time>.
In addition, the number of these snapshots to maintain on Peer Clusters is configurable. The feature controlling the creation of these snapshots can be enabled after the Replication Policy is created, or re-enabled if it was previously disabled, by running the following command:
# robin replication-policy enable-repl-snaps <name> <retain_repl_snaps>
--skip-mirror
| <name> | Replication Policy name |
| <retain_repl_snaps> | Number of snapshots to be maintained on all Peer Clusters including the one on which it was created |
| --skip-mirror | Skip updating associated Peer Clusters hosting secondary Protection Groups |
Note
Whether or not the Replication Snapshot feature is enabled can be seen using the robin replication-policy info command.
Example:
# robin replication-policy enable-repl-snaps 5min_auto 6
Submitted job '14742'. Use 'robin job wait 14742' to track the progress
# robin replication-policy info 5min_auto
Name: 5min_auto
Policy Owner: robin
Policy Interval: 300
Create Snapshots: True
Retain Snapshot Count: 6
Snapshot Labels
+-----+-------+--------+
| Key | Value | Retain |
+-----+-------+--------+
+-----+-------+--------+
17.5.9. Update number of Replication Snapshots retained by a Replication Policy¶
The number of Replication Snapshots to be maintained for a given policy can be updated as needed. When the value is updated, the change will only be reflected during the next scheduled run to create a new Replication Snapshot. To update the number of Replication Snapshots to retain for a particular Replication Policy, run the following command:
# robin replication-policy update-repl-snaps <name> <retain_repl_snaps>
--skip-mirror
| <name> | Replication Policy name |
| <retain_repl_snaps> | Number of snapshots to be maintained on all Peer Clusters including the one on which it was created |
| --skip-mirror | Skip updating associated Peer Clusters hosting secondary Protection Groups |
17.5.10. Disable Replication Snapshots for a Replication Policy¶
To disable the creation of system-generated Replication Snapshots for a particular Replication Policy, run the following command:
# robin replication-policy disable-repl-snaps <name>
--skip-mirror
| <name> | Replication Policy name |
| --skip-mirror | Skip updating associated Peer Clusters |
Example:
# robin replication-policy disable-repl-snaps 5min_auto
Submitted job '14740'. Use 'robin job wait 14740' to track the progress
# robin replication-policy info 5min_auto
Name: 5min_auto
Policy Owner: robin
Policy Interval: 300
Create Snapshots: False
Snapshot Labels
+-----+-------+--------+
| Key | Value | Retain |
+-----+-------+--------+
+-----+-------+--------+
17.5.11. List Replication Snapshots¶
You can view the list of system-generated Replication Snapshots by running the robin snapshot list command, details of which can be found here, with a label filter set to the default system label of system_replicate:yes. An example of this is shown below:
Example
# robin snapshot list --labels system_replicate:yes
+----------------------------------+--------+---------------+----------+--------------------------------------------+
| Snapshot ID | State | App Name | App Kind | Snapshot name |
+----------------------------------+--------+---------------+----------+--------------------------------------------+
| 1cb934a4ab3a11ec8b84b3d731b0b3bb | ONLINE | test-ns/pgsql | flexapp | pgsql_test_5min_auto-1648050416-1648102657 |
| b26fa730ac2911ec9282312d4a1cadc1 | ONLINE | test-ns/pgsql | flexapp | pgsql_test_5min_auto-1648050416-1648205557 |
| 6540b266ac2a11ecb401af83176cced7 | ONLINE | test-ns/pgsql | flexapp | pgsql_test_5min_auto-1648050416-1648205857 |
| 18c4b84aac2b11ec952e47e3b463ab7d | ONLINE | test-ns/pgsql | flexapp | pgsql_test_5min_auto-1648050416-1648206158 |
| c9b57c68ac2b11eca959d7db3dfb1472 | ONLINE | test-ns/pgsql | flexapp | pgsql_test_5min_auto-1648050416-1648206456 |
| 7d5c34c0ac2c11ecb94a031211f1ef31 | ONLINE | test-ns/pgsql | flexapp | pgsql_test_5min_auto-1648050416-1648206757 |
+----------------------------------+--------+---------------+----------+--------------------------------------------+
Note
The same command can be used with any label filter provided to the Replication Policy to showcase which externally generated snapshots will be replicated.
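The snapshot names in the listing above end in two hyphen separated numbers. Assuming these are the zone ID and the creation time in Unix epoch seconds (an interpretation based on the naming format described earlier, not something the listing itself states), a name can be split as sketched below. The function name is hypothetical.

```python
from datetime import datetime, timezone


def split_replication_snapshot_name(name):
    """Split '<prefix>-<zoneid>-<creation-time>' from the right-hand side."""
    prefix, zone_id, created = name.rsplit("-", 2)
    return {
        "prefix": prefix,  # app, namespace and policy name, left unsplit
        "zone_id": int(zone_id),
        # Assumption: the last field is seconds since the Unix epoch.
        "created_utc": datetime.fromtimestamp(int(created), tz=timezone.utc),
    }


info = split_replication_snapshot_name("pgsql_test_5min_auto-1648050416-1648102657")
print(info["prefix"], info["zone_id"], info["created_utc"].isoformat())
```

The prefix is deliberately left unsplit because application, namespace, and policy names may themselves contain underscores, which makes the boundaries between them ambiguous.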
17.5.12. Delete a Replication Policy¶
To delete a Replication Policy, run the following command:
# robin replication-policy delete <name>
Note
In order to delete a Replication Policy, it must not be attached to any Protection Group(s).
| <name> | Replication Policy name |
Example
# robin replication-policy delete minute_repl --wait
Job: 12710 Name: ReplicationPolicyDelete State: VALIDATED Error: 0
Job: 12710 Name: ReplicationPolicyDelete State: WAITING Error: 0
Job: 12710 Name: ReplicationPolicyDelete State: COMPLETED Error: 0
17.6. Asynchronous Disaster Recovery quickstart guide¶
17.6.1. Prerequisites¶
The following are the prerequisites for setting up asynchronous Disaster Recovery:
Two Robin clusters of the same version (5.4.1+) are needed to create a replication pair.
A user with administrator privileges for both clusters.
17.6.2. Overview of Steps¶
The following steps need to be completed in order to set up a functioning replication procedure to safeguard an application:
Note
It is highly recommended that the general Disaster Recovery concepts, detailed here, are reviewed before starting the tutorial.
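As an orientation aid, the commands used in the steps that follow can be collected into one sequence. The sketch below only assembles and prints the command lines; it does not execute anything. The names (test_peer, test_peer_primary, prod-pg, NewYork, mysqldb) are copied verbatim from the examples in this guide, and the pairing token is left as a placeholder.

```python
import shlex


def quickstart_commands(pairing_token="<token>"):
    """Return the quickstart CLI commands as argument lists, in step order."""
    policy = "first-replication-policy"
    return [
        ["robin", "peer-cluster", "create", "test_peer"],               # Step 1, first cluster
        ["robin", "peer-cluster", "pair", "test_peer_primary",
         pairing_token, "--wait"],                                      # Step 2, second cluster
        ["robin", "protection-group", "create", "prod-pg", "--peers", "NewYork"],
        ["robin", "replication-policy", "create", policy,
         "--sched-cron", "* * * * *", "--labels", "SNAP:YES:5"],
        ["robin", "replication-policy", "enable-repl-snaps", policy, "10"],
        ["robin", "protection-group", "attach-repl-policy", "prod-pg", policy],
        ["robin", "protection-group", "add-app", "prod-pg", "mysqldb", "--wait"],
        ["robin", "protection-group", "info", "prod-pg"],               # Step 8, verification
    ]


for step, argv in enumerate(quickstart_commands(), start=1):
    print(f"Step {step}: {shlex.join(argv)}")
```

Holding each command as an argument list keeps the CRON expression as a single argument; `shlex.join` adds the quoting a shell would need.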
17.6.3. Step 1: Initiate Pairing¶
First, log in to the cluster from which the pairing is to be initiated and register the second cluster of the peer pair using the following command:
# robin peer-cluster create test_peer
Provide this token to the secondary cluster:
eyJlbmRwb2ludCI6ICIxMC45LjYxLjQ3IiwgInBlZXJfYXV0aF90b2tlbiI6ICJleUowZVhBaU9pSktWMVFpTENKaGJHY2lPaUpJVXpJMU5pSjkuZXlKMWMyVnlYMmxrSWpveUxDSjBaVzVoYm5SZmFXUWlPakVzSW1WNGNDSTZNVFk1T0RRd01UY3pObjAuZkFveUM4S3FfQUdfZ2U5cTZKLXVpdC1mRjJtdUVsMzUtNFYzVklFeS1lWSIsICJwZWVyX3Rva2VuIjogIjNDUWVJNG1oQVRoV0VKb0FENTZVUFI1VSIsICJwZWVyX2NsdXN0ZXJfaWQiOiAiMWNkZmNiY2YtMTliYS00YThjLWEzMDMtNGNiMGFjZmU3N2YxIiwgInBlZXJfem9uZWlkIjogMTY0NjQ0NjI3OX0=
Note
More details regarding the robin peer-cluster create command, such as additional parameters and an explanation of the command's use case, can be found here.
As displayed in the output of the command, the token provided must be set on the second cluster of the cluster pair. Details on how to achieve this are given in the second step of this walkthrough here.
Important
The encoded blob expires after ten minutes. You must use it on the peer cluster within this time period in order to complete the pairing process.
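The sample token shown in this step begins with `eyJ`, which is how Base64-encoded JSON typically starts. Assuming the blob really is Base64-encoded JSON (an inference from the sample, not a documented guarantee), its field names can be inspected without printing any secret values. The helper name and the sample contents below are hypothetical.

```python
import base64
import json


def pairing_blob_fields(blob):
    """Return the sorted top-level field names of a Base64-encoded JSON blob."""
    decoded = json.loads(base64.b64decode(blob))
    return sorted(decoded)


# Synthetic blob built for illustration only; not a real pairing token.
sample = base64.b64encode(
    json.dumps({"endpoint": "192.0.2.10", "peer_zoneid": 1}).encode("utf-8")
).decode("ascii")
print(pairing_blob_fields(sample))
```

Because the blob carries authentication material, only field names are listed here and the values are never displayed.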
17.6.4. Step 2: Complete Pairing¶
To complete the peer pairing process, the blob created in the previous step must be registered on the second cluster of the pair by running the following command on that cluster:
Important
The encoded blob expires after ten minutes. You must use it on the peer cluster within this time period in order to complete the pairing process.
# robin peer-cluster pair test_peer_primary eyJlbmRwb2ludCI6ICIxMC45LjYxLjQ3IiwgInBlZXJfYXV0aF90b2tlbiI6ICJleUowZVhBaU9pSktWMVFpTENKaGJHY2lPaUpJVXpJMU5pSjkuZXlKMWMyVnlYMmxrSWpveUxDSjBaVzVoYm5SZmFXUWlPakVzSW1WNGNDSTZNVFk1T0RRd01UY3pObjAuZkFveUM4S3FfQUdfZ2U5cTZKLXVpdC1mRjJtdUVsMzUtNFYzVklFeS1lWSIsICJwZWVyX3Rva2VuIjogIjNDUWVJNG1oQVRoV0VKb0FENTZVUFI1VSIsICJwZWVyX2NsdXN0ZXJfaWQiOiAiMWNkZmNiY2YtMTliYS00YThjLWEzMDMtNGNiMGFjZmU3N2YxIiwgInBlZXJfem9uZWlkIjogMTY0NjQ0NjI3OX0= --wait
Job: 52 Name: PeerClusterPair State: VALIDATED Error: 0
Job: 52 Name: PeerClusterPair State: COMPLETED Error: 0
Note
More details regarding the robin peer-cluster pair command, such as additional parameters and an explanation of the command's functionality, can be found here.
17.6.5. Step 3: Create a Protection Group¶
After setting up the peers such that the two clusters now form a replication pair, a protection group must be created on each site in order to facilitate the replication process. In order to create a protection group, run the following command:
# robin protection-group create prod-pg --peers NewYork
Job: 999 Name: ProtectionGroupCreateMultiPeer State: PROCESSED Error: 0
Job: 999 Name: ProtectionGroupCreateMultiPeer State: WAITING Error: 0
Job: 999 Name: ProtectionGroupCreateMultiPeer State: COMPLETED Error: 0
Important
The protection group creation command only needs to be run on one cluster within the pair, as a mirror protection group will be spawned on the given peer(s) as part of the operation. By default, the protection group created on the cluster where the initial command was run will have the primary role whilst the mirror protection groups will have the secondary role.
Note
More details regarding the robin protection-group create command, such as additional parameters and an explanation of the command's functionality, can be found here.
In order to ensure the protection group was successfully created, issue the following command:
# robin protection-group list
+----+---------+---------+----------------+-------------+
| ID | Name | Role | Tenant | Peers |
+----+---------+---------+----------------+-------------+
| 3 | prod-pg | PRIMARY | Administrators | ['NewYork'] |
+----+---------+---------+----------------+-------------+
Note
More details regarding the robin protection-group list command, such as additional parameters and an explanation of the command's functionality, can be found here.
17.6.6. Step 4: Create a Replication Policy¶
A Replication Policy is a construct in which the frequency of application snapshot transfers is defined. In addition, it specifies which application snapshots should be transferred as part of the replication process by utilizing a label filter. As a result, it is the primary driving force in determining what data is transferred between peer clusters as well as how often the data is moved. To create a Replication Policy that runs every minute, replicates externally created application snapshots with the label SNAP:YES, and retains at least 5 of these snapshots on any relevant peer, run the following command:
# robin replication-policy create first-replication-policy --sched-cron "* * * * *" --labels SNAP:YES:5
Submitted job '9998'. Use 'robin job wait 9998' to track the progress
# robin replication-policy info first-replication-policy
Name: first-replication-policy
Policy Owner: admin
Policy Interval: 60
Create Snapshots: False
Snapshot Labels
+------+-------+--------+
| Key | Value | Retain |
+------+-------+--------+
| SNAP | YES | 5 |
+------+-------+--------+
Note
More details regarding the robin replication-policy create command, such as additional parameters and an explanation of the command's functionality, can be found here.
17.6.7. Step 5: Enable Replication Policy Snapshots¶
In the previous step, a Replication Policy was created to transfer, every minute, snapshots with a given label that were created manually or by a snapshot schedule. In addition, snapshots can be created in cadence with the Replication Policy schedule and transferred. These are known as Replication Snapshots, and more information about this type of snapshot can be found here. To enable this optional feature, run the following command:
# robin replication-policy enable-repl-snaps first-replication-policy 10
Submitted job '9997'. Use 'robin job wait 9997' to track the progress
# robin replication-policy info first-replication-policy
Name: first-replication-policy
Policy Owner: admin
Policy Interval: 60
Create Snapshots: True
Snapshot Labels
+------+-------+--------+
| Key | Value | Retain |
+------+-------+--------+
| SNAP | YES | 5 |
+------+-------+--------+
Note
More details regarding the robin replication-policy enable-repl-snaps command, such as additional parameters and an explanation of the command's functionality, can be found here.
17.6.8. Step 6: Attach Replication Policy to Protection Group¶
After the desired Replication Policy has been created, it needs to be attached to the previously created Protection Group. This is required for the schedule defined within the policy to come into effect and, consequently, for the replication process to begin. To attach the Replication Policy to a Protection Group, run the following command:
Note
Alongside the Replication Policy, any applications targeted for replication also need to be attached to the Protection Group before the replication process can begin. The means by which to do this is described in the next step.
# robin protection-group attach-repl-policy prod-pg first-replication-policy
Submitted job '9998'. Use 'robin job wait 9998' to track the progress
# robin protection-group info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
Peer Cluster Name: NewYork
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: first-replication-policy
Note
More details regarding the robin protection-group attach-repl-policy command, such as additional parameters and an explanation of the command's functionality, can be found here.
17.6.9. Step 7: Attach an Application to Protection Group¶
The previous steps introduced Replication Snapshots and their creation via a Replication Policy. For these snapshots to actually be created, applications must first be attached to a Protection Group, as they are the subject of the snapshots. Moreover, for snapshots created by other means, only those that are associated with attached applications and that match the label filter will be replicated. To attach an application to a Protection Group, run the following command:
# robin protection-group add-app prod-pg mysqldb --wait
Job: 10245 Name: ProtectionGroupAddApps State: PROCESSED Error: 0
Job: 10245 Name: ProtectionGroupAddApps State: WAITING Error: 0
Job: 10245 Name: ProtectionGroupAddApps State: COMPLETED Error: 0
# robin pg info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
Peer Cluster Name: NewYork
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: first-replication-policy
+----------+-------------------------+--------------------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+-------------------------+--------------------------------------------------------+--------------------------+
| mysqldb | PRIMARY_SYNC_SUCCESSFUL | mysqldb_first-replication-policy-1639611848-1639688268 | 2021-12-16 12:57:52 |
+----------+-------------------------+--------------------------------------------------------+--------------------------+
Note
More details regarding the robin protection-group add-app command, such as additional parameters and an explanation of the command's functionality, can be found here.
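When the add-app command is run from a script, the progress lines it prints with --wait can be checked programmatically. The following is a minimal sketch, not part of the Robin tooling: it assumes the progress lines keep the "Job: ... Name: ... State: ... Error: ..." layout shown in the example output of this step, and it treats the Error field as a plain integer without interpreting its meaning. The names JOB_LINE, last_job_status, and SAMPLE_JOB_OUTPUT are hypothetical.

```python
import re

# Pattern for one progress line, based only on the layout in the example output.
JOB_LINE = re.compile(
    r"Job:\s*(?P<id>\d+)\s+Name:\s*(?P<name>\S+)\s+"
    r"State:\s*(?P<state>\S+)\s+Error:\s*(?P<error>\d+)"
)


def last_job_status(output):
    """Return the last job progress line as a dict, or None if none is found."""
    status = None
    for line in output.splitlines():
        match = JOB_LINE.search(line)
        if match:
            status = {
                "id": int(match["id"]),
                "name": match["name"],
                "state": match["state"],
                "error": int(match["error"]),
            }
    return status


# Progress lines copied from the example output of this step.
SAMPLE_JOB_OUTPUT = """\
Job: 10245 Name: ProtectionGroupAddApps State: PROCESSED Error: 0
Job: 10245 Name: ProtectionGroupAddApps State: WAITING Error: 0
Job: 10245 Name: ProtectionGroupAddApps State: COMPLETED Error: 0
"""

if __name__ == "__main__":
    print(last_job_status(SAMPLE_JOB_OUTPUT))
```

A caller could treat the job as finished when the returned state is COMPLETED, and fall back to inspecting the job manually for any other final state.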
17.6.10. Step 8: Verify Data Transfer¶
After all the previous steps are complete, replication is set up for the applications associated with the given Protection Group. The application data transferred between peer clusters can be tracked with the previously shown robin protection-group info command.
Specifically, the state of each snapshot changes in the following order as it replicates successfully:
SYNC Pending
SYNC In Progress
SYNC Successful
Note
Each of the above states is prefixed with the role of the Protection Group, either Primary or Secondary, depending on the cluster on which the command is run.
More information on the given Sync states can be found here.
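The role prefix and the ordering of the sync states can be illustrated with a short helper. This is a sketch under stated assumptions: only the PRIMARY_SYNC_SUCCESSFUL and SECONDARY_SYNC_SUCCESSFUL spellings appear in the example output of this guide, so the underscore spellings of the pending and in-progress states are assumed by analogy. The names SYNC_SUFFIXES, split_state, and is_synced are hypothetical.

```python
# Suffixes in the order listed above. Only *_SYNC_SUCCESSFUL appears verbatim
# in the example output; the other two spellings are assumptions.
SYNC_SUFFIXES = ("SYNC_PENDING", "SYNC_IN_PROGRESS", "SYNC_SUCCESSFUL")
ROLES = ("PRIMARY", "SECONDARY")


def split_state(state):
    """Split a state such as PRIMARY_SYNC_SUCCESSFUL into (role, stage index)."""
    role, _, suffix = state.partition("_")
    if role not in ROLES or suffix not in SYNC_SUFFIXES:
        raise ValueError(f"unrecognized sync state: {state!r}")
    return role, SYNC_SUFFIXES.index(suffix)


def is_synced(state):
    """Return True when the state is the final (successful) stage."""
    _, stage = split_state(state)
    return stage == len(SYNC_SUFFIXES) - 1


if __name__ == "__main__":
    print(split_state("PRIMARY_SYNC_SUCCESSFUL"))
    print(is_synced("SECONDARY_SYNC_SUCCESSFUL"))
```

The helper raises ValueError for any state outside the three listed here, so states documented elsewhere surface explicitly instead of being silently treated as unsynced.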
Shown below is an example output of the successful replication of a snapshot on the cluster hosting the primary Protection Group.
# robin protection-group info prod-pg
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
Peer Cluster Name: NewYork
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl
+----------+-------------------------+-------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+-------------------------+-------------------------------------------+--------------------------+
| mysqldb | PRIMARY_SYNC_SUCCESSFUL | mysqldb_minute_repl-1639611848-1639688868 | 2021-12-16 13:07:53 |
+----------+-------------------------+-------------------------------------------+--------------------------+
In contrast, below is an example output of the successful replication of a snapshot on the cluster hosting the secondary Protection Group.
# robin protection-group info prod-pg
Name: prod-pg
Tenant: Administrators
Role: SECONDARY
Peer Information:
=================
Peer Cluster Name: Sanfrancisco
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl-1
+----------+---------------------------+-------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+---------------------------+-------------------------------------------+--------------------------+
| mysqldb | SECONDARY_SYNC_SUCCESSFUL | mysqldb_minute_repl-1639611848-1639688989 | 2021-12-16 13:09:53 |
+----------+---------------------------+-------------------------------------------+--------------------------+
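To check the sync state from a script rather than by eye, the text output of robin protection-group info can be parsed. The following is a minimal sketch, assuming the output keeps the "Key: value" header lines and the pipe-delimited table shown in the examples above. It is not an official parser, and the names parse_pg_info and SAMPLE_PG_INFO are hypothetical. The column header is matched exactly as the CLI prints it, including the "Latest Sycned Snap ctime" spelling.

```python
def parse_pg_info(text):
    """Parse 'robin protection-group info' text into header fields and app rows."""
    info = {"fields": {}, "apps": []}
    header = None
    for raw in text.splitlines():
        line = raw.strip().strip("*")
        # Skip blank lines and separator lines made only of +, - or =.
        if not line or set(line) <= set("+-="):
            continue
        if line.startswith("|"):
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            if header is None:
                header = cells  # first table row holds the column names
            else:
                info["apps"].append(dict(zip(header, cells)))
        elif ":" in line:
            key, _, value = line.partition(":")
            if value.strip():  # ignore section titles such as "Peer Information:"
                info["fields"][key.strip()] = value.strip()
    return info


# Example output copied from the primary cluster listing above.
SAMPLE_PG_INFO = """\
Name: prod-pg
Tenant: Administrators
Role: PRIMARY
Peer Information:
=================
Peer Cluster Name: NewYork
Peer Cluster State: PAIRED
Peer Cluster ProtectionGroupState: PROTECTED
Replication Policy: minute_repl
+----------+-------------------------+-------------------------------------------+--------------------------+
| App Name | State | Latest Synced Snapshot | Latest Sycned Snap ctime |
+----------+-------------------------+-------------------------------------------+--------------------------+
| mysqldb | PRIMARY_SYNC_SUCCESSFUL | mysqldb_minute_repl-1639611848-1639688868 | 2021-12-16 13:07:53 |
+----------+-------------------------+-------------------------------------------+--------------------------+
"""

if __name__ == "__main__":
    parsed = parse_pg_info(SAMPLE_PG_INFO)
    for app in parsed["apps"]:
        print(app["App Name"], app["State"])
```

Running the same parser on both clusters and comparing the "Latest Synced Snapshot" values is one way to confirm that the secondary has caught up with the primary.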