Chapter 17 Group Replication

This chapter explains MySQL Group Replication and how to install, configure and monitor groups. MySQL Group Replication is a MySQL Server plugin that enables you to create elastic, highly-available, fault-tolerant replication topologies.

Groups can operate in a single-primary mode with automatic primary election, where only one server accepts updates at a time. Alternatively, for more advanced users, groups can be deployed in multi-primary mode, where all servers can accept updates, even if they are issued concurrently.

There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time. Servers can leave and join the group and the view is updated accordingly. Sometimes servers can leave the group unexpectedly, in which case the failure detection mechanism detects this and notifies the group that the view has changed. This is all automatic.

The chapter is structured as follows:

17.1 Group Replication Background

This section provides background information on MySQL Group Replication.

The most common way to create a fault-tolerant system is to resort to making components redundant, in other words the component can be removed and the system should continue to operate as expected. This creates a set of challenges that raise complexity of such systems to a whole different level. Specifically, replicated databases have to deal with the fact that they require maintenance and administration of several servers instead of just one. Moreover, as servers are cooperating together to create the group several other classic distributed systems problems have to be dealt with, such as network partitioning or split brain scenarios.

Therefore, the ultimate challenge is to fuse the logic of the database and data replication with the logic of having several servers coordinated in a consistent and simple way. In other words, to have multiple servers agreeing on the state of the system and the data on each and every change that the system goes through. This can be summarized as having servers reaching agreement on each database state transition, so that they all progress as one single database or alternatively that they eventually converge to the same state. Meaning that they need to operate as a (distributed) state machine.

MySQL Group Replication provides distributed state machine replication with strong coordination between servers. Servers coordinate themselves automatically when they are part of the same group. The group can operate in a single-primary mode with automatic primary election, where only one server accepts updates at a time. Alternatively, for more advanced users the group can be deployed in multi-primary mode, where all servers can accept updates, even if they are issued concurrently. This power comes at the expense of applications having to work around the limitations imposed by such deployments.

There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time. Servers can leave and join the group and the view is updated accordingly. Sometimes servers can leave the group unexpectedly, in which case the failure detection mechanism detects this and notifies the group that the view has changed. This is all automatic.

For a transaction to commit, the majority of the group have to agree on the order of a given transaction in the global sequence of transactions. Deciding to commit or abort a transaction is done by each server individually, but all servers make the same decision. If there is a network partition, resulting in a split where members are unable to reach agreement, then the system does not progress until this issue is resolved. Hence there is also a built-in, automatic, split-brain protection mechanism.

All of this is powered by the provided Group Communication System (GCS) protocols. These provide a failure detection mechanism, a group membership service, and safe and completely ordered message delivery. All these properties are key to creating a system which ensures that data is consistently replicated across the group of servers. At the very core of this technology lies an implementation of the Paxos algorithm. It acts as the group communication engine.

17.1.1 Replication Technologies

Before getting into the details of MySQL Group Replication, this section introduces some background concepts and an overview of how things work. This provides some context to help understand what is required for Group Replication and what the differences are between classic asynchronous MySQL Replication and Group Replication.

17.1.1.1 Primary-Secondary Replication

Traditional MySQL Replication provides a simple Primary-Secondary approach to replication. There is a primary (master) and there is one or more secondaries (slaves). The primary executes transactions, commits them and then they are later (thus asynchronously) sent to the secondaries to be either re-executed (in statement-based replication) or applied (in row-based replication). It is a shared-nothing system, where all servers have a full copy of the data by default.

Figure 17.1 MySQL Asynchronous Replication

A transaction received by the master is executed, written to the binary log, then committed, and a response is sent to the client application. The record from the binary log is sent to the relay logs on Slave 1 and Slave 2 before the commit takes place on the master. On each of the slaves, the transaction is applied, written to the slave's binary log, and committed. The commit on the master and the commits on the slaves are all independent and asynchronous.

There is also semisynchronous replication, which adds one synchronization step to the protocol. This means that the Primary waits, at commit time, for the secondary to acknowledge that it has received the transaction. Only then does the Primary resume the commit operation.

Figure 17.2 MySQL Semisynchronous Replication

A transaction received by the master is executed and written to the binary log. The record from the binary log is sent to the relay logs on Slave 1 and Slave 2. The master then waits for an acknowledgement from the slaves. When both of the slaves have returned the acknowledgement, the master commits the transaction, and a response is sent to the client application. After each slave has returned its acknowlegement, it applies the transaction, writes it to the binary log, and commits it. The commit on the master depends on the acknowledgement from the slaves, but the commits on the slaves are independent from each other and from the commit on the master.

In the two pictures above, you can see a diagram of the classic asynchronous MySQL Replication protocol (and its semisynchronous variant as well). Blue arrows represent messages exchanged between servers or messages exchanged between servers and the client application.

17.1.1.2 Group Replication

Group Replication is a technique that can be used to implement fault-tolerant systems. The replication group is a set of servers that interact with each other through message passing. The communication layer provides a set of guarantees such as atomic message and total order message delivery. These are very powerful properties that translate into very useful abstractions that one can resort to build more advanced database replication solutions.

MySQL Group Replication builds on top of such properties and abstractions and implements a multi-master update everywhere replication protocol. In essence, a replication group is formed by multiple servers and each server in the group may execute transactions independently. But all read-write (RW) transactions commit only after they have been approved by the group. Read-only (RO) transactions need no coordination within the group and thus commit immediately. In other words, for any RW transaction the group needs to decide whether it commits or not, thus the commit operation is not a unilateral decision from the originating server. To be precise, when a transaction is ready to commit at the originating server, the server atomically broadcasts the write values (rows changed) and the correspondent write set (unique identifiers of the rows that were updated). Then a global total order is established for that transaction. Ultimately, this means that all servers receive the same set of transactions in the same order. As a consequence, all servers apply the same set of changes in the same order, therefore they remain consistent within the group.

However, there may be conflicts between transactions that execute concurrently on different servers. Such conflicts are detected by inspecting the write sets of two different and concurrent transactions, in a process called certification. If two concurrent transactions, that executed on different servers, update the same row, then there is a conflict. The resolution procedure states that the transaction that was ordered first commits on all servers, whereas the transaction ordered second aborts, and thus is rolled back on the originating server and dropped by the other servers in the group. This is in fact a distributed first commit wins rule.

Figure 17.3 MySQL Group Replication Protocol

A transaction received by Master 1 is executed. Master 1 then sends a message to the replication group, consisting of itself, Master 2, and Master 3. When all three members have reached consensus, they certify the transaction. Master 1 then writes the transaction to its binary log, commits it, and sends a response to the client application. Masters 2 and 3 write the transaction to their relay logs, then apply it, write it to the binary log, and commit it.

Finally, Group Replication is a shared-nothing replication scheme where each server has its own entire copy of the data.

The figure above depicts the MySQL Group Replication protocol and by comparing it to MySQL Replication (or even MySQL semisynchronous replication) you can see some differences. Note that some underlying consensus and Paxos related messages are missing from this picture for the sake of clarity.

17.1.2 Group Replication Use Cases

Group Replication enables you to create fault-tolerant systems with redundancy by replicating the system state throughout a set of servers. Consequently, even if some of the servers fail, as long it is not all or a majority, the system is still available, and all it could have degraded performance or scalability, it is still available. Server failures are isolated and independent. They are tracked by a group membership service which relies on a distributed failure detector that is able to signal when any servers leave the group, either voluntarily or due to an unexpected halt. There is a distributed recovery procedure to ensure that when servers join the group they are brought up to date automatically. There is no need for server fail-over, and the multi-master update everywhere nature ensures that not even updates are blocked in the event of a single server failure. Therefore MySQL Group Replication guarantees that the database service is continuously available.

It is important to understand that although the database service is available, in the event of a server crash, those clients connected to it must be redirected, or failed over, to a different server. This is not something Group Replication attempts to resolve. A connector, load balancer, router, or some form of middleware are more suitable to deal with this issue.

To summarize, MySQL Group Replication provides a highly available, highly elastic, dependable MySQL service.

17.1.2.1 Examples of Use Case Scenarios

The following examples are typical use cases for Group Replication.

  • Elastic Replication - Environments that require a very fluid replication infrastructure, where the number of servers has to grow or shrink dynamically and with as few side-effects as possible. For instance, database services for the cloud.

  • Highly Available Shards - Sharding is a popular approach to achieve write scale-out. Use MySQL Group Replication to implement highly available shards, where each shard maps to a replication group.

  • Alternative to Master-Slave replication - In certain situations, using a single master server makes it a single point of contention. Writing to an entire group may prove more scalable under certain circumstances.

  • Autonomic Systems - Additionally, you can deploy MySQL Group Replication purely for the automation that is built into the replication protocol (described already in this and previous chapters).

17.1.3 Group Replication Details

This section presents details about some of the services that Group Replication builds on.

17.1.3.1 Failure Detection

There is a failure detection mechanism provided that is able to find and report which servers are silent and as such assumed to be dead. At a high level, the failure detector is a distributed service that provides information about which servers may be dead (suspicions). Later if the group agrees that the suspicions are probably true, then the group decides that a given server has indeed failed. This means that the remaining members in the group take a coordinated decision to exclude a given member.

Suspicions are triggered when servers go mute. When server A does not receive messages from server B during a given period, a timeout occurs and a suspicion is raised.

If a server gets isolated from the rest of the group, then it suspects that all others have failed. Being unable to secure agreement with the group (as it cannot secure a quorum), its suspicion does not have consequences. When a server is isolated from the group in this way, it is unable to execute any local transactions.

17.1.3.2 Group Membership

MySQL Group Replication relies on a group membership service. This is built into the plugin. It defines which servers are online and participating in the group. The list of online servers is often referred to as a view. Therefore, every server in the group has a consistent view of which are the members participating actively in the group at a given moment in time.

Servers have to agree not only on transaction commits, but also which is the current view. Therefore, if servers agree that a new server becomes part of the group, then the group itself is reconfigured to integrate that server in it, triggering a view change. The opposite also happens, if a server leaves the group, voluntarily or not, then the group dynamically rearranges its configuration and a view change is triggered.

Note though that when a member leaves voluntarily, it first initiates a dynamic group reconfiguration. This triggers a procedure, where all members have to agree on the new view without the leaving server. However, if a member leaves involuntarily (for example it has stopped unexpectedly or the network connection is down) then the failure detection mechanism realizes this fact and a reconfiguration of the group is proposed, this one without the failed member. As mentioned this requires agreement from the majority of servers in the group. If the group is not able to reach agreement (for example it partitioned in such a way that there is no majority of servers online), then the system is not be able to dynamically change the configuration and as such, blocks to prevent a split-brain situation. Ultimately, this means that the administrator needs to step in and fix this.

17.1.3.3 Fault-tolerance

MySQL Group Replication builds on an implementation of the Paxos distributed algorithm to provide distributed coordination between servers. As such, it requires a majority of servers to be active to reach quorum and thus make a decision. This has direct impact on the number of failures the system can tolerate without compromising itself and its overall functionality. The number of servers (n) needed to tolerate f failures is then n = 2 x f + 1.

In practice this means that to tolerate one failure the group must have three servers in it. As such if one server fails, there are still two servers to form a majority (two out of three) and allow the system to continue to make decisions automatically and progress. However, if a second server fails involuntarily, then the group (with one server left) blocks, because there is no majority to reach a decision.

The following is a small table illustrating the formula above.

Group Size

Majority

Instant Failures Tolerated

1

1

0

2

2

0

3

2

1

4

3

1

5

3

2

6

4

2

7

4

3

The next Chapter covers technical aspects of Group Replication.

17.2 Getting Started

MySQL Group Replication is provided as a plugin to MySQL server, and each server in a group requires configuration and installation of the plugin. This section provides a detailed tutorial with the steps required to create a replication group with at least three servers.

17.2.1 Deploying Group Replication in Single-Primary Mode

Each of the server instances in a group can run on an independent physical machine, or on the same machine. This section explains how to create a replication group with three MySQL Server instances on one physical machine. This means that three data directories are needed, one per server instance, and that you need to configure each instance independently.

Figure 17.4 Group Architecture

Three server instances, S1, S2, and S3, are deployed as an interconnected group, and clients communicate with each of the server instances.

This tutorial explains how to get and deploy MySQL Server with the Group Replication plugin, how to configure each server instance before creating a group, and how to use Performance Schema monitoring to verify that everything is working correctly.

17.2.1.1 Deploying Instances for Group Replication

The first step is to deploy three instances of MySQL Server. Group Replication is a built-in MySQL plugin provided with MySQL Server 5.7.17 and later. For more background information on MySQL plugins, see Section 5.5, “MySQL Server Plugins”. This procedure assumes that MySQL Server was downloaded and unpacked into the directory named mysql-5.7. The following procedure uses one physical machine, therefore each MySQL server instance requires a specific data directory for the instance. Create the data directories in a directory named data and initialize each one.

mkdir data
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s1
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s2
mysql-5.7/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-5.7 --datadir=$PWD/data/s3

Inside data/s1, data/s2, data/s3 is an initialized data directory, containing the mysql system database and related tables and much more. To learn more about the initialization procedure, see Section 2.10.1.1, “Initializing the Data Directory Manually Using mysqld”.

Warning

Do not use --initialize-insecure in production environments, it is only used here to simplify the tutorial. For more information on security settings, see Section 17.5, “Group Replication Security”.

17.2.1.2 Configuring an Instance for Group Replication

This section explains the configuration settings required for MySQL Server instances that you want to use for Group Replication. For background information, see Section 17.7.2, “Group Replication Limitations”.

Group Replication Server Settings

To install and use the Group Replication plugin you must configure the MySQL Server instance correctly. It is recommended to store the configuration in the instance's configuration file. See Section 4.2.6, “Using Option Files” for more information. Unless stated otherwise, what follows is the configuration for the first instance in the group, referred to as s1 in this procedure. The following section shows an example server configuration.

[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s1
basedir=<full_path_to_bin>/mysql-5.7/

port=24801
socket=<full_path_to_sock_dir>/s1.sock

These settings configure MySQL server to use the data directory created earlier and which port the server should open and start listening for incoming connections.

Note

The non-default port of 24801 is used because in this tutorial the three server instances use the same hostname. In a setup with three different machines this would not be required.

Group Replication depends on a network connection between the members, which means that each member must be able to resolve the network address of all of the other members. For example in this tutorial all three instances run on one machine, so to ensure that the members can contact each other you could add a line to the option file such as report_host=127.0.0.1.

Replication Framework

The following settings configure replication according to the MySQL Group Replication requirements.

server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

These settings configure the server to use the unique identifier number 1, to enable global transaction identifiers and to store replication metadata in system tables instead of files. Additionally, it instructs the server to turn on binary logging, use row-based format and disable binary log event checksums. For more details see Section 17.7.1, “Group Replication Requirements”.

Group Replication Settings

At this point the my.cnf file ensures that the server is configured and is instructed to instantiate the replication infrastructure under a given configuration. The following section configures the Group Replication settings for the server.

transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24901"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group=off
Note

The loose- prefix used for the group_replication variables above instructs the server to continue to start if the Group Replication plugin has not been loaded at the time the server is started.

  • Configuring transaction_write_set_extraction instructs the server that for each transaction it has to collect the write set and encode it as a hash using the XXHASH64 hashing algorithm.

  • Configuring group_replication_group_name tells the plugin that the group that it is joining, or creating, is named "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa".

  • Configuring group_replication_start_on_boot instructs the plugin to not start operations automatically when the server starts. This is important when setting up Group Replication as it ensures you can configure the server before manually starting the plugin. Once the member is configured you can set group_replication_start_on_boot to on so that Group Replication starts automatically upon server boot.

  • Configuring group_replication_local_address tells the plugin to use the network address 127.0.0.1 and port 24901 for internal communication with other members in the group.

    Important

    Group Replication uses this address for internal member-to-member connections using XCom. This address must be different to the hostname and port used for SQL and it must not be used for client applications. It must be reserved for internal communication between the members of the group while running Group Replication.

    The network address configured by group_replication_local_address must be resolvable by all group members. For example, if each server instance is on a different machine with a fixed network address you could use the IP of the machine, such as 10.0.0.1. If you use hostnames then you must use a fully qualified name and ensure it is resolvable through DNS, correctly configured /etc/hosts files, or other name resolution processes. The recommended port for group_replication_local_address is 33061. In this tutorial we use three server instances running on one machine, thus ports 24901 to 24903 are used for the internal communication network address.

  • Configuring group_replication_group_seeds sets the hostname and port of the group members which are used by the new member to establish its connection to the group. These members are called the seed members. Once the connection is established, the group membership information is listed at performance_schema.replication_group_members. Usually the group_replication_group_seeds list contains the hostname:port of each of the group member's group_replication_local_address, but this is not obligatory and a subset of the group members can be chosen as seeds.

    Important

    The hostname:port listed in group_replication_group_seeds is the seed member's internal network address, configured by group_replication_local_address and not the SQL hostname:port used for client connections, and shown for example in performance_schema.replication_group_members table.

    The server that starts the group does not make use of this option, since it is the initial server and as such, it is in charge of bootstrapping the group. In other words, any existing data which is on the server bootstrapping the group is what is used as the data for the next joining member. The second server joining asks the one and only member in the group to join, any missing data on the second server is replicated from the donor data on the bootstrapping member, and then the group expands. The third server joining can ask any of these two to join, data is synchronized to the new member, and then the group expands again. Subsequent servers repeat this procedure when joining.

    Warning

    When joining multiple servers at the same time, make sure that they point to seed members that are already in the group. Do not use members that are also joining the group as as seeds, because they may not yet be in the group when contacted.

    It is good practice to start the bootstrap member first, and let it create the group. Then make it the seed member for the rest of the members that are joining. This ensures that there is a group formed when joining the rest of the members.

    Creating a group and joining multiple members at the same time is not supported. It may work, but chances are that the operations race and then the act of joining the group ends up in an error or a time out.

  • Configuring group_replication_bootstrap_group instructs the plugin whether to bootstrap the group or not.

    Important

    This option must only be used on one server instance at any time, usually the first time you bootstrap the group (or in case the entire group is brought down and back up again). If you bootstrap the group multiple times, for example when multiple server instances have this option set, then they could create an artificial split brain scenario, in which two distinct groups with the same name exist. Disable this option after the first server instance comes online.

Configuration for all servers in the group is quite similar. You need to change the specifics about each server (for example server_id, datadir, group_replication_local_address). This is illustrated later in this tutorial.

17.2.1.3 User Credentials

Group Replication uses the asynchronous replication protocol to achieve Section 17.9.5, “Distributed Recovery”, synchronizing group members before joining them to the group. The distributed recovery process relies on a replication channel named group_replication_recovery which is used to transfer transactions from donor members to members that join the group. Therefore you need to set up a replication user with the correct permissions so that Group Replication can establish direct member-to-member recovery replication channels.

Start the server using the options file:

mysql-5.7/bin/mysqld --defaults-file=data/s1/s1.cnf

Create a MySQL user with the REPLICATION-SLAVE privilege. This process should not be captured in the binary log to avoid the changes being propagated to other server instances. Connect to server s1 and issue the following statements:

mysql> SET SQL_LOG_BIN=0;

In the following example the user rpl_user with the password password is shown. When configuring your servers use a suitable user name and password. Binary logging is enabled again once the user is created.

mysql> CREATE USER rpl_user@'%' IDENTIFIED BY 'password';
mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
mysql> FLUSH PRIVILEGES;
mysql> SET SQL_LOG_BIN=1;

Once the user has been configured, use the CHANGE MASTER TO statement to configure the server to use the given credentials for the group_replication_recovery replication channel the next time it needs to recover its state from another member. Issue the following, replacing rpl_user and password with the values used when creating the user.

mysql> CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password' \\
		      FOR CHANNEL 'group_replication_recovery';
      

Distributed recovery is the first step taken by a server that joins the group and does not have the same set of transactions as the group members. If these credentials are not set correctly for the group_replication_recovery replication channel and the rpl_user as shown, the server cannot connect to the donor members and run the distributed recovery process to gain synchrony with the other group members, and hence ultimately cannot join the group.

Similarly, if the server cannot correctly identify the other members via the server's hostname the recovery process can fail. It is recommended that operating systems running MySQL have a properly configured unique hostname, either using DNS or local settings. This hostname can be verified in the Member_host column of the performance_schema.replication_group_members table. If multiple group members externalize a default hostname set by the operating system, there is a chance of the member not resolving to the correct member address and not being able to join the group. In such a situation use report_host to configure a unique hostname to be externalized by each of the servers.

17.2.1.4 Launching Group Replication

Once server s1 has been configured and started, install the Group Replication plugin. Connect to the server and issue the following command:

INSTALL PLUGIN group_replication SONAME 'group_replication.so';
Important

The mysql.session user must exist before you can load Group Replication. mysql.session was added in MySQL version 5.7.19. If your data dictionary was initialized using an earlier version you must run the mysql_upgrade procedure. If the upgrade is not run, Group Replication fails to start with the error message There was an error when trying to access the server with user: mysql.session@localhost. Make sure the user is present in the server and that mysql_upgrade was ran after a server update..

To check that the plugin was installed successfully, issue SHOW PLUGINS; and check the output. It should show something like this:

mysql> SHOW PLUGINS;
+----------------------------+----------+--------------------+----------------------+-------------+
| Name                       | Status   | Type               | Library              | License     |
+----------------------------+----------+--------------------+----------------------+-------------+
| binlog                     | ACTIVE   | STORAGE ENGINE     | NULL                 | PROPRIETARY |

(...)

| group_replication          | ACTIVE   | GROUP REPLICATION  | group_replication.so | PROPRIETARY |
+----------------------------+----------+--------------------+----------------------+-------------+

To start the group, instruct server s1 to bootstrap the group and then start Group Replication. This bootstrap should only be done by a single server, the one that starts the group and only once. This is why the value of the bootstrap configuration option was not saved in the configuration file. If it is saved in the configuration file, upon restart the server automatically bootstraps a second group with the same name. This would result in two distinct groups with the same name. The same reasoning applies to stopping and restarting the plugin with this option set to ON.

SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;

Once the START GROUP_REPLICATION statement returns, the group has been started. You can check that the group is now created and that there is one member in it:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE  |
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| group_replication_applier | ce9be252-2b71-11e6-b8f4-00212844f856 | myhost      |       24801 | ONLINE        |
+---------------------------+--------------------------------------+-------------+-------------+---------------+

The information in this table confirms that there is a member in the group with the unique identifier ce9be252-2b71-11e6-b8f4-00212844f856, that it is ONLINE and is at myhost listening for client connections on port 24801.

For the purpose of demonstrating that the server is indeed in a group and that it is able to handle load, create a table and add some content to it.

mysql> CREATE DATABASE test;
mysql> USE test;
mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL);
mysql> INSERT INTO t1 VALUES (1, 'Luis');

Check the content of table t1 and the binary log.

mysql> SELECT * FROM t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 | Luis |
+----+------+

mysql> SHOW BINLOG EVENTS;
+---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+
| Log_name      | Pos | Event_type     | Server_id | End_log_pos | Info                                                               |
+---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+
| binlog.000001 |   4 | Format_desc    |         1 |         123 | Server ver: 5.7.19-gr080-log, Binlog ver: 4                        |
| binlog.000001 | 123 | Previous_gtids |         1 |         150 |                                                                    |
| binlog.000001 | 150 | Gtid           |         1 |         211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1'  |
| binlog.000001 | 211 | Query          |         1 |         270 | BEGIN                                                              |
| binlog.000001 | 270 | View_change    |         1 |         369 | view_id=14724817264259180:1                                        |
| binlog.000001 | 369 | Query          |         1 |         434 | COMMIT                                                             |
| binlog.000001 | 434 | Gtid           |         1 |         495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2'  |
| binlog.000001 | 495 | Query          |         1 |         585 | CREATE DATABASE test                                               |
| binlog.000001 | 585 | Gtid           |         1 |         646 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:3'  |
| binlog.000001 | 646 | Query          |         1 |         770 | use `test`; CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL) |
| binlog.000001 | 770 | Gtid           |         1 |         831 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4'  |
| binlog.000001 | 831 | Query          |         1 |         899 | BEGIN                                                              |
| binlog.000001 | 899 | Table_map      |         1 |         942 | table_id: 108 (test.t1)                                            |
| binlog.000001 | 942 | Write_rows     |         1 |         984 | table_id: 108 flags: STMT_END_F                                    |
| binlog.000001 | 984 | Xid            |         1 |        1011 | COMMIT /* xid=38 */                                                |
+---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+

As seen above, the database and the table objects were created and their corresponding DDL statements were written to the binary log. Also, the data was inserted into the table and written to the binary log. The importance of the binary log entries is illustrated in the following section when the group grows and distributed recovery is executed as new members try to catch up and become online.

17.2.1.5 Adding Instances to the Group

At this point, the group has one member in it, server s1, which has some data in it. It is now time to expand the group by adding the other two servers configured previously.

17.2.1.5.1 Adding a Second Instance

In order to add a second instance, server s2, first create the configuration file for it. The configuration is similar to the one used for server s1, except for things such as the location of the data directory, the ports that s2 is going to be listening on or its server_id. These different lines are highlighted in the listing below.

[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s2
basedir=<full_path_to_bin>/mysql-5.7/

port=24802
socket=<full_path_to_sock_dir>/s2.sock

#
# Replication configuration parameters
#
server_id=2
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24902"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off

Similar to the procedure for server s1, with the option file in place you launch the server.

mysql-5.7/bin/mysqld --defaults-file=data/s2/s2.cnf

Then configure the recovery credentials as follows. The commands are the same as used when setting up server s1 as the user is shared within the group. Issue the following statements on s2.

SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password' \\
	FOR CHANNEL 'group_replication_recovery';

Install the Group Replication plugin and start the process of joining the server to the group. The following example installs the plugin in the same way as used while deploying server s1.

mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';

Add server s2 to the group.

mysql> START GROUP_REPLICATION;

Unlike the previous steps that were the same as those executed on s1, here there is a difference in that you do not issue SET GLOBAL group_replication_bootstrap_group=ON; before starting Group Replication, because the group has already been created and bootstrapped by server s1. At this point server s2 only needs to be added to the already existing group.

Tip

When Group Replication starts successfully and the server joins the group it checks the super_read_only variable. By setting super_read_only to ON in the member's configuration file, you can ensure that servers which fail when starting Group Replication for any reason do not accept transactions. If the server should join the group as read-write instance, for example as the primary in a single-primary group or as a member of a multi-primary group, when the super_read_only variable is set to ON then it is set to OFF upon joining the group.

Checking the performance_schema.replication_group_members table again shows that there are now two ONLINE servers in the group.

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE  |
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| group_replication_applier | 395409e1-6dfa-11e6-970b-00212844f856 | myhost      |       24801 | ONLINE        |
| group_replication_applier | ac39f1e6-6dfa-11e6-a69d-00212844f856 | myhost      |       24802 | ONLINE        |
+---------------------------+--------------------------------------+-------------+-------------+---------------+

As server s2 is also marked as ONLINE, it must have already caught up with server s1 automatically. Verify that it has indeed synchronized with server s1 as follows.

mysql> SHOW DATABASES LIKE 'test';
+-----------------+
| Database (test) |
+-----------------+
| test            |
+-----------------+

mysql> SELECT * FROM test.t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 | Luis |
+----+------+

mysql> SHOW BINLOG EVENTS;
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+
| Log_name      | Pos  | Event_type     | Server_id | End_log_pos | Info                                                               |
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+
| binlog.000001 |    4 | Format_desc    |         2 |         123 | Server ver: 5.7.17-log, Binlog ver: 4                              |
| binlog.000001 |  123 | Previous_gtids |         2 |         150 |                                                                    |
| binlog.000001 |  150 | Gtid           |         1 |         211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1'  |
| binlog.000001 |  211 | Query          |         1 |         270 | BEGIN                                                              |
| binlog.000001 |  270 | View_change    |         1 |         369 | view_id=14724832985483517:1                                        |
| binlog.000001 |  369 | Query          |         1 |         434 | COMMIT                                                             |
| binlog.000001 |  434 | Gtid           |         1 |         495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2'  |
| binlog.000001 |  495 | Query          |         1 |         585 | CREATE DATABASE test                                               |
| binlog.000001 |  585 | Gtid           |         1 |         646 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:3'  |
| binlog.000001 |  646 | Query          |         1 |         770 | use `test`; CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL) |
| binlog.000001 |  770 | Gtid           |         1 |         831 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4'  |
| binlog.000001 |  831 | Query          |         1 |         890 | BEGIN                                                              |
| binlog.000001 |  890 | Table_map      |         1 |         933 | table_id: 108 (test.t1)                                            |
| binlog.000001 |  933 | Write_rows     |         1 |         975 | table_id: 108 flags: STMT_END_F                                    |
| binlog.000001 |  975 | Xid            |         1 |        1002 | COMMIT /* xid=30 */                                                |
| binlog.000001 | 1002 | Gtid           |         1 |        1063 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:5'  |
| binlog.000001 | 1063 | Query          |         1 |        1122 | BEGIN                                                              |
| binlog.000001 | 1122 | View_change    |         1 |        1261 | view_id=14724832985483517:2                                        |
| binlog.000001 | 1261 | Query          |         1 |        1326 | COMMIT                                                             |
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+

As seen above, the second server has been added to the group and it has replicated the changes from server s1 automatically. According to the distributed recovery procedure, this means that just after joining the group and immediately before being declared online, server s2 has connected to server s1 automatically and fetched the missing data from it. In other words, it copied transactions from the binary log of s1 that it was missing, up to the point in time that it joined the group.

17.2.1.5.2 Adding Additional Instances

Adding additional instances to the group is essentially the same sequence of steps as adding the second server, except that the configuration has to be changed as it had to be for server s2. To summarise the required commands:

1. Create the configuration file

[mysqld]

# server configuration
datadir=<full_path_to_data>/data/s3
basedir=<full_path_to_bin>/mysql-5.7/

port=24803
socket=<full_path_to_sock_dir>/s3.sock

#
# Replication configuration parameters
#
server_id=3
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24903"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off

2. Start the server

mysql-5.7/bin/mysqld --defaults-file=data/s3/s3.cnf

3. Configure the recovery credentials for the group_replication_recovery channel.

SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password'  \\
FOR CHANNEL 'group_replication_recovery';

4. Install the Group Replication plugin and start it.

INSTALL PLUGIN group_replication SONAME 'group_replication.so';
START GROUP_REPLICATION;

At this point server s3 is booted and running, has joined the group and caught up with the other servers in the group. Consulting the performance_schema.replication_group_members table again confirms this is the case.

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE  |
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| group_replication_applier | 395409e1-6dfa-11e6-970b-00212844f856 | myhost      |       24801 | ONLINE        |
| group_replication_applier | 7eb217ff-6df3-11e6-966c-00212844f856 | myhost      |       24803 | ONLINE        |
| group_replication_applier | ac39f1e6-6dfa-11e6-a69d-00212844f856 | myhost      |       24802 | ONLINE        |
+---------------------------+--------------------------------------+-------------+-------------+---------------+

Issuing this same query on server s2 or server s1 yields the same result. Also, you can verify that server s3 has also caught up:

mysql> SHOW DATABASES LIKE 'test';
+-----------------+
| Database (test) |
+-----------------+
| test            |
+-----------------+

mysql> SELECT * FROM test.t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 | Luis |
+----+------+

mysql> SHOW BINLOG EVENTS;
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+
| Log_name      | Pos  | Event_type     | Server_id | End_log_pos | Info                                                               |
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+
| binlog.000001 |    4 | Format_desc    |         3 |         123 | Server ver: 5.7.17-log, Binlog ver: 4                              |
| binlog.000001 |  123 | Previous_gtids |         3 |         150 |                                                                    |
| binlog.000001 |  150 | Gtid           |         1 |         211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1'  |
| binlog.000001 |  211 | Query          |         1 |         270 | BEGIN                                                              |
| binlog.000001 |  270 | View_change    |         1 |         369 | view_id=14724832985483517:1                                        |
| binlog.000001 |  369 | Query          |         1 |         434 | COMMIT                                                             |
| binlog.000001 |  434 | Gtid           |         1 |         495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2'  |
| binlog.000001 |  495 | Query          |         1 |         585 | CREATE DATABASE test                                               |
| binlog.000001 |  585 | Gtid           |         1 |         646 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:3'  |
| binlog.000001 |  646 | Query          |         1 |         770 | use `test`; CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL) |
| binlog.000001 |  770 | Gtid           |         1 |         831 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4'  |
| binlog.000001 |  831 | Query          |         1 |         890 | BEGIN                                                              |
| binlog.000001 |  890 | Table_map      |         1 |         933 | table_id: 108 (test.t1)                                            |
| binlog.000001 |  933 | Write_rows     |         1 |         975 | table_id: 108 flags: STMT_END_F                                    |
| binlog.000001 |  975 | Xid            |         1 |        1002 | COMMIT /* xid=29 */                                                |
| binlog.000001 | 1002 | Gtid           |         1 |        1063 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:5'  |
| binlog.000001 | 1063 | Query          |         1 |        1122 | BEGIN                                                              |
| binlog.000001 | 1122 | View_change    |         1 |        1261 | view_id=14724832985483517:2                                        |
| binlog.000001 | 1261 | Query          |         1 |        1326 | COMMIT                                                             |
| binlog.000001 | 1326 | Gtid           |         1 |        1387 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:6'  |
| binlog.000001 | 1387 | Query          |         1 |        1446 | BEGIN                                                              |
| binlog.000001 | 1446 | View_change    |         1 |        1585 | view_id=14724832985483517:3                                        |
| binlog.000001 | 1585 | Query          |         1 |        1650 | COMMIT                                                             |
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+

17.3 Monitoring Group Replication

Use the Perfomance Schema tables to monitor Group Replication, assuming that the Performance Schema is enabled. Group Replication adds the following tables:

  • performance_schema.replication_group_member_stats

  • performance_schema.replication_group_members

These Perfomance Schema replication tables also show information about Group Replication:

  • performance_schema.replication_connection_status

  • performance_schema.replication_applier_status

The replication channels created by the Group Replication plugin are named:

  • group_replication_recovery - This channel is used for the replication changes that are related to the distributed recovery phase.

  • group_replication_applier - This channel is used for the incoming changes from the group. This is the channel used to apply transactions coming directly from the group.

The following sections describe the information available in each of these tables.

17.3.1 Replication_group_member_stats

Each member in a replication group certifies and applies transactions received by the group. Statistics regarding the certifier and applier procedures are useful to understand how the applier queue is growing, how many conflicts have been found, how many transactions were checked, which transactions are committed everywhere, and so on.

The performance_schema.replication_group_member_stats table provides information related to the certification process. The information is shared between all the server instances that are members of the replication group, so information on all the group members can be queried from any member.

Table 17.1 replication_group_member_stats

Field

Description

Channel_name

The name of the Group Replication channel.

Member_id

The member server UUID. This has a different value for each member in the group. This also serves as a key since it is unique to each member.

Count_transactions_in_queue

The number of transactions in the queue pending conflict detection checks. Once the transactions have been checked for conflicts, if they pass the check, they are queued to be applied as well.

Count_transactions_checked

The number of transactions that have been checked for conflicts.

Count_conflicts_detected

The number of transactions that did not pass the conflict detection check.

Count_transactions_rows_validating

The current size of the conflict detection database (against which each transaction is certified).

Transactions_committed_all_members

The transactions that have been successfully committed on all members of the replication group. This is updated at a fixed time interval.

Last_conflict_free_transaction

The transaction identifier of the last conflict free transaction checked.


These fields are important for monitoring the performance of the members connected in the group. For example, suppose that one of the group’s members is delayed and is not able to keep up to date with the other members of the group. In this case you might see a large number of transactions in the queue. Based on this information, you could decide to either remove the member from the group, or delay the processing of transactions on the other members of the group in order to reduce the number of queued transactions. This information can also help you to decide how to adjust the flow control of the Group Replication plugin.

17.3.2 Replication_group_members

This table is used for monitoring the status of the different server instances that are tracked in the current view, or in other words are part of the group and as such are tracked by the membership service.

Table 17.2 replication_group_members

Field

Description

Channel_name

The name of the Group Replication channel.

Member_id

The member server UUID.

Member_host

The network address of the group member.

Member_port

The MySQL connection port on which the group member is listening.

Member_state

The current state of the group member (which can be ONLINE, ERROR, RECOVERING, OFFLINE or UNREACHABLE).


For more information about the Member_host value and its impact on the distributed recovery process, see Section 17.2.1.3, “User Credentials”.

17.3.3 Replication_connection_status

When connected to a group, some fields in this table show information regarding Group Replication. For instance, the transactions that have been received from the group and queued in the applier queue (the relay log).

Table 17.3 replication_connection_status

Field

Description

Channel_name

The name of the Group Replication channel.

Group_name

Shows the value of the name of the group. It is always a valid UUID.

Source_UUID

Shows the identifier for the group. It is similar to the group name and it is used as the UUID for all the transactions that are generated during group replication.

Service_state

Shows whether the member is a part of the group or not. The possible values of service state can be {ON, OFF and CONNECTING};

Received_transaction_set

Transactions in this GTID set have been received by this member of the group.


17.3.4 Replication_applier_status

The state of the Group Replication related channels and threads can be observed using the regular replication_applier_status table as well. If there are many different worker threads applying transactions, then the worker tables can also be used to monitor what each worker thread is doing.

Table 17.4 replication_applier_status

Field

Description

Channel_name

The name of the Group Replication channel.

Service_state

Shows whether the applier service is ON or OFF.

Remaining_delay

Shows whether there is some applier delay configured.

Count_transactions_retries

The number of retries performed while applying a transaction.

Received_transaction_set

Transactions in this GTID set have been received by this member of the group.


17.3.5 Group Replication Server States

The table replication_group_members is updated whenever there is a view change, for example when the configuration of the group is dynamically changed. At that point, servers exchange some of their metadata to synchronize themselves and continue to cooperate together.

There are various states that a server instance can be in. If servers are communicating properly, all report the same states for all servers. However, if there is a network partition, or a server leaves the group, then different information may be reported, depending on which server is queried. Note that if the server has left the group then obviously it cannot report updated information about other servers' state. If there is a partition, such that quorum is lost, servers are not able to coordinate between themselves. As a consequence, they cannot guess what the status of different servers is. Therefore, instead of guessing their state they report that some servers are unreachable.

Table 17.5 Server State

Field

Description

Group Synchronized

ONLINE

The member is ready to serve as a fully functional group member, meaning that the client can connect and start executing transactions.

Yes

RECOVERING

The member is in the process of becoming an active member of the group and is currently going through the recovery process, receiving state information from a donor.

No

OFFLINE

The plugin is loaded but the member does not belong to any group.

No

ERROR

The state of the member. Whenever there is an error on the recovery phase or while applying changes, the server enters this state.

No

UNREACHABLE

Whenever the local failure detector suspects that a given server is not reachable, because maybe it has crashed or was disconnected involuntarily, it shows that server's state as 'unreachable'.

No


Important

Once an instance enters ERROR state, the super_read_only option is set to ON. To leave the ERROR state you must manually configure the instance with super_read_only=OFF.

Note that Group Replication is not synchronous, but eventually synchronous. More precisely, transactions are delivered to all group members in the same order, but their execution is not synchronized, meaning that after a transaction is accepted to be committed, each member commits at its own pace.

17.4 Group Replication Operations

This section describes the different modes of deploying Group Replication, explains common operations for managing groups and provides information about how to tune your groups. .

17.4.1 Deploying in Multi-Primary or Single-Primary Mode

Group Replication operates in the following different modes:

  • single-primary mode

  • multi-primary mode

The default mode is single-primary. It is not possible to have members of the group deployed in different modes, for example one configured in multi-primary mode while another one is in single-primary mode. To switch between modes, the group and not the server, needs to be restarted with a different operating configuration. Regardless of the deployed mode, Group Replication does not handle client-side fail-over, that must be handled by the application itself, a connector or a middleware framework such as a proxy or router.

When deployed in multi-primary mode, statements are checked to ensure they are compatible with the mode. The following checks are made when Group Replication is deployed in multi-primary mode:

  • If a transaction is executed under the SERIALIZABLE isolation level, then its commit fails when synchronizing itself with the group.

  • If a transaction executes against a table that has foreign keys with cascading constraints, then the transaction fails to commit when synchronizing itself with the group.

These checks can be deactivated by setting the option group_replication_enforce_update_everywhere_checks to FALSE. When deploying in single-primary mode, this option must be set to FALSE.

17.4.1.1 Single-Primary Mode

In this mode the group has a single-primary server that is set to read-write mode. All the other members in the group are set to read-only mode (with super-read-only=ON ). This happens automatically. The primary is typically the first server to bootstrap the group, all other servers that join automatically learn about the primary server and are set to read only.

Figure 17.5 New Primary Election

Five server instances, S1, S2, S3, S4, and S5, are deployed as an interconnected group. Server S1 is the primary. Write clients are communicating with server S1, and a read client is communicating with server S4. Server S1 then fails, breaking communication with the write clients. Server S2 then takes over as the new primary, and the write clients now communicate with server S2.

When in single-primary mode, some of the checks deployed in multi-primary mode are disabled, because the system enforces that only a single server writes to the group. For example, changes to tables that have cascading foreign keys are allowed, whereas in multi-primary mode they are not. Upon primary member failure, an automatic primary election mechanism chooses the next primary member. The method of choosing the next primary depends on the server's server_uuid variable and the group_replication_member_weight variable.

In the event that the primary member leaves a single-primary group, then an election is performed and a new primary is chosen from the remaining servers in the group. This election is performed by looking at the new view, and ordering the potential new primaries based on the value of group_replication_member_weight. Assuming the group is operating with all members running the same MySQL version, then the member with the highest value for group_replication_member_weight is elected as the new primary. In the event that multiple servers have the same group_replication_member_weight, the servers are then prioritized based on their server_uuid in lexicographical order and by picking the first one. Once a new primary is elected, it is automatically set to read-write and the other secondaries remain as secondaries, and as such, read-only.

If the group is operating with members that are running different versions of MySQL then the election process can be impacted. For example, if any member does not support group_replication_member_weight, then the primary is chosen based on server_uuid order from the members of the lower major version. Alternatively, if all members running different MySQL versions do support group_replication_member_weight, the primary is chosen based on group_replication_member_weight from the members of the lower major version.

It is a good practice to wait for the new primary to apply its replication related relay-log before re-routing the client applications to it.

17.4.1.2 Multi-Primary Mode

In multi-primary mode, there is no notion of a single primary. There is no need to engage an election procedure since there is no server playing any special role.

Figure 17.6 Client Failover

Five server instances, S1, S2, S3, S4, and S5, are deployed as an interconnected group. All of the servers are primaries. Write clients are communicating with servers S1 and S2, and a read client is communicating with server S4. Server S1 then fails, breaking communication with its write client. This client reconnects to server S3.

All servers are set to read-write mode when joining the group.

17.4.1.3 Finding the Primary

The following example shows how to find out which server is currently the primary when deployed in single-primary mode.

mysql> SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member';

17.4.2 Tuning Recovery

Whenever a new member joins a replication group, it connects to a suitable donor and fetches the data that it has missed up until the point it is declared online. This critical component in Group Replication is fault tolerant and configurable. The following section explains how recovery works and how to tune the settings

Donor Selection

A random donor is selected from the existing online members in the group. This way there is a good chance that the same server is not selected more than once when multiple members enter the group.

If the connection to the selected donor fails, a new connection is automatically attempted to a new candidate donor. Once the connection retry limit is reached the recovery procedure terminates with an error.

Note

A donor is picked randomly from the list of online members in the current view.

Enhanced Automatic Donor Switchover

The other main point of concern in recovery as a whole is to make sure that it copes with failures. Hence, Group Replication provides robust error detection mechanisms. In earlier versions of Group Replication, when reaching out to a donor, recovery could only detect connection errors due to authentication issues or some other problem. The reaction to such problematic scenarios was to switch over to a new donor thus a new connection attempt was made to a different member.

This behavior was extended to also cover other failure scenarios:

  • Purged data scenarios - If the selected donor contains some purged data that is needed for the recovery process then an error occurs. Recovery detects this error and a new donor is selected.

  • Duplicated data - If a server joining the group already contains some data that conflicts with the data coming from the selected donor during recovery then an error occurs. This could be caused by some errant transactions present in the server joining the group.

    One could argue that recovery should fail instead of switching over to another donor, but in heterogeneous groups there is chance that other members share the conflicting transactions and others do not. For that reason, upon error, recovery selects another donor from the group.

  • Other errors - If any of the recovery threads fail (receiver or applier threads fail) then an error occurs and recovery switches over to a new donor.

Note

In case of some persistent failures or even transient failures recovery automatically retries connecting to the same or a new donor.

Donor Connection Retries

The recovery data transfer relies on the binary log and existing MySQL replication framework, therefore it is possible that some transient errors could cause errors in the receiver or applier threads. In such cases, the donor switch over process has retry functionality, similar to that found in regular replication.

Number of Attempts

The number of attempts a server joining the group makes when trying to connect to a donor from the pool of donors is 10. This is configured through the group_replication_recovery_retry_count plugin variable . The following command sets the maximum number of attempts to connect to a donor to 10.

mysql> SET GLOBAL group_replication_recovery_retry_count= 10;

Note that this accounts for the global number of attempts that the server joining the group makes connecting to each one of the suitable donors.

Sleep Routines

The group_replication_recovery_reconnect_interval plugin variable defines how much time the recovery process should sleep between donor connection attempts. This variable has its default set to 60 seconds and you can change this value dynamically. The following command sets the recovery donor connection retry interval to 120 seconds.

mysql> SET GLOBAL group_replication_recovery_reconnect_interval= 120;

Note, however, that recovery does not sleep after every donor connection attempt. As the server joining the group is connecting to different servers and not to the same one over and over again, it can assume that the problem that affects server A does not affect server B. As such, recovery suspends only when it has gone through all the possible donors. Once the server joining the group has tried to connect to all the suitable donors in the group and none remains, the recovery process sleeps for the number of seconds configured by the group_replication_recovery_reconnect_interval variable.

17.4.3 Network Partitioning

The group needs to achieve consensus whenever a change that needs to be replicated happens. This is the case for regular transactions but is also required for group membership changes and some internal messaging that keeps the group consistent. Consensus requires a majority of group members to agree on a given decision. When a majority of group members is lost, the group is unable to progress and blocks because it cannot secure majority or quorum.

Quorum may be lost when there are multiple involuntary failures, causing a majority of servers to be removed abruptly from the group. For example in a group of 5 servers, if 3 of them become silent at once, the majority is compromised and thus no quorum can be achieved. In fact, the remaining two are not able to tell if the other 3 servers have crashed or whether a network partition has isolated these 2 alone and therefore the group cannot be reconfigured automatically.

On the other hand, if servers exit the group voluntarily, they instruct the group that it should reconfigure itself. In practice, this means that a server that is leaving tells others that it is going away. This means that other members can reconfigure the group properly, the consistency of the membership is maintained and the majority is recalculated. For example, in the above scenario of 5 servers where 3 leave at once, if the 3 leaving servers warn the group that they are leaving, one by one, then the membership is able to adjust itself from 5 to 2, and at the same time, securing quorum while that happens.

Note

Loss of quorum is by itself a side-effect of bad planning. Plan the group size for the number of expected failures (regardless whether they are consecutive, happen all at once or are sporadic).

The following sections explain what to do if the system partitions in such a way that no quorum is automatically achieved by the servers in the group.

Tip

A primary that has been excluded from a group after a majority loss followed by a reconfiguration can contain extra transactions that are not included in the new group. If this happens, the attempt to add back the excluded member from the group results in an error with the message This member has more executed transactions than those present in the group.

Detecting Partitions

The replication_group_members performance schema table presents the status of each server in the current view from the perspective of this server. The majority of the time the system does not run into partitioning, and therefore the table shows information that is consistent across all servers in the group. In other words, the status of each server on this table is agreed by all in the current view. However, if there is network partitioning, and quorum is lost, then the table shows the status UNREACHABLE for those servers that it cannot contact. This information is exported by the local failure detector built into Group Replication.

Figure 17.7 Losing Quorum

Five server instances, S1, S2, S3, S4, and S5, are deployed as an interconnected group, which is a stable group. When three of the servers, S3, S4, and S5, fail, the majority is lost and the group can no longer proceed without intervention.

To understand this type of network partition the following section describes a scenario where there are initially 5 servers working together correctly, and the changes that then happen to the group once only 2 servers are online. The scenario is depicted in the figure.

As such, lets assume that there is a group with these 5 servers in it:

  • Server s1 with member identifier 199b2df7-4aaf-11e6-bb16-28b2bd168d07

  • Server s2 with member identifier 199bb88e-4aaf-11e6-babe-28b2bd168d07

  • Server s3 with member identifier 1999b9fb-4aaf-11e6-bb54-28b2bd168d07

  • Server s4 with member identifier 19ab72fc-4aaf-11e6-bb51-28b2bd168d07

  • Server s5 with member identifier 19b33846-4aaf-11e6-ba81-28b2bd168d07

Initially the group is running fine and the servers are happily communicating with each other. You can verify this by logging into s1 and looking at its replication_group_members performance schema table. For example:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 1999b9fb-4aaf-11e6-bb54-28b2bd168d07 | 127.0.0.1   |       13002 | ONLINE       |
| group_replication_applier | 199b2df7-4aaf-11e6-bb16-28b2bd168d07 | 127.0.0.1   |       13001 | ONLINE       |
| group_replication_applier | 199bb88e-4aaf-11e6-babe-28b2bd168d07 | 127.0.0.1   |       13000 | ONLINE       |
| group_replication_applier | 19ab72fc-4aaf-11e6-bb51-28b2bd168d07 | 127.0.0.1   |       13003 | ONLINE       |
| group_replication_applier | 19b33846-4aaf-11e6-ba81-28b2bd168d07 | 127.0.0.1   |       13004 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

However, moments later there is a catastrophic failure and servers s3, s4 and s5 stop unexpectedly. A few seconds after this, looking again at the replication_group_members table on s1 shows that it is still online, but several others members are not. In fact, as seen below they are marked as UNREACHABLE. Moreover, the system could not reconfigure itself to change the membership, because the majority has been lost.

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 1999b9fb-4aaf-11e6-bb54-28b2bd168d07 | 127.0.0.1   |       13002 | UNREACHABLE  |
| group_replication_applier | 199b2df7-4aaf-11e6-bb16-28b2bd168d07 | 127.0.0.1   |       13001 | ONLINE       |
| group_replication_applier | 199bb88e-4aaf-11e6-babe-28b2bd168d07 | 127.0.0.1   |       13000 | ONLINE       |
| group_replication_applier | 19ab72fc-4aaf-11e6-bb51-28b2bd168d07 | 127.0.0.1   |       13003 | UNREACHABLE  |
| group_replication_applier | 19b33846-4aaf-11e6-ba81-28b2bd168d07 | 127.0.0.1   |       13004 | UNREACHABLE  |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

The table shows that s1 is now in a group that has no means of progressing without external intervention, because a majority of the servers are unreachable. In this particular case, the group membership list needs to be reset to allow the system to proceed, which is explained in this section. Alternatively, you could also choose to stop Group Replication on s1 and s2 (or stop completely s1 and s2), figure out what happened with s3, s4 and s5 and then restart Group Replication (or the servers).

Unblocking a Partition

Group replication enables you to reset the group membership list by forcing a specific configuration. For instance in the case above, where s1 and s2 are the only servers online, you could chose to force a membership configuration consisting of only s1 and s2. This requires checking some information about s1 and s2 and then using the group_replication_force_members variable.

Figure 17.8 Forcing a New Membership

Three of the servers in a group, S3, S4, and S5, have failed, so the majority is lost and the group can no longer proceed without intervention. With the intervention described in the following text, S1 and S2 are able to form a stable group by themselves.

Suppose that you are back in the situation where s1 and s2 are the only servers left in the group. Servers s3, s4 and s5 have left the group unexpectedly. To make servers s1 and s2 continue, you want to force a membership configuration that contains only s1 and s2.

Warning

This procedure uses group_replication_force_members and should be considered a last resort remedy. It must be used with extreme care and only for overriding loss of quorum. If misused, it could create an artificial split-brain scenario or block the entire system altogether.

Recall that the system is blocked and the current configuration is the following (as perceived by the local failure detector on s1):

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 1999b9fb-4aaf-11e6-bb54-28b2bd168d07 | 127.0.0.1   |       13002 | UNREACHABLE  |
| group_replication_applier | 199b2df7-4aaf-11e6-bb16-28b2bd168d07 | 127.0.0.1   |       13001 | ONLINE       |
| group_replication_applier | 199bb88e-4aaf-11e6-babe-28b2bd168d07 | 127.0.0.1   |       13000 | ONLINE       |
| group_replication_applier | 19ab72fc-4aaf-11e6-bb51-28b2bd168d07 | 127.0.0.1   |       13003 | UNREACHABLE  |
| group_replication_applier | 19b33846-4aaf-11e6-ba81-28b2bd168d07 | 127.0.0.1   |       13004 | UNREACHABLE  |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

The first thing to do is to check what is the peer address (group communication identifier) for s1 and s2. Log in to s1 and s2 and get that information as follows.

mysql> SELECT @@group_replication_local_address;
+-----------------------------------+
| @@group_replication_local_address |
+-----------------------------------+
| 127.0.0.1:10000                   |
+-----------------------------------+

Then log in to s2 and do the same thing:

mysql> SELECT @@group_replication_local_address;
+-----------------------------------+
| @@group_replication_local_address |
+-----------------------------------+
| 127.0.0.1:10001                   |
+-----------------------------------+

Once you know the group communication addresses of s1 (127.0.0.1:10000) and s2 (127.0.0.1:10001), you can use that on one of the two servers to inject a new membership configuration, thus overriding the existing one that has lost quorum. To do that on s1:

mysql> SET GLOBAL group_replication_force_members="127.0.0.1:10000,127.0.0.1:10001";

This unblocks the group by forcing a different configuration. Check replication_group_members on both s1 and s2 to verify the group membership after this change. First on s1.

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | b5ffe505-4ab6-11e6-b04b-28b2bd168d07 | 127.0.0.1   |       13000 | ONLINE       |
| group_replication_applier | b60907e7-4ab6-11e6-afb7-28b2bd168d07 | 127.0.0.1   |       13001 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

And then on s2.

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | b5ffe505-4ab6-11e6-b04b-28b2bd168d07 | 127.0.0.1   |       13000 | ONLINE       |
| group_replication_applier | b60907e7-4ab6-11e6-afb7-28b2bd168d07 | 127.0.0.1   |       13001 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+

When forcing a new membership configuration, make sure that any servers are going to be forced out of the group are indeed stopped. In the scenario depicted above, if s3, s4 and s5 are not really unreachable but instead are online, they may have formed their own functional partition (they are 3 out of 5, hence they have the majority). In that case, forcing a group membership list with s1 and s2 could create an artificial split-brain situation. Therefore it is important before forcing a new membership configuration to ensure that the servers to be excluded are indeed shutdown and if they are not, shut them down before proceeding.

17.4.4 Using MySQL Enterprise Backup with Group Replication

This section explains how to back up and subsequently restore a Group Replication member using MySQL Enterprise Backup; the same technique can be used to quickly add a new member to a group. Generally, backing up a Group Replication member is no different to backing up a stand-alone MySQL instance. The recommended process is to use MySQL Enterprise Backup image backups and a subsequent restore, for more information see Backup Operations.

The required steps can be summarized as:

  • Use MySQL Enterprise Backup to create a backup of the source server instance with simple timestamps.

  • Copy the backup to the destination server instance.

  • Use MySQL Enterprise Backup to restore the backup to the destination server instance.

The following procedure demonstrates this process. Consider the following group:

mysql> SELECT * member_host, member_port, member_state FROM performance_schema.replication_group_members;
+-------------+-------------+--------------+
| member_host | member_port | member_state |
+-------------+-------------+--------------+
| node1       |        3306 | ONLINE       |
| node2       |        3306 | ONLINE       |
| node3       |        3306 | ONLINE       |
+-------------+-------------+--------------+

In this example the group is operating in single-primary mode and the primary group member is node1. This means that node2 and node3 are secondaries, operating in read-only mode (super_read_only=ON). Using MySQL Enterprise Backup, a recent backup has been taken of node2 by issuing:

mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_`date +%d%m_%H%M` / 
		      --backup-dir=/backups/backup_'date +%d%m_%H%M' --user=root -pmYsecr3t /
		      --host=127.0.0.1 --no-history-logging backup-to-image

The --no-history-logging option is used because node2 is a secondary, and because it is read-only MySQL Enterprise Backup cannot write status and metadata tables to the instance.

Assume that the primary member, node1, encounters irreconcilable corruption. After a period of time the server instance is rebuilt but all the data on the member was lost. The most recent backup of member node2 can be used to rebuild node1. This requires copying the backup files from node2 to node1 and then using MySQL Enterprise Backup to restore the backup to node1. The exact way you copy the backup files depends on the operating system and tools available to you. In this example we assume Linux servers and use SCP to copy the files between servers:

node2/backups # scp my.mbi_2206_1429 node1:/backups

Connect to the destination server, in this case node1, and restore the backup using MySQL Enterprise Backup by issuing:

mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/`ls -1rt /backups|tail /
		      -n 1` --backup-dir=/tmp/restore_`date +%d%m_%H%M` copy-back-and-apply-log

The backup is restored to the destination server.

If your group is using multi-primary mode, extra care must be taken to prevent writes to the database during the MySQL Enterprise Backup restore stage and the Group Replication recovery phase. Depending on how the group is accessed by clients, there is a possibility of DML being executed on the newly joined instance the moment it is accessible on the network, even prior to applying the binary log before rejoining the group. To avoid this, configure the member's option file with:

group_replication_start_on_boot=OFF
super_read_only=ON
event_scheduler=OFF

This ensures that Group Replication is not started on boot, that the member defaults to read-only and that the event_scheduler is turned off while the member catches up with the group during the recovery phase. Adequate error handling must be configured on the clients to recognise that they are, temporarily, prevented from performing DML during this period.

Start the server instance and connect an SQL client. The restored backup has old binary log files and related metadata that are specific to the instance that was backed up. To reset all of that issue:

mysql> RESET MASTER;

The restored backup has the relay log files associated with the source instance, in this case node2. Therefore reset the logs, metadata, and configuration for all replication channels by issuing:

mysql> RESET SLAVE ALL;

For the restored instance to be able to be able to recover automatically using Group Replication's built-in distributed recovery (see Section 17.9.5, “Distributed Recovery”), configure the gtid_executed variable. The MySQL Enterprise Backup backup from node2 includes the backup_gtid_executed.sql file, usually at the path datadir/meta/, which contains the information required to configure node1. Disable binary logging and then use this file to configure the gtid_executed variable by issuing:

mysql> SET SQL_LOG_BIN=OFF;
mysql> SOURCE datadir/meta/backup_gtid_executed.sql
mysql> SET SQL_LOG_BIN=ON;

Configure the Section 17.2.1.3, “User Credentials” and start Group Replication, for example:

mysql> CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password' / 
		FOR CHANNEL 'group_replication_recovery';
mysql> START GROUP_REPLICATION; 

The instance attempts to join the group, executing the restored binary logs from the correct location. Once the instance gains synchrony with the group, it joins as a secondary, with super_read_only=ON. Reset the temporary configuration changes made during the restore. Turn the event scheduler back on in the running process:

mysql> SET global event_scheduler=ON;

Edit the following system variables in the instance's option file:

group_replication_start_on_boot=ON
super_read_only=ON
event_scheduler=ON

17.5 Group Replication Security

This section explains how to secure a group, securing the connections between members of a group, or by establishing a security perimeter using address whitelisting.

17.5.1 IP Address Whitelisting

The Group Replication plugin has a configuration option to determine from which hosts an incoming Group Communication connection can be accepted. This option is called group_replication_ip_whitelist. If you set this option on a server s1, then when server s2 is establishing a connection to s1 for the purpose of engaging group communication, then s1 first checks the whitelist before accepting the connection from s2. If s2 is in the whitelist, then s1 accepts the connection, otherwise s1 rejects the connection attempt by s2.

If you do not specify a whitelist explicitly, the server automatically sets the whitelist to the private networks that the server has an interface on. This means that a server, even if it has interfaces on public IPs, does not by default allow connections from external hosts.

Whenever the IP whitelist is set to AUTOMATIC, an entry is added in the error log, similar to:

2017-10-07T06:40:49.320686Z 4 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 192.0.2.21/24,198.51.100.44,203.0.113.0/24,127.0.0.1/32 to the whitelist'

You can improve the security of the group further by manually setting the list of hosts that are permitted to use group communication connections. You can specify host names (from MySQL 5.7.21), simple IP addresses, or CIDR notation, in any combination. A comma must separate each entry. For example:

mysql> STOP GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_ip_whitelist="192.0.2.21/24,198.51.100.44,203.0.113.0/24,example.org,www.example.com/24";
mysql> START GROUP_REPLICATION;

The localhost IP address (127.0.0.1) is always added to the whitelist. If not explicitly, it is implicitly and automatically added. IPv6 addresses, and host names that resolve to IPv6 addresses, are not supported.

For host names, name resolution takes place only when a connection request is made by another server. A host name that cannot be resolved is not considered for whitelist validation, and a warning message is written to the error log. Forward-confirmed reverse DNS (FCrDNS) verification is carried out for resolved host names.

Warning

Host names are inherently less secure than IP addresses in a whitelist. FCrDNS verification provides a good level of protection, but can be compromised by certain types of attack. Specify host names in your whitelist only when strictly necessary, and ensure that all components used for name resolution, such as DNS servers, are maintained under your control. You can also implement name resolution locally using the hosts file, to avoid the use of external components.

To join a replication group, a server only needs to be whitelisted on the seed member to which it makes the request to join the group. Typically, this would be the bootstrap member for the replication group, but it can be any of the servers listed by the --loose-group_replication_group_seeds option in the configuration for the server joining the group. However, note that when the replication group is reconfigured, the group members re-establish connections between themselves. If a group member is only whitelisted by servers that are no longer part of the replication group after the reconfiguration, it is unable to reconnect to the remaining servers in the replication group that do not whitelist it. To avoid this scenario entirely, specify the same whitelist for all servers that are members of the replication group.

Note

It is possible to configure different whitelists on different group members according to your security requirements, for example, in order to keep different subnets separate. If you need to configure different whitelists to meet your security requirements, ensure that there is sufficient overlap between the whitelists in the replication group to maximize the possibility of servers being able to reconnect in the absence of their original seed member.

17.5.2 Secure Socket Layer Support (SSL)

MySQL Group Replication supports both OpenSSL and YaSSL builds of MySQL Server.

Group communication connections as well as recovery connections, are secured using SSL. The following sections explain how to configure connections.

Configuring SSL for Recovery

Recovery is performed through a regular asynchronous replication connection. Once the donor is selected, the server joining the group establishes an asynchronous replication connection. This is all automatic.

However, a user that requires an SSL connection must have been created before the server joining the group connects to the donor. Typically, this is set up at the time one is provisioning a server to join the group.

donor> SET SQL_LOG_BIN=0;
donor> CREATE USER 'rec_ssl_user'@'%' REQUIRE SSL;
donor> GRANT replication slave ON *.* TO 'rec_ssl_user'@'%';
donor> SET SQL_LOG_BIN=1;

Assuming that all servers already in the group have a replication user set up to use SSL, you configure the server joining the group to use those credentials when connecting to the donor. That is done according to the values of the SSL options provided for the Group Replication plugin.

new_member> SET GLOBAL group_replication_recovery_use_ssl=1;
new_member> SET GLOBAL group_replication_recovery_ssl_ca= '.../cacert.pem';
new_member> SET GLOBAL group_replication_recovery_ssl_cert= '.../client-cert.pem';
new_member> SET GLOBAL group_replication_recovery_ssl_key= '.../client-key.pem';

And by configuring the recovery channel to use the credentials of the user that requires an SSL connection.

new_member> CHANGE MASTER TO MASTER_USER="rec_ssl_user" FOR CHANNEL "group_replication_recovery";
new_member> START GROUP_REPLICATION;

Configuring SSL for Group Communication

Secure sockets can be used to establish communication between members in a group. The configuration for this depends on the server's SSL configuration. As such, if the server has SSL configured, the Group Replication plugin also has SSL configured. For more information on the options for configuring the server SSL, see Section 6.4.2, “Command Options for Encrypted Connections”. The options which configure Group Replication are shown in the following table.

Table 17.6 SSL Options

Server Configuration

Plugin Configuration Description

ssl_key

Path of key file. To be used as client and server certificate.

ssl_cert

Path of certificate file. To be used as client and server certificate.

ssl_ca

Path of file with SSL Certificate Authorities that are trusted.

ssl_capath

Path of directory containing certificates for SSL Certificate Authorities that are trusted.

ssl_crl

Path of file containing the certificate revocation lists.

ssl_crlpath

Path of directory containing revoked certificate lists.

ssl_cipher

Permitted ciphers to use while encrypting data over the connection.

tls_version

Secure communication will use this version and its protocols.


These options are MySQL Server configuration options which Group Replication relies on for its configuration. In addition there is the following Group Replication specific option to configure SSL on the plugin itself.

Table 17.7 group_replication_ssl_mode configuration values

Value

Description

DISABLED

Establish an unencrypted connection (default).

REQUIRED

Establish a secure connection if the server supports secure connections.

VERIFY_CA

Like REQUIRED, but additionally verify the server TLS certificate against the configured Certificate Authority (CA) certificates.

VERIFY_IDENTITY

Like VERIFY_CA, but additionally verify that the server certificate matches the host to which the connection is attempted.


The following example shows an example my.cnf file section used to configure SSL on a server and how activate it for Group Replication.

[mysqld]
ssl_ca = "cacert.pem"
ssl_capath = "/.../ca_directory"
ssl_cert = "server-cert.pem"
ssl_cipher = "DHE-RSA-AEs256-SHA"
ssl_crl = "crl-server-revoked.crl"
ssl_crlpath = "/.../crl_directory"
ssl_key = "server-key.pem"
group_replication_ssl_mode= REQUIRED

The only plugin specific configuration option that is listed is group_replication_ssl_mode. This option activates the SSL communication between members of the group, by configuring the SSL framework with the ssl_* parameters that are provided to the server.

17.5.3 Virtual Private Networks (VPN)

There is nothing preventing Group Replication from operating over a virtual private network. At its core, it just relies on an IPv4 socket to establish connections between servers for the purpose of propagating messages between them.

17.6 Group Replication System Variables

These are the system variables that are specific to the Group Replication plugin. Every configuration option is prefixed with "group_replication".

Important

Although most variables are described as dynamic and can be changed while the server is running, most changes only take effect upon restarting the Group Replication plugin. Variables which can be changed without requiring a restart of the plugin are specifically noted as such in this section.

  • group_replication_allow_local_disjoint_gtids_join

    Property Value
    Command-Line Format --group-replication-allow-local-disjoint-gtids-join=value
    Introduced 5.7.17
    Deprecated 5.7.21
    System Variable group_replication_allow_local_disjoint_gtids_join
    Scope Global
    Dynamic Yes
    Type boolean
    Default OFF

    Deprecated in version 5.7.21 and scheduled for removal in a future version. Allow the current server to join the group even if it has transactions not present in the group.

    Warning

    Use caution when enabling this option as incorrect usage could lead to inconsistencies in the group.

  • group_replication_allow_local_lower_version_join

    Property Value
    Command-Line Format --group-replication-allow-local-lower-version-join=value
    Introduced 5.7.17
    System Variable group_replication_allow_local_lower_version_join
    Scope Global
    Dynamic Yes
    Type boolean
    Default OFF

    Allow the current server to join the group even if it has a lower plugin version than the group.

  • group_replication_auto_increment_increment

    Property Value
    Command-Line Format --group-replication-auto-increment-increment=value
    Introduced 5.7.17
    System Variable group_replication_auto_increment_increment
    Scope Global
    Dynamic Yes
    Type integer
    Default 7
    Minimum 1
    Maximum 65535

    Determines the interval between successive column values for transactions that execute on this server instance.

  • group_replication_bootstrap_group

    Property Value
    Command-Line Format --group-replication-bootstrap-group=value
    Introduced 5.7.17
    System Variable group_replication_bootstrap_group
    Scope Global
    Dynamic Yes
    Type boolean
    Default OFF

    Configure this server to bootstrap the group. This option must only be set on one server and only when starting the group for the first time or restarting the entire group. After the group has been bootstrapped, set this option to OFF. It should be set to OFF both dynamically and in the configuration files. Starting two servers or restarting one server with this option set while the group is running may lead to an artificial split brain situation, where two independent groups with the same name are bootstrapped.

  • group_replication_components_stop_timeout

    Property Value
    Command-Line Format --group-replication-components-stop-timeout=value
    Introduced 5.7.17
    System Variable group_replication_components_stop_timeout
    Scope Global
    Dynamic Yes
    Type integer
    Default 31536000

    Timeout, in seconds, that Group Replication waits for each of the components when shutting down.

  • group_replication_compression_threshold

    Property Value
    Command-Line Format --group-replication-compression-threshold=value
    Introduced 5.7.17
    System Variable group_replication_compression_threshold
    Scope Global
    Dynamic Yes
    Type integer
    Default 1000000
    Minimum 0
    Maximum 4294967296

    The value in bytes above which (LZ4) compression is enforced. When set to zero, deactivates compression.

  • group_replication_enforce_update_everywhere_checks

    Property Value
    Command-Line Format --group-replication-enforce-update-everywhere-checks=value
    Introduced 5.7.17
    System Variable group_replication_enforce_update_everywhere_checks
    Scope Global
    Dynamic Yes
    Type boolean
    Default OFF

    Enable or disable strict consistency checks for multi-primary update everywhere.

  • group_replication_flow_control_applier_threshold

    Property Value
    Command-Line Format --group-replication-flow-control-applier-threshold=value
    Introduced 5.7.17
    System Variable group_replication_flow_control_applier_threshold
    Scope Global
    Dynamic Yes
    Type integer
    Default 25000
    Minimum 0
    Maximum 2147483648

    Specifies the number of waiting transactions in the applier queue that trigger flow control. This variable can be changed without resetting Group Replication.

  • group_replication_flow_control_certifier_threshold

    Property Value
    Command-Line Format --group-replication-flow-control-certifier-threshold=value
    Introduced 5.7.17
    System Variable group_replication_flow_control_certifier_threshold
    Scope Global
    Dynamic Yes
    Type integer
    Default 25000
    Minimum 0
    Maximum 2147483648

    Specifies the number of waiting transactions in the certifier queue that trigger flow control. This variable can be changed without resetting Group Replication.

  • group_replication_flow_control_mode

    Property Value
    Command-Line Format --group-replication-flow-control-mode=value
    Introduced 5.7.17
    System Variable group_replication_flow_control_mode
    Scope Global
    Dynamic Yes
    Type enumeration
    Default QUOTA
    Valid Values

    DISABLED

    QUOTA

    Specifies the mode used for flow control. This variable can be changed without resetting Group Replication.

  • group_replication_force_members

    Property Value
    Command-Line Format --group-replication-force-members=value
    Introduced 5.7.17
    System Variable group_replication_force_members
    Scope Global
    Dynamic Yes
    Type string

    A list of peer addresses as a comma separated list such as host1:port1,host2:port2. This option is used to force a new group membership, in which the excluded members do not receive a new view and are blocked. You need to manually kill the excluded servers. Any invalid host names in the list could cause subsequent START GROUP_REPLICATION statements to fail because they could block group membership.

  • group_replication_group_name

    Property Value
    Command-Line Format --group-replication-group-name=value
    Introduced 5.7.17
    System Variable group_replication_group_name
    Scope Global
    Dynamic Yes
    Type string

    The name of the group which this server instance belongs to. Must be a valid UUID.

  • group_replication_group_seeds

    Property Value
    Command-Line Format --group-replication-group-seeds=value
    Introduced 5.7.17
    System Variable group_replication_group_seeds
    Scope Global
    Dynamic Yes
    Type string

    A list of group members that provide a member which joins the group with the data required for the joining member to gain synchrony with the group. The list consists of the seed member's network addresses specified as a comma separated list, such as host1:port1,host2:port2. Ususally this list consists of all members of the group, but you can choose a subset of the group members to be seeds. The list must contain at least one valid member address. Each address is validated when starting Group Replication. If the list does not contain any valid host names, issuing START GROUP_REPLICATION fails.

  • group_replication_ip_whitelist

    Property Value
    Command-Line Format --group-replication-ip-whitelist=value
    Introduced 5.7.17
    System Variable group_replication_ip_whitelist
    Scope Global
    Dynamic Yes
    Type string
    Default AUTOMATIC

    Specifies which hosts are permitted to connect to the group. By default, this system variable is set to AUTOMATIC, which permits connections from private subnetworks active on the host. Active interfaces on the host are scanned and those with addresses on private subnetworks are automatically added to the permitted list.

    Alternatively, you can specify a whitelist of permitted hosts as a comma separated list of IPv4 addresses, subnet CIDR notation, or (from MySQL 5.7.21) host names, in any combination. For example:

    192.0.2.22,198.51.100.0/24,example.org,www.example.com/24

    Address 127.0.0.1 is always permitted to connect, even when not specified explicitly. IPv6 addresses, and host names that resolve to IPv6 addresses, are not supported.

    For host names, name resolution takes place only when a connection request is made by another server. A host name that cannot be resolved is not considered for whitelist validation, and a warning message is written to the error log. Forward-confirmed reverse DNS (FCrDNS) verification is carried out for resolved host names.

    Warning

    Host names are inherently less secure than IP addresses in a whitelist. FCrDNS verification provides a good level of protection, but can be compromised by certain types of attack. Specify host names in your whitelist only when strictly necessary, and ensure that all components used for name resolution, such as DNS servers, are maintained under your control. You can also implement name resolution locally using the hosts file, to avoid the use of external components.

  • group_replication_local_address

    Property Value
    Command-Line Format --group-replication-local-address=value
    Introduced 5.7.17
    System Variable group_replication_local_address
    Scope Global
    Dynamic Yes
    Type string

    The network address which the member provides for connections from other members, specified as a host:port formatted string. This address must be reachable by all members of the group because it is used by XCOM, the internal group communication system.

    Warning

    Do not use this address for communication with the member.

    Other Group Replication members contact this member through this host:port for all internal group communication. This is not the MySQL server SQL protocol host and port.

  • group_replication_member_weight

    Property Value
    Command-Line Format --group-replication-member-weight=value
    Introduced 5.7.20
    System Variable group_replication_member_weight
    Scope Global
    Dynamic Yes
    Type integer
    Default 50
    Minimum 0
    Maximum 100

    A percentage weight that can be assigned to members to influence the chance of the member being elected as primary in the event of failover, for example when the existing primary leaves a single-primary group. Assign numeric weights to members to ensure that specific members are elected, for example during scheduled maintenance of the primary or to ensure certain hardware is prioritized in the event of failover.

    For a group with members configured as follows:

    • member-1: group_replication_member_weight=30, server_uuid=aaaa

    • member-2: group_replication_member_weight=40, server_uuid=bbbb

    • member-3: group_replication_member_weight=40, server_uuid=cccc

    • member-4: group_replication_member_weight=40, server_uuid=dddd

    during election of a new primary the members above would be sorted as member-2, member-3, member-4, and member-1. This results in member-2 being chosen as the new primary in the event of failover. For more information, see Section 17.4.1.1, “Single-Primary Mode”.

  • group_replication_poll_spin_loops

    Property Value
    Command-Line Format --group-replication-poll-spin-loops=value
    Introduced 5.7.17
    System Variable group_replication_poll_spin_loops
    Scope Global
    Dynamic Yes
    Type (64-bit platforms) integer
    Type (32-bit platforms) integer
    Default (64-bit platforms) 0
    Default (32-bit platforms) 0
    Minimum (64-bit platforms) 0
    Minimum (32-bit platforms) 0
    Maximum (64-bit platforms) 18446744073709547520
    Maximum (32-bit platforms) 4294967295

    The number of times the group communication thread waits for the communication engine mutex to be released before the thread waits for more incoming network messages.

  • group_replication_recovery_retry_count

    Property Value
    Command-Line Format --group-replication-recovery-retry-count=value
    Introduced 5.7.17
    System Variable group_replication_recovery_retry_count
    Scope Global
    Dynamic Yes
    Type integer
    Default 10
    Minimum 0
    Maximum 31536000

    The number of times that the member that is joining tries to connect to the available donors before giving up.

  • group_replication_recovery_reconnect_interval

    Property Value
    Command-Line Format --group-replication-recovery-reconnect-interval=value
    Introduced 5.7.17
    System Variable group_replication_recovery_reconnect_interval
    Scope Global
    Dynamic Yes
    Type integer
    Default 60
    Minimum 0
    Maximum 0

    The sleep time, in seconds, between reconnection attempts when no donor was found in the group.

  • group_replication_recovery_use_ssl

    Property Value
    Command-Line Format --group-replication-recovery-use-ssl=value
    Introduced 5.7.17
    System Variable group_replication_recovery_use_ssl
    Scope Global
    Dynamic Yes
    Type boolean
    Default OFF

    Whether Group Replication recovery connection should use SSL or not.

  • group_replication_recovery_ssl_ca

    Property Value
    Command-Line Format --group-replication-recovery-ssl-ca=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_ca
    Scope Global
    Dynamic Yes
    Type string

    The path to a file that contains a list of trusted SSL certificate authorities.

  • group_replication_recovery_ssl_capath

    Property Value
    Command-Line Format --group-replication-recovery-ssl-capath=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_capath
    Scope Global
    Dynamic Yes
    Type string

    The path to a directory that contains trusted SSL certificate authority certificates.

  • group_replication_recovery_ssl_cert

    Property Value
    Command-Line Format --group-replication-recovery-ssl-cert=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_cert
    Scope Global
    Dynamic Yes
    Type string

    The name of the SSL certificate file to use for establishing a secure connection.

  • group_replication_recovery_ssl_key

    Property Value
    Command-Line Format --group-replication-recovery-ssl-key=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_key
    Scope Global
    Dynamic Yes
    Type string

    The name of the SSL key file to use for establishing a secure connection.

  • group_replication_recovery_ssl_cipher

    Property Value
    Command-Line Format --group-replication-recovery-ssl-cipher=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_cipher
    Scope Global
    Dynamic Yes
    Type string

    The list of permitted ciphers for SSL encryption.

  • group_replication_recovery_ssl_crl

    Property Value
    Command-Line Format --group-replication-recovery-ssl-crl=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_crl
    Scope Global
    Dynamic Yes
    Type string

    The path to a directory that contains files containing certificate revocation lists.

  • group_replication_recovery_ssl_crlpath

    Property Value
    Command-Line Format --group-replication-recovery-ssl-crlpath=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_crlpath
    Scope Global
    Dynamic Yes
    Type string

    The path to a directory that contains files containing certificate revocation lists.

  • group_replication_recovery_ssl_verify_server_cert

    Property Value
    Command-Line Format --group-replication-recovery-ssl-verify-server-cert=value
    Introduced 5.7.17
    System Variable group_replication_recovery_ssl_verify_server_cert
    Scope Global
    Dynamic Yes
    Type boolean
    Default OFF

    Make the recovery process check the server's Common Name value in the donor sent certificate.

  • group_replication_recovery_complete_at

    Property Value
    Command-Line Format --group-replication-recovery-complete-at=value
    Introduced 5.7.17
    System Variable group_replication_recovery_complete_at
    Scope Global
    Dynamic Yes
    Type enumeration
    Default TRANSACTIONS_APPLIED
    Valid Values

    TRANSACTIONS_CERTIFIED

    TRANSACTIONS_APPLIED

    Recovery policies when handling cached transactions after state transfer. This option specifies whether a member is marked online after it has received all transactions that it missed before it joined the group (TRANSACTIONS_CERTIFIED) or after it has received and applied them (TRANSACTIONS_APPLIED).

  • group_replication_gtid_assignment_block_size

    Property Value
    Command-Line Format --group-replication-gtid-assignment-block-size=value
    Introduced 5.7.17
    System Variable group_replication_gtid_assignment_block_size
    Scope Global
    Dynamic Yes
    Type (64-bit platforms) integer
    Type (32-bit platforms) integer
    Default (64-bit platforms) 1000000
    Default (32-bit platforms) 1000000
    Minimum (64-bit platforms) 1
    Minimum (32-bit platforms) 1
    Maximum (64-bit platforms) 18446744073709547520
    Maximum (32-bit platforms) 4294967295

    The number of consecutive GTIDs that are reserved for each member. Each member consumes its blocks and reserves more when needed.

  • group_replication_single_primary_mode

    Property Value
    Command-Line Format --group-replication-single-primary-mode=value
    Introduced 5.7.17
    System Variable group_replication_single_primary_mode
    Scope Global
    Dynamic Yes
    Type boolean
    Default ON

    Instructs the group to automatically pick a single server to be the one that handles read/write workload. This server is the PRIMARY and all others are SECONDARIES.

  • group_replication_ssl_mode

    Property Value
    Command-Line Format --group-replication-ssl-mode=value
    Introduced 5.7.17
    System Variable group_replication_ssl_mode
    Scope Global
    Dynamic Yes
    Type enumeration
    Default DISABLED
    Valid Values

    DISABLED

    REQUIRED

    VERIFY_CA

    VERIFY_IDENTITY

    Specifies the security state of the connection between Group Replication members.

  • group_replication_start_on_boot

    Property Value
    Command-Line Format --group-replication-start-on-boot=value
    Introduced 5.7.17
    System Variable group_replication_start_on_boot
    Scope Global
    Dynamic Yes
    Type boolean
    Default ON

    Whether the server should start Group Replication or not during server start.

  • group_replication_transaction_size_limit

    Property Value
    Command-Line Format --group-replication-transaction-size-limit=value
    Introduced 5.7.19
    System Variable group_replication_transaction_size_limit
    Scope Global
    Dynamic Yes
    Type (>= 5.7.19) integer
    Default (>= 5.7.19) 0
    Minimum (>= 5.7.19) 0
    Maximum (>= 5.7.19) 2147483647

    Configures the maximum transaction size in bytes which the group accepts. Transactions larger than this size are rolled back. Use this option to avoid large transactions causing the group to fail. A large transaction can cause problems for a group, either in terms of memory allocation or network bandwidth consumption, which may cause the failure detector to trigger because a given member is unreachable while it is busy processing the large transaction. When set to 0 there is no limit to the size of transactions the group accepts, and there may be the risk of large transactions causing the group to fail. Adjust the value of this variable depending on the size of workload you require from the group.

  • group_replication_unreachable_majority_timeout

    Property Value
    Command-Line Format --group-replication-unreachable-majority-timeout=value
    Introduced 5.7.19
    System Variable group_replication_unreachable_majority_timeout
    Scope Global
    Dynamic Yes
    Type integer
    Default 0
    Minimum 0
    Maximum 31536000

    Configures how long members that suffer a network partition and cannot connect to the majority wait before leaving the group. By default set to 0, which means that members that find themselves in a minority due to a network partition wait forever to connect the group. In a group of 5 servers (S1,S2,S3,S4,S5), if there is a disconnection between (S1,S2) and (S3,S4,S5) there is a network partition. The first group (S1,S2) is now in a minority because it cannot contact more than half of the group. While the majority group (S3,S4,S5) remains running, the minority group waits forever for a network reconnection. Any transactions processed by the minority group are blocked until Group Replication is stopped using STOP GROUP REPLICATION on the members of the minority.

    If configured to a number of seconds, members wait for this amount of time after losing contact with the majority of members before leaving the group. All pending transactions processed by the minority are rolled back and the servers in the minority partition move to the ERROR state and set themselves to super_read_only=ON mode.

    Warning

    When you have a symmetric group, with just two members for example (S0,S2), if there is a network partition and there is no majority, after the configured timeout all members shut down and enter ERROR state.

17.7 Requirements and Limitations

This section lists and explains the requirements and limitations of Group Replication.

17.7.1 Group Replication Requirements

Server instances that you want to use for Group Replication must satisfy the following requirements.

Infrastructure

  • InnoDB Storage Engine.  Data must be stored in the InnoDB transactional storage engine. Transactions are executed optimistically and then, at commit time, are checked for conflicts. If there are conflicts, in order to maintain consistency across the group, some transactions are rolled back. This means that a transactional storage engine is required. Moreover, InnoDB provides some additional functionality that enables better management and handling of conflicts when operating together with Group Replication.

  • Primary Keys.  Every table that is to be replicated by the group must have a defined primary key, or primary key equivalent where the equivalent is a non-null unique key. Such keys are required as a unique identifier for every row within a table, enabling the system to determine which transactions conflict by identifying exactly which rows each transaction has modified.

  • IPv4 Network.  The group communication engine used by MySQL Group Replication only supports IPv4. Therefore, Group Replication requires an IPv4 network infrastructure.

  • Network Performance.  Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth.

Server Instance Configuration

The following options must be configured on server instances that are members of a group.

  • Binary Log Active.  Set --log-bin[=log_file_name]. MySQL Group Replication replicates binary log contents, therefore the binary log needs to be on for it to operate. This option is enabled by default. See Section 5.4.4, “The Binary Log”.

  • Slave Updates Logged.  Set --log-slave-updates. Servers need to log binary logs that are applied through the replication applier. Servers in the group need to log all transactions that they receive and apply from the group. This is required because recovery is conducted by relying on binary logs form participants in the group. Therefore, copies of each transaction need to exist on every server, even for those transactions that were not initiated on the server itself.

  • Binary Log Row Format.  Set --binlog-format=row. Group Replication relies on row-based replication format to propagate changes consistently among the servers in the group. It relies on row-based infrastructure to be able to extract the necessary information to detect conflicts among transactions that execute concurrently in different servers in the group. See Section 16.2.1, “Replication Formats”.

  • Global Transaction Identifiers On.  Set --gtid-mode=ON. Group Replication uses global transaction identifiers to track exactly which transactions have been committed on every server instance and thus be able to infer which servers have executed transactions that could conflict with already committed transactions elsewhere. In other words, explicit transaction identifiers are a fundamental part of the framework to be able to determine which transactions may conflict. See Section 16.1.3, “Replication with Global Transaction Identifiers”.

  • Replication Information Repositories.  Set --master-info-repository=TABLE and --relay-log-info-repository=TABLE. The replication applier needs to have the master information and relay log metadata written to the mysql.slave_master_info and mysql.slave_relay_log_info system tables. This ensures the Group Replication plugin has consistent recoverability and transactional management of the replication metadata. See Section 16.2.4.2, “Slave Status Logs”.

  • Transaction Write Set Extraction.  Set --transaction-write-set-extraction=XXHASH64 so that while collecting rows to log them to the binary log, the server collects the write set as well. The write set is based on the primary keys of each row and is a simplified and compact view of a tag that uniquely identifies the row that was changed. This tag is then used for detecting conflicts.

  • Multi-threaded Appliers.  Group Replication members can be configured as multi-threaded appliers, enabling transactions to be applied in parallel. Set --slave-parallel-workers=N (where N is the number of parallel applier threads), --slave-preserve-commit-order=1, and --slave-parallel-type=LOGICAL_CLOCK. Setting --slave-parallel-workers=N enables the multi-threaded applier on the member. Group Replication relies on consistency mechanisms built around the guarantee that all participating members receive and apply committed transaction in the same order, so you must also set --slave-preserve-commit-order=1 to ensure that the final commit of parallel transactions is in the same order as the original transactions. Finally, in order to determine which transactions can be executed in parallel, the relay log must contain transaction parent information generated with --slave-parallel-type=LOGICAL_CLOCK. Attempting to add a member with --slave-parallel-workers set to greater than 0 without also setting the other two options, generates an error and the instance is prevented from joining.

17.7.2 Group Replication Limitations

The following known limitations exist for Group Replication.

Tip

Group Replication is built on GTID based replication, therefore you should also be aware of Section 16.1.3.4, “Restrictions on Replication with GTIDs”.

  • Replication Event Checksums.  Due to a design limitation of replication event checksums, Group Replication cannot currently make use of them. Therefore set --binlog-checksum=NONE.

  • Gap Locks.  The certification process does not take into account gap locks, as information about gap locks is not available outside of InnoDB. See Gap Locks for more information.

    Note

    Unless you rely on REPEATABLE READ semantics in your applications, we recommend using the READ COMMITTED isolation level with Group Replication. InnoDB does not use gap locks in READ COMMITTED, which aligns the local conflict detection within InnoDB with the distributed conflict detection performed by Group Replication.

  • Table Locks and Named Locks.  The certification process does not take into account table locks (see Section 13.3.5, “LOCK TABLES and UNLOCK TABLES Syntax”) or named locks (see GET_LOCK()).

  • SERIALIZABLE Isolation Level.  SERIALIZABLE isolation level is not supported in multi-primary groups by default. Setting a transaction isolation level to SERIALIZABLE configures Group Replication to refuse to commit the transaction.

  • Concurrent DDL versus DML Operations.  Concurrent data definition statements and data manipulation statements executing against the same object but on different servers is not supported when using multi-primary mode. During execution of Data Definition Language (DDL) statements on an object, executing concurrent Data Manipulation Language (DML) on the same object but on a different server instance has the risk of conflicting DDL executing on different instances not being detected.

  • Foreign Keys with Cascading Constraints.  Multi-primary mode groups (members all configured with group_replication_single_primary_mode=OFF) do not support tables with multi-level foreign key dependencies, specifically tables that have defined CASCADING foreign key constraints. This is because foreign key constraints that result in cascading operations executed by a multi-primary mode group can result in undetected conflicts and lead to inconsistent data across the members of the group. Therefore we recommend setting group_replication_enforce_update_everywhere_checks=ON on server instances used in multi-primary mode groups to avoid undetected conflicts.

    In single-primary mode this is not a problem as it does not allow concurrent writes to multiple members of the group and thus there is no risk of undetected conflicts.

  • Very Large Transactions.  Individual transactions that result in GTID contents which are large enough that it cannot be copied between group members over the network within a 5 second window can cause failures in the group communication. To avoid this issue try and limit the size of your transactions as much as possible. For example, split up files used with LOAD DATA INFILE into smaller chunks.

  • MySQL Enterprise Audit and MySQL Enterprise Firewall.  Prior to version 5.7.21 MySQL Enterprise Audit and MySQL Enterprise Firewall use MyISAM tables in the mysql system database. Group Replication does not support MyISAM tables.

17.8 Frequently Asked Questions

This section provides answers to frequently asked questions.

What is the maximum number of MySQL servers in a group?

A group can consist of maximum 9 servers. Attempting to add another server to a group with 9 members causes the request to join to be refused.

How are servers in a group connected?

Servers in a group connect to the other servers in the group by opening a peer-to-peer TCP connection. These connections are only used for internal communication and message passing between servers in the group. This address is configured by the group_replication_local_address variable.

What is the group_replication_bootstrap_group option used for?

The bootstrap flag instructs a member to create a group and act as the initial seed server. The second member joining the group needs to ask the member that bootstrapped the group to dynamically change the configuration in order for it to be added to the group.

A member needs to bootstrap the group in two scenarios. When the group is originally created, or when shutting down and restarting the entire group.

How do I set credentials for the recovery procedure?

You pre-configure the Group Replication recovery channel credentials using the CHANGE MASTER TO statement.

Can I scale-out my write-load using Group Replication?

Not directly, but MySQL Group replication is a shared nothing full replication solution, where all servers in the group replicate the same amount of data. Therefore if one member in the group writes N bytes to storage as the result of a transaction commit operation, then roughly N bytes are written to storage on other members as well, because the transaction is replicated everywhere.

However, given that other members do not have to do the same amount of processing that the original member had to do when it originally executed the transaction, they apply the changes faster. Transactions are replicated in a format that is used to apply row transformations only, without having to re-execute transactions again (row-based format).

Furthermore, given that changes are propagated and applied in row-based format, this means that they are received in an optimized and compact format, and likely reducing the number of IO operations required when compared to the originating member.

To summarize, you can scale-out processing, by spreading conflict free transactions throughout different members in the group. And you can likely scale-out a small fraction of your IO operations, since remote servers receive only the necessary changes to read-modify-write changes to stable storage.

Does Group Replication require more network bandwidth and CPU, when compared to simple replication and under the same workload?

Some additional load is expected because servers need to be constantly interacting with each other for synchronization purposes. It is difficult to quantify how much more data. It also depends on the size of the group (three servers puts less stress on the bandwidth requirements than nine servers in the group).

Also the memory and CPU footprint are larger, because more complex work is done for the server synchronization part and for the group messaging.

Can I deploy Group Replication across wide-area networks?

Yes, but the network connection between each member must be reliable and have suitable perfomance. Low latency, high bandwidth network connections are a requirement for optimal performance.

If network bandwidth alone is an issue, then Section 17.9.7.2, “Message Compression” can be used to lower the bandwidth required. However, if the network drops packets, leading to re-transmissions and higher end-to-end latency, throughput and latency are both negatively affected.

Warning

When the network round-trip time (RTT) between any group members is 2 seconds or more you could encounter problems as the built-in failure detection mechanism could be incorrectly triggered.

Do members automatically rejoin a group in case of temporary connectivity problems?

This depends on the reason for the connectivity problem. If the connectivity problem is transient and the reconnection is quick enough that the failure detector is not aware of it, then the server may not be removed from the group. If it is a "long" connectivity problem, then the failure detector eventually suspects a problem and the server is removed from the group.

Once a server is removed from the group, you need to join it back again. In other words, after a server is removed explicitly from the group you need to rejoin it manually (or have a script doing it automatically).

When is a member excluded from a group?

If the member becomes silent, the other members remove it from the group configuration. In practice this may happen when the member has crashed or there is a network disconnection.

The failure is detected after a given timeout elapses for a given member and a new configuration without the silent member in it is created.

What happens when one node is significantly lagging behind?

There is no method for defining policies for when to expel members automatically from the group. You need to find out why a member is lagging behind and fix that or remove the member from the group. Otherwise, if the server is so slow that it triggers the flow control, then the entire group slows down as well. The flow control can be configured according to the your needs.

Upon suspicion of a problem in the group, is there a special member responsible for triggering a reconfiguration?

No, there is no special member in the group in charge of triggering a reconfiguration.

Any member can suspect that there is a problem. All members need to (automatically) agree that a given member has failed. One member is in charge of expelling it from the group, by triggering a reconfiguration. Which member is responsible for expelling the member is not something you can control or set.

Can I use Group Replication for sharding?

Group Replication is designed to provide highly available replica sets; data and writes are duplicated on each member in the group. For scaling beyond what a single system can provide, you need an orchestration and sharding framework built around a number of Group Replication sets, where each replica set maintains and manages a given shard or partition of your total dataset. This type of setup, often called a sharded cluster, allows you to scale reads and writes linearly and without limit.

How do I use Group Replication with SELinux?

If SELinux is enabled, which you can verify using sestatus -v, then you need to enable the use of the Group Replication communication port, configured by group_replication_local_address, for mysqld so that it can bind to it and listen there. To see which ports MySQL is currently allowed to use, issue semanage port -l | grep mysqld. Assuming the port configured is 6606, add the necessary port to those permitted by SELinux by issuing semanage port -a -t mysqld_port_t -p tcp 6606.

How do I use Group Replication with iptables?

If iptables is enabled, then you need to open up the Group Replication port for communication between the machines. To see the current rules in place on each machine, issue iptables -L. Assuming the port configured is 6606, enable communication over the necessary port by issuing iptables -A INPUT -p tcp --dport 6606 -j ACCEPT.

How do I recover the relay log for a replication channel used by a group member?

The replication channels used by Group Replication behave in the same way as replication channels used in master to slave replication, and as such rely on the relay log. In the event of a change of the relay_log variable, or when the option is not set and the host name changes, there is a chance of errors. See Section 16.2.4.1, “The Slave Relay Log” for a recovery procedure in this situation. Alternatively, another way of fixing the issue specifically in Group Replication is to issue a STOP GROUP_REPLICATION statement and then a START GROUP_REPLICATION statement to restart the instance. The Group Replication plugin creates the group_replication_applier channel again.

Why does Group Replication use two bind addresses?

Group Replication uses two bind addresses in order to split network traffic between the SQL address, used by clients to communicate with the member, and the group_replication_local_address, used internally by the group members to communicate. For example, assume a server with two network interfaces assigned to the network addresses 203.0.113.1 and 198.51.100.179. In such a situation you could use 203.0.113.1:33061 for the internal group network address by setting group_replication_local_address=203.0.113.1:33061. Then you could use 198.51.100.179 for hostname and 3306 for the port. Client SQL applications would then connect to the member at 198.51.100.179:3306. This enables you to configure different rules on the different networks. Similarly, the internal group communication can be separated from the network connection used for client applications, for increased security.

How does Group Replication use network addresses and hostnames?

Group Replication uses network connections between members and therefore its functionality is directly impacted by how you configure hostnames and ports. For example, the Group Replication recovery procedure is based on asynchronous replication which uses the server's hostname and port. When a member joins a group it receives the group membership information, using the network address information that is listed at performance_schema.replication_group_members. One of the members listed in that table is selected as the donor of the missing data from the group to the new member.

This means that any value you configure using a hostname, such as the SQL network address or the group seeds address, must be a fully qualified name and resolvable by each member of the group. You can ensure this for example through DNS, or correctly configured /etc/hosts files, or other local processes. If a you want to configure the MEMBER_HOST value on a server, specify it using the --report-host option on the server before joining it to the group.

Important

The assigned value is used directly and is not affected by the --skip-name-resolve option.

To configure MEMBER_PORT on a server, specify it using the --report-port option.

17.9 Group Replication Technical Details

This section provides more technical details about MySQL Group Replication.

17.9.1 Group Replication Plugin Architecture

MySQL Group Replication is a MySQL plugin and it builds on the existing MySQL replication infrastructure, taking advantage of features such as the binary log, row-based logging, and global transaction identifiers. It integrates with current MySQL frameworks, such as the performance schema or plugin and service infrastructures. The following figure presents a block diagram depicting the overall architecture of MySQL Group Replication.

Figure 17.9 Group Replication Plugin Block Diagram

The text following the figure describes the content of the diagram.

The MySQL Group Replication plugin includes a set of APIs for capture, apply, and lifecycle, which control how the plugin interacts with MySQL Server (shown at the top of the figure). There are interfaces to make information flow from the server to the plugin and vice versa. These interfaces isolate the MySQL Server core from the Group Replication plugin, and are mostly hooks placed in the transaction execution pipeline. In one direction, from server to the plugin, there are notifications for events such as the server starting, the server recovering, the server being ready to accept connections, and the server being about to commit a transaction. In the other direction, the plugin instructs the server to perform actions such as committing or aborting ongoing transactions, or queuing transactions in the relay log.

The next layer of the Group Replication plugin architecture is a set of components that react when a notification is routed to them. The capture component is responsible for keeping track of context related to transactions that are executing. The applier component is responsible for executing remote transactions on the database. The recovery component manages distributed recovery, and is responsible for getting a server that is joining the group up to date by selecting the donor, orchestrating the catch up procedure and reacting to donor failures.

Continuing down the stack, the replication protocol module contains the specific logic of the replication protocol. It handles conflict detection, and receives and propagates transactions to the group.

The final two layers of the Group Replication plugin architecture are the Group Communication System (GCS) API, and an implementation of a Paxos-based group communication engine. The GCS API is a high level API that abstracts the properties required to build a replicated state machine (see Section 17.1, “Group Replication Background”). It therefore decouples the implementation of the messaging layer from the remaining upper layers of the plugin. The group communication engine handles communications with the members of the replication group (shown at the bottom of the figure).

17.9.2 The Group

In MySQL Group Replication, a set of servers forms a replication group. A group has a name, which takes the form of a UUID. The group is dynamic and servers can leave (either voluntarily or involuntarily) and join it at any time. The group adjusts itself whenever servers join or leave.

If a server joins the group, it automatically brings itself up to date by fetching the missing state from an existing server. This state is transferred by means of Asynchronous MySQL replication. If a server leaves the group, for instance it was taken down for maintenance, the remaining servers notice that it has left and reconfigure the group automatically. The group membership service described at Section 17.1.3.2, “Group Membership” powers all of this.

17.9.3 Data Manipulation Statements

As there are no primary servers (masters) for any particular data set, every server in the group is allowed to execute transactions at any time, even transactions that change state (RW transactions).

Any server may execute a transaction without any a priori coordination. But, at commit time, it coordinates with the rest of the servers in the group to reach a decision on the fate of that transaction. This coordination serves two purposes: (i) check whether the transaction should commit or not; (ii) and propagate the changes so that other servers can apply the transaction as well.

As a transaction is sent through an atomic broadcast, either all servers in the group receive the transaction or none do. If they receive it, then they all receive it in the same order with respect to other transactions that were sent before. Conflict detection is carried out by inspecting and comparing write sets of transactions. Thus, they are detected at the row level. Conflict resolution follows the first committer wins rule. If t1 and t2 execute concurrently at different sites, because t2 is ordered before t1, and both changed the same row, then t2 wins the conflict and t1 aborts. In other words, t1 was trying to change data that had been rendered stale by t2.

Note

If two transactions are bound to conflict more often than not, then it is a good practice to start them on the same server. They then have a chance to synchronize on the local lock manager instead of aborting later in the replication protocol.

17.9.4 Data Definition Statements

In a Group Replication topology, care needs to be taken when executing data definition statements also commonly known as data definition language (DDL). Given that MySQL does not support atomic or transactional DDL, one cannot optimistically execute DDL statements and later roll back if needs be. Consequently, the lack of atomicity does not fit directly into the optimistic replication paradigm that Group Replication is based on.

Therefore, more care needs to be taken when replicating data definition statements. Schema changes and changes to the data that the object contains need to be handled through the same server while the schema operation has not yet completed and replicated everywhere. Failure to do so can result in data inconsistency.

Note

If the group is deployed in single-primary mode, then this is not a problem, because all changes are performed through the same server, the primary.

Warning

MySQL DDL execution is not atomic or transactional. The server executes and commits without securing group agreement first. As such, you must route DDL and DML for the same object through the same server, while the DDL is executing and has not replicated everywhere yet.

17.9.5 Distributed Recovery

This section describes the process through which a member joining a group catches up with the remaining servers in the group, called distributed recovery.

17.9.5.1 Distributed Recovery Basics

This section is a high level summary. The following sections provide additional detail, by describing the phases of the procedure in more detail.

Group Replication distributed recovery can be summarized as the process through which a server gets missing transactions from the group so that it can then join the group having processed the same set of transactions as the other group members. During distributed recovery, the server joining the group buffers any transactions and membership events that happen while the server joining the group is receiving the transactions required from the group. Once the server joining the group has received all of the group's transactions, it applies the transactions that were buffered during the recovery process. At the end of this process the server then joins the group as an online member.

Phase 1

In the first phase, the server joining the group selects one of the online servers on the group to be the donor of the state that it is missing. The donor is responsible for providing the server joining the group all the data it is missing up to the moment it has joined the group. This is achieved by relying on a standard asynchronous replication channel, established between the donor and the server joining the group, see Section 16.2.3, “Replication Channels”. Through this replication channel, the donor's binary logs are replicated until the point that the view change happened when the server joining the group became part of the group. The server joining the group applies the donor's binary logs as it receives them.

While the binary log is being replicated, the server joining the group also caches every transaction that is exchanged within the group. In other words it is listening for transactions that are happening after it joined the group and while it is applying the missing state from the donor. When the first phase ends and the replication channel to the donor is closed, the server joining the group then starts phase two: the catch up.

Phase 2

In this phase, the server joining the group proceeds to the execution of the cached transactions. When the number of transactions queued for execution finally reaches zero, the member is declared online.

Resilience

The recovery procedure withstands donor failures while the server joining the group is fetching binary logs from it. In such cases, whenever a donor fails during phase 1, the server joining the group fails over to a new donor and resumes from that one. When that happens the server joining the group closes the connection to the failed server joining the group explicitly and opens a connection to a new donor. This happens automatically.

17.9.5.2 Recovering From a Point-in-time

To synchronize the server joining the group with the donor up to a specific point in time, the server joining the group and donor make use of the MySQL Global Transaction Identifiers (GTIDs) mechanism. See Section 16.1.3, “Replication with Global Transaction Identifiers”. However, GTIDS only provide a means to realize which transactions the server joining the group is missing, they do not help marking a specific point in time to which the server joining the group must catch up, nor do they help conveying certification information. This is the job of binary log view markers, which mark view changes in the binary log stream, and also contain additional metadata information, provisioning the server joining the group with missing certification related data.

View and View Changes

To explain the concept of view change markers, it is important to understand what a view and a view change are.

A view corresponds to a group of members participating actively in the current configuration, in other words at a specific point in time. They are correct and online in the system.

A view change occurs when a modification to the group configuration happens, such as a member joining or leaving. Any group membership change results in an independent view change communicated to all members at the same logical point in time.

A view identifier uniquely identifies a view. It is generated whenever a view change happens

At the group communication layer, view changes with their associated view ids are then boundaries between the data exchanged before and after a member joins. This concept is implemented through a new binary log event: the"view change log event". The view id thus becomes a marker as well for transactions transmitted before and after changes happen in the group membership.

The view identifier itself is built from two parts: (i) one that is randomly generated and (ii) a monotonically increasing integer. The first part is generated when the group is created, and remains unchanged while there is at least one member in the group. The second part is incremented every time a view change happens.

The reason for this heterogeneous pair that makes up the view id is the need to unambiguously mark group changes whenever a member joins or leaves but also whenever all members leave the group and no information remains of what view the group was in. In fact, the sole use of monotonic increasing identifiers could lead to the reuse of the same id after full group shutdowns, destroying the uniqueness of the binary log data markers that recovery depends on. To summarize, the first part identifies whenever the group was started from the beginning and the incremental part when the group changed from that point on.

17.9.5.3 View Changes

This section explains the process which controls how the view change identifier is incorporated into a binary log event and written to the log, The following steps are taken:

Begin: Stable Group

All servers are online and processing incoming transactions from the group. Some servers may be a little behind in terms of transactions replicated, but eventually they converge. The group acts as one distributed and replicated database.

Figure 17.10 Stable Group

Servers S1, S2, and S3 are members of the group. The most recent item in all of their binary logs is transaction T20.

View Change: a Member Joins

Whenever a new member joins the group and therefore a view change is performed, every online server queues a view change log event for execution. This is queued because before the view change, several transactions can be queued on the server to be applied and as such, these belong to the old view. Queuing the view change event after them guarantees a correct marking of when this happened.

Meanwhile, the server joining the group selects the donor from the list of online servers as stated by the membership service through the view abstraction. A member joins on view 4 and the online members write a View change event to the binary log.

Figure 17.11 A Member Joins

Server S4 joins the group and looks for a donor. Servers S1, S2, and S3 each queue the view change entry VC4 for their binary logs. Meanwhile, server S1 is receiving new transaction T21.

State Transfer: Catching Up

Once the server joining the group has chosen which server in the group is to be the donor, a new asynchronous replication connection is established between the two and the state transfer begins (phase 1). This interaction with the donor continues until the server joining the group's applier thread processes the view change log event that corresponds to the view change triggered when the server joining the group came into the group. In other words, the server joining the group replicates from the donor, until it gets to the marker with the view identifier which matches the view marker it is already in.

Figure 17.12 State Transfer: Catching Up

Server S4 has chosen server S2 as the donor. State transfer is executed from server S2 to server S4 until the view change entry VC4 is reached (view_id = VC4). Server S4 uses a temporary applier buffer for state transfer, and its binary log is currently empty.

As view identifiers are transmitted to all members in the group at the same logical time, the server joining the group knows at which view identifier it should stop replicating. This avoids complex GTID set calculations because the view id clearly marks which data belongs to each group view.

While the server joining the group is replicating from the donor, it is also caching incoming transactions from the group. Eventually, it stops replicating from the donor and switches to applying those that are cached.

Figure 17.13 Queued Transactions

State transfer is complete. Server S4 has applied the transactions up to T20 and written them to its binary log. Server S4 has cached transaction T21, which arrived after the view change, in a temporary applier buffer while recovering.

Finish: Caught Up

When the server joining the group recognizes a view change log event with the expected view identifier, the connection to the donor is terminated and it starts applying the cached transactions. An important point to understand is the final recovery procedure. Although it acts as a marker in the binary log, delimiting view changes, the view change log event also plays another role. It conveys the certification information as perceived by all servers when the server joining the group entered the group, in other words the last view change. Without it, the server joining the group would not have the necessary information to be able to certify (detect conflicts) subsequent transactions.

The duration of the catch up (phase 2) is not deterministic, because it depends on the workload and the rate of incoming transactions to the group. This process is completely online and the server joining the group does not block any other server in the group while it is catching up. Therefore the number of transactions the server joining the group is behind when it moves to phase 2 can, for this reason, vary and thus increase or decrease according to the workload.

When the server joining the group reaches zero queued transactions and its stored data is equal to the other members, its public state changes to online.

Figure 17.14 Instance Online

Server S4 is now an online member of the group. It has applied cached transaction T21, so its binary log shows the same items as the binary logs of the other group members, and it no longer needs the temporary applier buffer. New incoming transaction T22 is now received and applied by all group members.

17.9.5.4 Usage Advice and Limitations of Distributed Recovery

Distributed recovery does have some limitations. It is based on classic asynchronous replication and as such it may be slow if the server joining the group is not provisioned at all or is provisioned with a very old backup image. This means that if the data to transfer is too big at phase 1, the server may take a very long time to recover. As such, the recommendation is that before adding a server to the group, one should provision it with a fairly recent snapshot of a server already in the group. This minimizes the length of phase 1 and reduces the impact on the donor server, since it has to save and transfer less binary logs.

Warning

It is recommended that a server is provisioned before it is added to a group. That way, one minimizes the time spent on the recovery step.

17.9.6 Observability

There is a lot of automation built into the Group Replication plugin. Nonetheless, you might sometimes need to understand what is happening behind the scenes. This is where the instrumentation of Group Replication and Performance Schema becomes important. The entire state of the system (including the view, conflict statistics and service states) can be queried through performance_schema tables. The distributed nature of the replication protocol and the fact that server instances agree and thus synchronize on transactions and metadata makes it simpler to inspect the state of the group. For example, you can connect to a single server in the group and obtain both local and global information by issuing select statements on the Group Replication related Performance Schema tables. For more information, see Section 17.3, “Monitoring Group Replication”.

17.9.7 Group Replication Performance

This section explains how to use the available configuration options to gain the best performance from your group.

17.9.7.1 Fine Tuning the Group Communication Thread

The group communication thread (GCT) runs in a loop while the Group Replication plugin is loaded. The GCT receives messages from the group and from the plugin, handles quorum and failure detection related tasks, sends out some keep alive messages and also handles the incoming and outgoing transactions from/to the server/group. The GCT waits for incoming messages in a queue. When there are no messages, the GCT waits. By configuring this wait to be a little longer (doing an active wait) before actually going to sleep can prove to be beneficial in some cases. This is because the alternative is for the operating system to switch out the GCT from the processor and do a context switch.

To force the GCT do an active wait, use the group_replication_poll_spin_loops option, which makes the GCT loop, doing nothing relevant for the configured number of loops, before actually polling the queue for the next message.

For example:

mysql> SET GLOBAL group_replication_poll_spin_loops= 10000;

17.9.7.2 Message Compression

When network bandwidth is a bottleneck, message compression can provide up to 30-40% throughput improvement at the group communication level. This is especially important within the context of large groups of servers under load.

Table 17.8 LZ4 Compression Ratios for Different Binary Log Formats.

Ratio

Workload

ROW

STMT

mysqlslapd

4,5

4,1

sysbench

3,4

2,9


The TCP peer-to-peer nature of the interconnections between N participants on the group makes the sender send the same amount of data N times. Furthermore, binary logs are likely to exhibit a high compression ratio (see table above). This makes compression a compelling feature for workloads that contain large transaction.

Figure 17.15 Compression Support

The MySQL Group Replication plugin architecture is shown as described in an earlier topic, with the five layers of the plugin positioned between the MySQL server and the replication group. Compression and decompression are handled by the Group Communication System API, which is the fourth layer of the Group Replication plugin. The group communication engine (the fifth layer of the plugin) and the group members use the compressed transactions with the smaller data size. The MySQL Server core and the three higher layers of the Group Replication plugin (the APIs, the capture, applier, and recovery components, and the replication protocol module) use the original transactions with the larger data size.

Compression happens at the group communication engine level, before the data is handed over to the group communication thread, so it happens within the context of the mysql user session thread. Transaction payloads may be compressed before being sent out to the group and decompressed when received. Compression is conditional and depends on a configured threshold. By default compression is enabled.

In addition, there is no requirement that all servers in the group have compression enabled to be able to work together. Upon receiving a message, the member checks the message envelope to verify whether it is compressed or not. If needed, then the member decompresses the transaction, before delivering it to the upper layer.

The compression algorithm used is LZ4. Compression is enabled by default with threshold of 1000000 bytes. The compression threshold, in bytes, may be set to something larger than default. In that case, only transactions that have a payload larger than the threshold are compressed. Below is an example of how to set a compression threshold.

STOP GROUP_REPLICATION;
SET GLOBAL group_replication_compression_threshold= 2097152;
START GROUP_REPLICATION;

This sets the compression threshold to 2MB. If a transaction generates a replication message with a payload larger than 2MB, for example a binary log transaction entry larger than 2MB, then it is compressed. To disable compression set threshold to 0.

17.9.7.3 Flow Control

Group Replication ensures that a transaction only commits after a majority of the members in a group have received it and agreed on the relative order between all transactions that were sent concurrently.

This approach works well if the total number of writes to the group does not exceed the write capacity of any member in the group. If it does and some of the members have less write throughput than others, particularly less than the writer members, those members can start lagging behind of the writers.

Having some members lagging behind the group brings some problematic consequences, particularly, the reads on such members may externalize very old data. Depending on why the member is lagging behind, other members in the group may have to save more or less replication context to be able to fulfil potential data transfer requests from the slow member.

There is however a mechanism in the replication protocol to avoid having too much distance, in terms of transactions applied, between fast and slow members. This is known as the flow control mechanism. It tries to address several goals:

  1. to keep the members close enough to make buffering and de-synchronization between members a small problem;

  2. to adapt quickly to changing conditions like different workloads or more writers in the group;

  3. to give each member a fair share of the available write capacity;

  4. to not reduce throughput more than strictly necessary to avoid wasting resources.

Given the design of Group Replication, the decision whether to throttle or not may be decided taking into account two work queues: (i) the certification queue; (ii) and on the binary log applier queue. Whenever the size of one of these queues exceeds the user-defined threshold, the throttling mechanism is triggered. Only configure: (i) whether to do flow control at the certifier or at the applier level, or both; and (ii) what is the threshold for each queue.

The flow control depends on two basic mechanisms:

  1. the monitoring of members to collect some statistics on throughput and queue sizes of all group members to make educated guesses on what is the maximum write pressure each member should be subjected to;

  2. the throttling of members that are trying to write beyond their fair-share of the available capacity at each moment in time.

17.9.7.3.1 Probes and Statistics

The monitoring mechanism works by having each member deploying a set of probes to collect information about its work queues and throughput. It then propagates that information to the group periodically to share that data with the other members.

Such probes are scattered throughout the plugin stack and allow one to establish metrics, such as:

  • the certifier queue size;

  • the replication applier queue size;

  • the total number of transactions certified;

  • the total number of remote transactions applied in the member;

  • the total number of local transactions.

Once a member receives a message with statistics from another member, it calculates additional metrics regarding how many transactions were certified, applied and locally executed in the last monitoring period.

Monitoring data is shared with others in the group periodically. The monitoring period must be high enough to allow the other members to decide on the current write requests, but low enough that it has minimal impact on group bandwidth. The information is shared every second, and this period is sufficient to address both concerns.

17.9.7.3.2 Group Replication Throttling

Based on the metrics gathered across all servers in the group, a throttling mechanism kicks in and decides whether to limit the rate a member is able to execute/commit new transactions.

Therefore, metrics acquired from all members are the basis for calculating the capacity of each member: if a member has a large queue (for certification or the applier thread), then the capacity to execute new transactions should be close to ones certified or applied in the last period.

The lowest capacity of all the members in the group determines the real capacity of the group, while the number of local transactions determines how many members are writing to it, and, consequently, how many members should that available capacity be shared with.

This means that every member has an established write quota based on the available capacity, in other words a number of transactions it can safely issue for the next period. The writer-quota will be enforced by the throttling mechanism if the queue size of the certifier or the binary log applier exceeds a user-defined threshold.

The quota is reduced by the number of transactions that were delayed in the last period, and then also further reduced by 10% to allow the queue that triggered the problem to reduce its size. In order to avoid large jumps in throughput once the queue size goes beyond the threshold, the throughput is only allowed to grow by the same 10% per period after that.

The current throttling mechanism does not penalize transactions below quota, but delays finishing those transactions that exceed it until the end of the monitoring period. As a consequence, if the quota is very small for the write requests issued some transactions may have latencies close to the monitoring period.