Saturday, November 8, 2014

Open source India Conference: MySQL High Availability with Replication New Features

The session was presented at open source India 2014 (http://osidays.com/osidays/) by Shivji (me) and Manish Kumar. It talks of the new features in MySQL-5.7 Replication. It covered work on
  1. Performance enhancements in MySQL Replication 
  2. Usability improvements 
  3. More flexibility to provide more options to our users so 
  4. They can chose what is best for their application. 
  5. Semi-synchronous and MySQL Group Replication 
At then end, there are a lot of links to the blogs written on these features by the MySQL Replication engineers.

Thursday, October 16, 2014

MySQL 5.7.5- More variables in replication performance_schema tables

At MySQL, replication usability is of utmost importance to us. Replication information has long been part of SHOW commands, SHOW SLAVE STATUS occupying a major chunk of it. The other sources of replication information being:
As the replication module grows further, there is a lot more monitoring information, so much that the present interfaces seem too rigid to accommodate all the information we would like to present. So we need to organize them in a more structured manner. In MySQL-5.7.2, we introduced replication performance_schema (P_S) tables providing an SQL interface to monitor replication configuration and status partitioned into different tables, each table grouping logically related information. You can read more about the contents of these tables from official MySQL documentation.
In MySQL-5.7.5 we added some more status variables to these performance_schema tables to enable monitoring the latest replication features viz. multi-source replication in labs. Multi-source allows a MySQL slave to replicate from multiple sources (masters) directly. Talking of multi-source, one needs the replication information per source. So we added global variables that would be useful to extend to per-source scope to the replication performance\_schema tables to help monitor multi-source replication. Note that these variables still work for the single sourced replication and can still be accessed as:
Show status like 'Slave_running';
Show status like 'Slave_retried_transactions';
Show status like 'Slave_last_heartbeat';
Show status like 'Slave_received_heartbeats';
show status like 'Slave_heartbeat_period';
Note though that the status variables are now mostly useful in single-source mode ONLY. If more sources are added, the status variables still just apply to the first source. For other replication sources (masters), the only way to access these variables is to use the replication performance_schema tables as named in the table below. Here is how the names of server variables map to the names in the replication performance_schema tables:
VARIABLE NAMEP_S TABLE NAMEP_S FIELD NAME
 SLAVE_HEARTBEAT_PERIOD replication_connection_configuration HEARTBEAT_INTERVAL
 SLAVE_RECEIVED_HEARTBEATS replication_connection_status COUNT_RECEIVED_HEARTBEATS
 SLAVE_LAST_HEARTBEAT replication_connection_status LAST_HEARTBEAT_TIMESTAMP
 SLAVE_RETRIED_TRANSACTIONS replication_execute_status COUNT_TRANSACTIONS_RETRIES
The variable 'slave_running' reports whether the slave is running or not. This can be found by inspecting the two replication components (receiver and applier) separately to see if the receiver module is running or not by executing
SELECT SERVICE_STATE FROM performance_schema.replication_connection_status;
And execute module is running or not by executing
SERVICE_STATE FROM performance_schema.replication_execute_status;
Please try out multi-source replication and our new monitoring interface in the form of replication performance_schema tables. As always, your feedback is very valuable to us.

Tuesday, April 8, 2014

MySQL Developers Conference: MySQL Replication and Scalability

The slide deck contains the latest developments in MySQL Replication. It covers:
- An introduction to MySQL Replication
- Scaling with Multi-threaded slaves
- Data aggregation with Multi-source replication
- Lossless failover with semi-synchronous replication
- Replication Monitoring made easier

Tuesday, April 1, 2014

MySQL-5.7.4- Change master without stopping slave altogether

At MySQL, we have been working on simplifying the failover process making it faster, more flexible and easier to use. MySQL-5.6 came up with Global Transaction Identifiers (GTID) which was a huge leap in the direction of easing the failover process hiding the details about replication logs and positions. With MySQL-5.7.4, we are introducing a new feature that further adds to flexibility and onliness- the user can only shut down components that he needs to re-configure.

What we allow with this new feature is to execute CHANGE MASTER TO command without stopping slave altogether. We realized that stopping slave altogether is not mandatory in all cases and doing that was more of a cautious approach to switching master restricting more than what’s required at times.

Lets dive deep into this to see what can be relaxed here. For this, lets break the replication process into two modules:
M1) Receiver module (concerned with IO thread) and
M2) Applier module (concerning SQL thread or coordinator and worker threads, whichever be the case)

We can now divide options under the command CHANGE MASTER TO into three groups based on the above classification: 

G1) Options that change a receiver configuration. 
G2) Options that change an applier configuration. 
G3) Options that relate to both (1) and (2). 

For the precise division look at the picture below. Note that the illustration takes into account all the CHANGER MASTER TO options present currently (MySQL-5.7.4). 




Note that given its current usage, we could put the MASTER_AUTO_POSITION  option under group G1(i.e., receive side). Currently only the receiver module uses GTID positioning but we foresee that in future it will be good to allow the applier module to use GTID positioning. We thus keep the master_auto_position option under group G3 to keep things future-proof. Worried that obstructs the failover process again like before? Well that's not a problem as MASTER_AUTO_POSITION is a slave's configuration that you only set once. Then it affects all future times that you redirect to a new immediate master. So you don't specify it on fail-over.

With the classifications stated above, we propose a 3-point rule stated as:
R1) For CHANGE MASTER TO options under group G1, stop only receiver module (M1) using the command STOP SLAVE IO_THREAD command.
R2) For CHANGE MASTER TO options under group G2, stop only applier module (M2) using the command STOP SLAVE SQL_THREAD command.
R3) For CHANGE MASTER TO options under group G3, stop both receiver (M1) and
applier modules (M2) using the command STOP SLAVE.

HOW DOES THIS RULE WORK?
Lets explore more about our 3-point rule(R1-R3):

  • Starting with rule R1, we stated that we only need to stop the receiver thread to change receive options. What happens to the applier module? Well the applier module keeps applying pending transactions, if any. If you have a situation where the slave was lagging behind with a lot of transactions queued into the slave's logs, you can allow the applier module to catch up while you switch masters or change a configuration on the receive side keeping the master same.
  • Under rule R2, we stated that we can change configuration of applier module after stopping the applier threads ONLY, receiver module can be running while you do this. So while you fine-tune your slave applier module the receiver module keeps reading master's log and copying transactions to the slave's log. These could then be applied in-parallel by the slave when the applier module is up and running.
  • Under rule R3, you stop both receiver and applier modules. So, this is analogous to
    STOP SLAVE;
    CHANGE MASTER TO <master_def>;
    used before this feature was available.

Worried how relay log purge would be handled now? Well its pretty simple- Under rules R1 and R2, we do not purge logs implicitly on executing the CHANGE MASTER command so that the receiver or applier whichever is running just keeps processing/adding relay logs as it would do if no replication thread was stopped.

Finally note that you need not always look at the figure above to find which options are allowed which thread being stopped. You just need to ask yourself if the parameter is related to receiver thread and stop the concerned thread. And if you go wrong there are error messages to guide you on the right path. Look at the next section for the usage and the errors.

EXAMPLES OF USAGE
example 1:
Previously, to change master heartbeat period, you would do a

STOP SLAVE;
CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD= <value>;
START SLAVE;

Now, with this feature you just have to stop the receiver (io) thread as heartbeat has nothing to do with the applier thread(s).
 
STOP SLAVE IO_THREAD;
CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD= <value>;
START SLAVE IO_THREAD;

Note that the applier thread keeps executing the transactions in the relay log while you change the heartbeat period for the master-slave connection. Likewise, you could do this with all the attributes mentioned in group G1 in the figure above.

example 2:
Similarly, to change applier thread attributes, you just have to stop the applier threads.
So instead of

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY=<value>;
START SLAVE;

it is enough to do the following with this feature.

STOP SLAVE SQL_THREAD;
CHANGE MASTER TO MASTER_DELAY=<value>;
START SLAVE SQL_THREAD;

Lastly, if you go wrong there are nicely worded error message to guide you. So in the first case, if your receiver module is active and you execute a

CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD= <value>;

 you get an error saying:
This operation cannot be performed with a running slave io thread; run STOP SLAVE IO_THREAD first.


and if you forgot changing applier module when it was required, the server will say:

This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD first.


Lastly you still have the error message saying both the threads should stop appearing only for MASTER_AUTO_POSITION option now:
MASTER
This operation cannot be performed with a running slave; run STOP SLAVE first.

Lets see some examples once again:

example 3:
slave>START SLAVE;
slave>CHANGE MASTER TO MASTER_DELAY= 10;
ERROR 1900 (HY000): This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD first

example 4:
mysql> CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD= 10;
ERROR 1904 (HY000): This operation cannot be performed with a running slave io thread; run STOP SLAVE IO_THREAD first.

example 5:
mysql> CHANGE MASTER TO MASTER_AUTO_POSITION= 0;
ERROR 1198 (HY000): This operation cannot be performed with a running slave; run STOP SLAVE first

SIDE-EFFECTS?

While implementing this, we have taken special care to make sure we dont break anything for a user switching masters like:

STOP SLAVE;
CHANGE MASTER to <master_def>;
START SLAVE.

There are absolutely NO side-effects to worry you. Note that as stated before, CHANGE MASTER TO will not delete relay logs if one of the receiver or applier thread is running.

Try it out and give us your feedback. As always, we look forward to hearing from you to improve this feature. Enjoy :)

Friday, March 7, 2014

MySQL User Camp: Global Transaction Identifiers

The slde contains an introduction to the global transaction identifiers(GTIDs) in MySQL Replication. The new protocol at re-connect, skipping transactions with GTIDS, replication filters, purging logs, backup/restore ets are covered here.