
How to configure MySQL master/slave replication with MHA automatic failover

This post is the first one of a series called "complete infrastructure". By following all the posts of this series, you will be able to set up, from scratch, a complete highly available web infrastructure with:

  • 2 load balancer servers
  • 2 web servers (Drupal in my case) with real-time files synchronization
  • 2 database servers sharing a VIP in a master/slave replication, with MHA automatic failover

To find all the posts of this series, just click the "complete infrastructure" tag at the end of the page.

Overview

The purpose of this installation is to have two database servers, in a master/slave replication configuration.
All requests from your web servers are always forwarded to the master server, which is replicated in real time to the slave. Most of the time, the slave server will just act as a simple backup and will not be used by the website.
We will configure the database service to be reachable through a virtual IP address (VIP).
If the master goes down, automatic failover will promote the slave as the new master and move the VIP onto it. This part includes checking data integrity before the promotion of the new master.

[Schema: test infrastructure diagram with the IP addresses of each server]

The IP addresses mentioned on this schema are the ones of my test infrastructure. In the following configuration instructions, please be sure to replace all IP addresses with your own.
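
For reference, these are the addresses used throughout the examples below (the slave address is an assumption on my part, deduced from the others; adapt everything to your own network):

  • Web server: 192.168.10.235
  • BDD1, the MySQL master: 192.168.10.236
  • BDD2, the MySQL slave: 192.168.10.237 (assumed)
  • VIP shared by the database servers: 192.168.10.238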

All servers are running Ubuntu 12.04, but I assume it will work with Debian 7 too.

MySQL installation on MASTER server

Install MySQL server and MySQL client:

apt-get update
apt-get upgrade
apt-get install mysql-server mysql-client

Follow the instructions and choose a password for the MySQL root user.

Set the VIP on the MySQL master server:

ifconfig eth0:1 192.168.10.238

It’s recommended not to put this configuration in the “/etc/network/interfaces” file as a static configuration. If the master server crashes, the VIP will automatically be switched to the slave server, and we don’t want the master to reconfigure this VIP on its interface when it comes back up after the downtime, because its database will no longer be up to date.

Connect to the MySQL shell in order to create the new user “dbuser” and the database “website” (please replace 'password' with the password you really want):

mysql -u root -p
mysql> create database website;
mysql> grant usage on *.* to dbuser@localhost identified by 'password';
mysql> grant all privileges on website.* to dbuser@localhost;
mysql> exit;

Import a database dump in the new database:

mysql -h localhost -u root -p website < path_to_the_dump/dump.sql

Edit the MySQL configuration file to allow remote connections to this server:

vi /etc/mysql/my.cnf

Find the line “bind-address = 127.0.0.1” and comment it out if it isn’t already done:

#bind-address = 127.0.0.1

Allow your web server to connect to the database server:

mysql -u root -p
mysql> GRANT ALL ON *.* to dbuser@'192.168.10.235' IDENTIFIED BY 'password';

In your website settings, use the VIP address in the mysql settings:

$db_url = 'mysql://dbuser:password@192.168.10.238/website';

At this point, the web server is now allowed to remotely connect to the master database server.
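
Optionally, you can test the remote connection from the web server itself once MySQL has been restarted (the bind-address change only takes effect after the restart performed in the replication section below). This assumes the MySQL client is installed on the web server:

mysql -h 192.168.10.238 -u dbuser -p website -e 'select 1;'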

MySQL installation on SLAVE server

For the moment, just install MySQL server and MySQL client on the SLAVE server:

apt-get update
apt-get upgrade
apt-get install mysql-server mysql-client

MySQL replication on MASTER server

Edit the MySQL configuration file to enable master database replication:

vi /etc/mysql/my.cnf

Find the line “server-id” and set it to 1:

server-id = 1

Move on to the "log_bin" line. This is where the real details of the replication are kept: the slave is going to copy all of the changes that are registered in this log. For this step we simply need to uncomment the "log_bin" line and add "expire_logs_days" to avoid a huge amount of log files filling up the disk:

log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 7

Finally, we need to designate the database that will be replicated on the slave server. You can include more than one database by repeating this line for all of the databases you will need:

binlog_do_db = website
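
For example, if you also wanted to replicate a hypothetical second database named "shop", you would simply repeat the directive:

binlog_do_db = website
binlog_do_db = shop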

After you have made all of the changes, save and exit the configuration file, then restart the MySQL service:

/etc/init.d/mysql restart

The next steps will take place in the MySQL shell itself. We will grant privileges to a new “slave_user” that will be used for the replication (please choose a password):

mysql -u root -p
mysql> grant REPLICATION SLAVE on *.* to 'slave_user'@'%' IDENTIFIED BY 'password';
mysql> flush privileges;

To continue, please open a new SSH connection to the master server, in a new shell window.
In your current SSH window, switch to the website database and lock it to prevent any changes:

mysql> use website;
mysql> flush tables with read lock;

Then type:

mysql> show master status;

[Screenshot: output of “show master status”]

You will see a table with some information. The important values are "File" and "Position": they indicate the binary log file and the position from which the slave database will start replicating. We will use these values later.
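
The output looks roughly like this (illustrative values that happen to match the ones used later in this guide; yours will differ):

+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      107 | website      |                  |
+------------------+----------+--------------+------------------+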

If you make any new changes in the same SSH window, the database will automatically unlock. For this reason, you should open a new tab or window and continue with the next steps there.

Proceeding with the database still locked, export it using mysqldump in the new window (make sure you are typing this command in the bash shell, not in MySQL):

mysqldump -u root -p --opt website > website.sql

Now, returning to your original SSH window, unlock the tables (making them writeable again) and finish up by exiting the shell:

mysql> unlock tables;
mysql> exit;

Now you are all done with the configuration on the master side.

MySQL replication on SLAVE server

Log into your slave server, open up the MySQL shell and create the new database and user (same as before):

mysql -u root -p
mysql> create database website;
mysql> grant usage on *.* to dbuser@localhost identified by 'password';
mysql> grant all privileges on website.* to dbuser@localhost;
mysql> exit;

Allow your web server to connect to the database server:

mysql -u root -p
mysql> GRANT ALL ON *.* to dbuser@'192.168.10.235' IDENTIFIED BY 'password';

Import the database that you previously exported from the master server:

mysql -h localhost -u root -p website < path_to_the_dump/website.sql

Now we need to edit the slave configuration in the same way as we did for the master:

vi /etc/mysql/my.cnf

Find the line “bind-address = 127.0.0.1” and comment it out if it isn’t already done:

#bind-address = 127.0.0.1

Find the line “server-id” and set it to 2:

server-id = 2

Following that, make sure that the following 4 settings are appropriately filled out (you will need to add the last 2 lines, which are not there by default):

log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = website
relay-log = /var/log/mysql/mysql-relay-bin.log
relay_log_space_limit=2G

After you have made all of the changes, save and exit the configuration file, then restart the MySQL service:

/etc/init.d/mysql restart

The next step is to enable the replication from within the MySQL shell.

Open up the MySQL shell once again and type in the following command, replacing the values with those shown by “show master status” on the master (in the other tab):

mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.236', MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;

This command accomplishes several things at the same time:

  • It designates the current server as the slave of our master
  • It provides the server the correct login credentials
  • It lets the slave server know where to start replicating from

With that, you have configured the master and slave server. Activate the slave server:

mysql> start slave;

You can see the details of the slave replication by typing in the following command. The \G rearranges the text to make it more readable.

mysql> show slave status\G

You should have something like this:

[Screenshot: output of “show slave status\G”]
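
The key fields to check, in an abridged and illustrative form (your values will differ), are:

             Slave_IO_State: Waiting for master to send event
                Master_Host: 192.168.10.236
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
      Seconds_Behind_Master: 0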

Manage master binary logs

With this configuration, the master server will keep the binary logs for 7 days. It means that if the replication fails, you have 7 days to fix the replication issue; otherwise you will lose some data.

If you need to clean binary logs on the master server you can do the following:

  • On the slave server:
    Run the “show slave status\G” command in the MySQL shell and look at "Relay_Master_Log_File". It is the binary log on the master currently used by the slave.

  • On the master server:
    Run the following command in the MySQL shell (be sure to type the name of the "Relay_Master_Log_File" from the previous step):

      mysql> purge binary logs to 'mysql-bin.000123';
    

This will erase the older logs that are not needed anymore.
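
If you want to see which binary logs are currently present on the master before purging, you can list them first:

mysql> show binary logs;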

At this point, the master/slave MySQL replication is done.
If you try to change a value in the master database, you will see the change in the slave database instantly.

The next steps will setup MHA (Master High Availability) for the automatic failover if the master server goes down.

MHA configuration

A primary objective of MHA is automating master failover and slave promotion within short (usually 10-30 seconds) downtime, without suffering from replication consistency problems, without performance penalty, without complexity, and without changing existing deployments.

It runs on top of existing MySQL installations and uses standard MySQL replication, both asynchronous and semi-synchronous.

The main features are:

  • Automatic failover
  • Short downtime
  • MySQL-Replication consistency
  • Easy installation
  • No change to existing deployments

MHA is delivered in two packages, MHA Node and MHA Manager. The node package will be installed on both servers, and the manager package only on the slave server (if you install the manager on the master server and the master goes down, MHA will be completely useless… yep). More details at: https://code.google.com/p/mysql-master-ha/

Install MHA Node and the additional libraries on both servers (complete these steps over SSH on the master AND the slave):

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-node_0.54-0_all.deb
apt-get install libdbd-mysql-perl
dpkg -i mha4mysql-node_0.54-0_all.deb

Install MHA Manager and additional libraries on MySQL SLAVE server only:

wget http://mysql-master-ha.googlecode.com/files/mha4mysql-manager_0.55-0_all.deb 
apt-get install libdbd-mysql-perl
apt-get install libconfig-tiny-perl
apt-get install liblog-dispatch-perl
apt-get install libparallel-forkmanager-perl
dpkg -i mha4mysql-manager_0.55-0_all.deb

Create a configuration file on this server to setup MHA Manager:

vi /etc/mha/app1.cnf

With the following information:

[server default]
# mysql user and password
user=mhauser
password=password
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

master_ip_failover_script=/etc/mha/master_ip_failover

[server1]
hostname=BDD1

[server2]
hostname=BDD2

Be sure that both hostnames can be resolved by both servers; add entries in /etc/hosts file if needed or put IP addresses instead of hostnames.
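
For example, the /etc/hosts entries on both servers could look like this (the BDD2 address is again my assumption; use your real slave IP):

192.168.10.236    BDD1
192.168.10.237    BDD2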

The "master _ ip _ failover" script will be called by MHA when the master server goes down, to switch the VIP from the dead server to the new promoted master.

Grant access to “mhauser” on both servers (this user is only used by MHA):

mysql -u root -p
mysql> grant all on *.* to 'mhauser'@'192.168.10.%'  identified by 'password';
mysql> flush privileges;

Create the "master _ ip _ failover" script:

vi /etc/mha/master_ip_failover

And fill it in with the code below (this script will switch the VIP from the dead server to the newly promoted master server; please be sure to change the 2 addresses):

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

use Net::Ping;
use Switch;

my ($command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password);


GetOptions(
'command=s'             => \$command,
'ssh_user=s'            => \$ssh_user,
'orig_master_host=s'    => \$orig_master_host,
'orig_master_ip=s'      => \$orig_master_ip,
'orig_master_port=i'    => \$orig_master_port,
'new_master_host=s'     => \$new_master_host,
'new_master_ip=s'       => \$new_master_ip,
'new_master_port=i'     => \$new_master_port,
'new_master_user=s'     => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);


my $vip = '192.168.10.238/24';  # Virtual IP
my $master_srv = '192.168.10.236';
my $timeout = 5;
my $key = "1";
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
        print "Disabling the VIP on old master if the server is still UP: $orig_master_host \n";
        my $p=Net::Ping->new('icmp');
        &stop_vip() if $p->ping($master_srv, $timeout);
        $p->close();
        $exit_code = 0;
    };
    if ($@) {
        warn "Got Error: $@\n";
        exit $exit_code;
    }
    exit $exit_code;
}
elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
        print "Enabling the VIP - $vip on the new master - $new_master_host \n";
        &start_vip();
        $exit_code = 0;
    };
    if ($@) {
        warn $@;
        exit $exit_code;
    }
    exit $exit_code;
}
elsif ( $command eq "status" ) {
    print "Checking the Status of the script.. OK \n";
    #`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    exit 0;
}
else {
    &usage();
    exit 1;
}
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
#    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
`ifconfig eth0:$key $vip`;
}
# A simple system call that disables the VIP on the old master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Make this script executable:

chmod +x /etc/mha/master_ip_failover

MHA Manager internally invokes programs included in the MHA Node package via SSH.
MHA Node programs also send differential relay log files to other non-latest slaves via SSH (scp).
To make these procedures non-interactive, it is necessary to set up SSH public key authentication.
MHA Manager provides a simple check program, "masterha_check_ssh", to verify that non-interactive SSH connections can be established between the servers.

On the MySQL master server, do the following:

ssh-keygen

Hit “enter” for all prompted questions (do NOT set any passphrase). This will create an SSH public/private key pair.

Copy the public key to the slave server in the root .ssh directory:

cd /root/.ssh
scp id_rsa.pub root@192.168.10.237:/root/.ssh/uploaded_key.pub

On the MySQL slave server, do the following:

ssh-keygen

Hit “enter” for all prompted questions (do NOT set any passphrase). This will create an SSH public/private key pair.

Copy the public key to the master server in the root .ssh directory:

cd /root/.ssh
scp id_rsa.pub root@192.168.10.236:/root/.ssh/uploaded_key.pub

Finally, on both servers, do the following:

cd /root/.ssh
cat uploaded_key.pub >> authorized_keys
cat id_rsa.pub >> authorized_keys

This will allow both servers to connect to each other over SSH as root without a password. This is needed by MHA to proceed further.
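
You can verify this quickly by hand before running the MHA check; each command should print the remote hostname without asking for a password (assuming BDD1 and BDD2 resolve as described above):

# from BDD1
ssh root@BDD2 hostname
# from BDD2
ssh root@BDD1 hostname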

To test if everything is OK with SSH configuration, type the following command in the slave (MHA Manager) SSH shell:

masterha_check_ssh --conf=/etc/mha/app1.cnf

You should see a message like: "All SSH connection tests passed successfully".

[Screenshot: masterha_check_ssh output]

If "masterha _ check _ ssh" stops with errors or authentication requests, SSH configuration is not valid for MHA to work. You will need to fix it and try again. Most possible cause is SSH public key authentication is not set properly.

If the SSH configuration is OK, check the replication configuration by typing the following command in the slave (MHA Manager) SSH shell:

masterha_check_repl --conf=/etc/mha/app1.cnf

Please be sure that you have the “MySQL Replication Health is OK.” message at the end before moving further. If not, you will need to fix errors shown in the logs.
As you can see at the end of the logs, MHA automatically detects which server is the master, and which one is the slave.

Now that you have configured MySQL replication, installed both MHA Node and MHA Manager, and configured SSH public key authentication, the next step is to start MHA Manager.
MHA Manager can be started with the "masterha_manager" command on the MHA Manager server.
In our case, we will start it in the background, as below:

nohup masterha_manager --conf=/etc/mha/app1.cnf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &

You can now check MHA status at any time with the command:

masterha_check_status --conf=/etc/mha/app1.cnf
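
If the manager is running, the output should look something like this (the PID will of course differ):

app1 (pid:4635) is running(0:PING_OK), master:BDD1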

Now that everything is correctly configured, we will test the most important part: the automatic master failover.

Type the following command in the MySQL master SSH shell (or halt/reboot the server):

/etc/init.d/mysql stop

Then, on the MySQL slave, (which is the MHA Manager server), check the MHA logs by typing:

tail -f /var/log/masterha/app1/app1.log

I will not copy/paste all the logs, but just the important messages below:

[Screenshots: MHA failover log extracts]

As you can see, in less than 15 seconds, MHA has verified data consistency, promoted our slave as the new master, and set the VIP on the new master. The website can still connect to the database! \o/

When the failover is complete, the MHA script will stop, and you will need to manually perform the steps below to reconfigure the cluster:

Reboot the BDD1 server (old master).
Then make a clean dump of the database (with lock) on BDD2 (new master):

	mysql -u root -p
	mysql> use website;
	mysql> flush tables with read lock;

Then in another SSH window on BDD2:

mysqldump -u root -p --opt website > website.sql

Connect to SSH on BDD1 and copy the dump from BDD2:

scp root@192.168.10.237:/path_to_the/website.sql .

Import dump into the database on BDD1:

mysql -h localhost -u root -p website < path_to_the/website.sql

Check the master status on BDD1 (note the log file and position, they will be needed to reconfigure the replication on BDD2):

mysql -u root -p
mysql> show master status;

Enable VIP on BDD1:

ifconfig eth0:1 192.168.10.238

Disable VIP on BDD2:

ifconfig eth0:1 down

Disable locks on BDD2 and reconfigure replication (you will need to adjust the log file and position according to those shown on the master server):

mysql -u root -p
mysql> use website;
mysql> unlock tables;
mysql> stop slave;
mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.236', MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;
mysql> start slave;
mysql> show slave status\G

Remove the MHA log files on BDD2:

rm /var/log/masterha/app1/app1.failover.complete
rm /var/log/masterha/app1/app1.failover.error

Check the replication and restart MHA on BDD2:

masterha_check_repl --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &

The MySQL replication and MHA status are now OK again, waiting for another failure.

If you need to do a database dump on the slave server when the replication is running, you can do the following:

mysql -u root -p -e 'STOP SLAVE SQL_THREAD;'
mysqldump -u root -p website > website.sql
mysqladmin -u root -p start-slave

The “STOP SLAVE SQL_THREAD” command lets the slave continue to receive data change events from the master's binary log and store them in the relay logs (the I/O thread stays up), but prevents the slave from executing these events and changing its data. Within busy replication environments, permitting the I/O thread to run during the backup may speed up the catch-up process when you restart the slave SQL thread.
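
After restarting the SQL thread, a quick and optional way to confirm that both replication threads are running again and that the slave is catching up:

mysql -u root -p -e 'show slave status\G' | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'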