Today's post is from one of our own talented employees at Webonise. Ameya Kulkarni is the VP of Engineering and is based in our Pune, India office. Today he's talking about complexities with Docker and MySQL, and his recent experience navigating a tricky situation. He also references WWE. (Fun!) Read on if you want to know how these relate.
Coast to Coast... Wait... What?
I used to be a big fan of WWE (World Wrestling Entertainment). One of my favorite moves was by Shane McMahon, called “coast-to-coast”. In this move, he would do a missile dropkick from one corner of the ring to another. To my surprise, years later, I found this to be my inspiration for designing an engineering solution.
In one of the technical projects that I led, we needed to Dockerize our entire stack in order to adhere to the 12 Factor methodology for application development.
One of the most valuable advantages of this methodology is that it ensures environment parity. We can rest assured that the code, configuration, and environment on the testing/staging environments are the same as on production; therefore, things that work in testing don’t break in production. In addition, our services become immutable, reproducible, and environment-agnostic. We can make informed assumptions about the starting state of transient services, and we can take a testing build and deploy it on other environments knowing it will behave the same.
The project’s stack had common components: a static frontend hosted on Nginx calling a RESTful API, which in turn used a MySQL database and a Redis server to store data. All of these components ran in their own Docker containers on their respective VMs.
The application’s network topology was designed to encapsulate and protect resources that should not be public. It used an AWS VPC with public and private subnets. The internet-exposed static HTML frontend resided in the public subnet, and the other servers resided in the private subnet.
In addition to adhering to the 12 Factor methodology, we replicated our network topology, configuration, source code, and data in two AWS regions for the sake of high availability and redundancy. We ran the live instance of the system in the US-West-1 region, henceforth called “Live,” and the standby application in the US-East-1 region, henceforth called “Standby.”
Live traffic was directed to Live, and the Standby environment was expected to replicate it over time. Standby had to be groomed and usable within seconds (at most two minutes) if it was ever needed during an outage of Live. An outage could range from a simple hardware failure, or a database service going down under load, all the way to the host servers being underwater after the whole of California is swallowed by the Pacific Ocean.
This meant that the Standby DB had to mirror the state of the Live DB. Because of network security and our preference for Docker, the two MySQL server counterparts ran inside Docker containers, which in turn ran on separate VMs inside the private subnets of separate VPCs in two different AWS regions.
We implemented and tested a database backup-and-restore protocol that operated well under these circumstances. In my experience, database backups are often taken but rarely tested. With this solution, we attempted to kill two birds with one stone: restoring data and validating the transfer simultaneously.
Options for Solution
Since we were already using Docker, I figured Docker was the best bet for the solution. I explored Docker Hub to check whether the community had already created a drop-in solution. There were a couple of promising candidates proposed in the community forum:
1. Tutum’s excellent MySQL backup image - It created a .sql backup on the host’s volume
2. Tutum’s Dockup - This tar-balled the volumes from any container and uploaded them to an S3 bucket
The above images were excellent and complementary. The first provided the backup as the .sql file I wanted, but did not upload it to an offsite location (S3). The second uploaded to S3, but produced a .tar.gz of the container’s /var/lib/mysql folder instead of a .sql file.
I wanted a combination of both with a flexible way to control my backup and restore mechanism.
Inspired by Tutum’s projects, I set out to create a utility, packaged as a Docker container, that could just work and present us with an offsite backup and restore mechanism. The core elements of this solution were:
Run mysqldump against a database
Upload the .sql file to a well-known location in S3 for easy retrieval
Retrieve the dump from that well-known location
Run the restore via the mysql command line
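In Go (the language the wrapper ended up being written in), those steps can be sketched roughly as follows. The type and helper names here are illustrative, not the utility's actual code, and the S3 transfer in the middle is elided:

```go
package main

import (
	"fmt"
	"os/exec"
)

// dbConfig collects the connection details the steps need.
// Field names are illustrative, not the utility's actual types.
type dbConfig struct {
	Host, Port, User, Password, Name string
}

// dumpCmd builds the mysqldump invocation for the backup step;
// its stdout is the .sql dump that gets uploaded to S3.
func dumpCmd(c dbConfig) *exec.Cmd {
	return exec.Command("mysqldump",
		"-h", c.Host, "-P", c.Port,
		"-u", c.User, "-p"+c.Password,
		c.Name)
}

// restoreCmd builds the mysql invocation for the restore step;
// the dump retrieved from S3 is fed to it on stdin.
func restoreCmd(c dbConfig) *exec.Cmd {
	return exec.Command("mysql",
		"-h", c.Host, "-P", c.Port,
		"-u", c.User, "-p"+c.Password,
		c.Name)
}

func main() {
	c := dbConfig{Host: "sample.db.com", Port: "3306", User: "user", Password: "pswd", Name: "sample"}
	fmt.Println(dumpCmd(c).Args)
	fmt.Println(restoreCmd(c).Args)
}
```

Piping `mysqldump` straight into the uploader, and the downloaded file straight into `mysql`, keeps the whole flow streaming with no large temporary files.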
Another choice I made was to run this entire utility as a Docker container. But since the Docker best practice is to run one process per container, achieving the steps above together required a delegating management utility: some kind of wrapper around the two commands. Due to its spartan runtime (a single static binary, no VM), I chose to write this wrapper in Go.
When built, the Go code yields a binary that is packaged into a Docker image and pushed to Docker Hub.
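Conceptually, the wrapper is just a dispatch on an environment variable. A minimal sketch, with placeholder messages standing in for the real backup and restore steps:

```go
package main

import (
	"fmt"
	"os"
)

// run dispatches on the dumper_mode environment variable, the way the
// wrapper described above does. The bodies are placeholders for the
// actual dump/upload and download/restore steps.
func run(mode string) (string, error) {
	switch mode {
	case "BACKUP":
		// dump the database, then upload the .sql file to S3
		return "backup completed", nil
	case "RESTORE":
		// download the .sql file from S3, then feed it to mysql
		return "restore completed", nil
	default:
		return "", fmt.Errorf("unknown dumper_mode %q", mode)
	}
}

func main() {
	msg, err := run(os.Getenv("dumper_mode"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(msg)
}
```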
To run this as a Docker image, you need Docker installed on your machine
Since this is a Go binary, you can also run it as-is by setting the environment variables mentioned in the README
You will need valid S3 credentials for an IAM user with the proper bucket rights
The machine where this utility runs needs internet access, and the database should be reachable from it
To get the image, just run:
$ docker pull kaddiya/mysql-backup-restore:2
Note: I'm not a fan of the latest tag, so this library is versioned numerically. The library can be used in two setups.
1. Where the database instance to back up is accessible at sample.db.com and the database instance to restore to is accessible at standby.sample.db.com
Usage for Backup
$ docker run \
    --env dumper_db_host=sample.db.com \
    --env dumper_db_port=3306 \
    --env dumper_db_user=user \
    --env dumper_db_password="pswd" \
    --env dumper_db_name="sample" \
    --env s3_access_key=access_key \
    --env s3_secret_key="secret_key" \
    --env s3_bucket_name="sample-db-backups" \
    --env dumper_mode=BACKUP \
    --env dumper_s3_region="us-west-1" \
    --env path_in_bucket="/data" \
    kaddiya/mysql-backup-restore:2
Usage for Restore
$ docker run \
    --env dumper_db_host=standby.sample.db.com \
    --env dumper_db_port=3306 \
    --env dumper_db_user=user \
    --env dumper_db_password="pswd" \
    --env dumper_db_name="sample" \
    --env s3_access_key=access_key \
    --env s3_secret_key="secret_key" \
    --env s3_bucket_name="sample-db-backups" \
    --env dumper_mode=RESTORE \
    --env dumper_s3_region="us-west-1" \
    --env path_in_bucket="/data" \
    kaddiya/mysql-backup-restore:2
2. Where the database instance to back up is running in a Docker container named container.db.com and the database instance to restore to is running in a Docker container named container.standby.db.com
Usage for Backup
$ docker run \
    --env dumper_db_host=container.db.com \
    --env dumper_db_port=3306 \
    --env dumper_db_user=user \
    --env dumper_db_password="pswd" \
    --env dumper_db_name="sample" \
    --env s3_access_key=access_key \
    --env s3_secret_key="secret_key" \
    --env s3_bucket_name="sample-db-backups" \
    --env dumper_s3_region="us-west-1" \
    --env path_in_bucket="/data" \
    --env dumper_mode=BACKUP \
    --link container.db.com:container.db.com \
    kaddiya/mysql-backup-restore:2
Usage for Restore
$ docker run \
    --env dumper_db_host=container.standby.db.com \
    --env dumper_db_port=3306 \
    --env dumper_db_user=user \
    --env dumper_db_password="pswd" \
    --env dumper_db_name="sample" \
    --env s3_access_key=access_key \
    --env s3_secret_key="secret_key" \
    --env s3_bucket_name="sample-db-backups" \
    --env dumper_s3_region="us-west-1" \
    --env path_in_bucket="/data" \
    --env dumper_mode=RESTORE \
    --link container.standby.db.com:container.standby.db.com \
    kaddiya/mysql-backup-restore:2
A note on what’s happening here:
The library takes the liberty of creating two folders in the s3_bucket_name/path_in_bucket path:
- latest
- archived
The backup and restore mechanisms complement each other: for the yin of backup, there is the yang of restore. The library switches between them via the dumper_mode environment variable.
1. In BACKUP mode, it uploads the latest dump of the database at sample.db.com to s3_bucket_name/path_in_bucket/latest/sample-latest-backup.sql, and also stores a timestamped copy at s3_bucket_name/path_in_bucket/archived/timestamped-sample-backup.sql
2. In RESTORE mode, it downloads the latest backup from s3_bucket_name/path_in_bucket/latest/sample-latest-backup.sql and restores it to the database instance running at standby.sample.db.com
It currently doesn’t have the ability to take a particular filename and restore the database from a file in the archived folder. That would be useful for restoring a database to a point in time.
The examples are just a starting point for how to use this utility. With different combinations of dumper_db_host, dumper_db_name, dumper_mode, and path_in_bucket, you can achieve other things as well, such as:
Having a dump of a particular environment on a local development machine to debug an issue.
Resetting the database to a particular baseline at a point in time
This library can be used to abstract away the complexity of backing up and restoring databases under any topology.
Due to containerization, executing this library is extremely cheap: we can spawn many backups on fixed hardware thanks to Docker's minimal footprint. This is especially useful when we have to transfer the dump of one environment to other environments for testing purposes.
Since the library makes no assumptions about its execution context, control is entirely in your hands: you can run the backup and restore scripts on demand, daily, hourly, and so on.
With this utility, I was able to back up Live's data on one coast (US-West-1) by running the library in BACKUP mode, and restore it to Standby on the other coast (US-East-1) by running it in RESTORE mode.
I synchronized execution on the VMs by introducing NTP, and made the restore run two minutes after the backup.
That two-minute gap is an approximation: in general you would allow X minutes, where X is a time by which the backup will definitely have completed. To make it deterministic, the backup container could instead send a signal to a central service, which then queues up the restore job.
In case of an outage on Live, I could run the scripts on demand and trivially have Standby groomed to the latest snapshot. (By “groomed” I mean ready to go: up to date and an exact duplicate.)
Please take this for a spin and let me know how it works for you.
Also, a big thanks to Tom Bishop for guiding me to start down this solution lane!
Ameya is the VP of Engineering at Webonise. Read more of his work on his website: https://kaddiya.in/about/ Let us know what you think-- feel free to reach out at firstname.lastname@example.org to chat further!