Managing sql scripts for a small database is simple enough, but what about when you have a large enterprise scale database? How do you manage and apply sql scripts for a database that is built up over many years, developed in parallel by many developers, and made up of hundreds of tables, views and store procedures? Database Migration frameworks such as Liquibase and Flyway have arisen in the past few years as an attempt to solve this problem.
This post is the first in a series on managing enterprise database changes and applying them using a database migration framework. In this post I will go over some fundamentals and thought processes to help you use these frameworks effectively. Other posts will introduce you to specific frameworks and demonstrate working examples. Regardless of how each one works, these concepts should be applicable to any of them.
The Old Way
Everywhere I have worked has had roughly the same process when it comes to releasing sql changes. As changes are made to the database in the dev environment, these changes are stored in some folder structure on the file system that is tracked by version control, which is used to sync the scripts between developers. Branches also help track database state across releases. When it comes time to release a developer goes through the commit messages and and makes a release document of the changes to be applied to the higher environment. If your team is really good they create and update the release documentation as they develop.
Pro’s and Con’s
- Straight forward – it is familiar to us and easy to understand
- No special processes or software required – DBA’s and new developers wont need special training
- Time consuming – Developers spend time creating and maintaining documentations. DBA’s have to manually open and execute sql scripts
- Error prone – Developers can forget to add or misorder files. DBA’s can make mistakes when applying changes
- Painful – painful to create and maintain documentation over release cycle. Painful for DBA’s to apply. Also think about the last time you had to set up a new environment, or update an older environment which is a few release behind. How much fun was that?
- Renders all your CI efforts pointless – Continuous Integration is one of the buzzwords today. Deploying application code is generally the most straight forward part of a deployment. What is the point of using a tool like Puppet to drop a war file in a web servlet container if you have 500 manual database upgrade steps that have to be executed? I would argue that you shouldn’t even bother with CI of your application until you can automate the deployment of your database.
The New Way
The new way is to use a framework to manage and apply your database changes. Instead of having developers keep track of the database state and meticulously organize sql files and write instructions for humans to follow, you allow a program to keep track of the database state to determine the changes that need be applied. The execution order is built into the files as you write them, so you essentially end up writing the release document as you write the code. Your database becomes a single artifact to be deployed instead of a bunch of disparate files scattered on a file system.
Writing Machine Friendly SQL
Up to this point your database has been updated by an intelligent, friendly (hopefully) human being. You have likely taken this for granted without even realizing it. Your sql will now be executed by a machine. You need to expect bad things to happen and take precautions. When you write a program, do you assume all input you get will be valid? No. Do you expect everything to go perfectly every time? Of course not. You should have the same mindset when writing your database changes.
- What happens if your database crashes while your changes are being applied, will your deployment tool be able to gracefully pick up where it left off?
- How will you roll back changes if you need to?
- Will this change destroy data you might need in the future?
You should already be considering alot of this, I just want to emphasize that you need to be more aware of it when using automated software to update your database. You no longer have a DBA to say “are you sure?” before applying a given change.
A good way to harden your sql is to make your changes idempotent. That is, if for some reason your sql is ran multiple times, there is no change beyond the initial application. For example, you want to drop a column colA on table A. You write a quick sql script
alter table A drop column colA. You run it and it runs great. But what happens when you run it again? You get an error that the column doesn’t exist. A DBA can confirm if this is acceptable and continue. A program will think it is the end of the world and crash. To make this idempotent, check if the column exits or not and only run the sql if the column exists. We can improve this change by first checking if the column exists before trying to drop it.
For every change ask the question “what happens if it runs twice?” It is likely impossible to make every change idempotent, but you should try your hardest to make it so. Migration frameworks often provide tools to help with this, such as Liquibase’s preconditions.
The best thing about using a framework to apply database changes is that you can now integrate running these tools with automated deployment frameworks like Puppet or Jenkins. Just like you use a tool like Puppet to deploy jar files to your server, you can now use it to deploy your database. The deployment process of your database can literally become “push your changes to the master branch.” You don’t even have to push a button if you don’t want, let alone write a comprehensive document of changes to be applied. It is a beautiful thing to see in action.
A framework might come with these integrations out of the box. For most, though, you will have to write some kind of simple script to connect the database migration framework to your deployment tool.
Pro’s and Con’s
- Drastically reduce the risk of human error – computers are way better at following instructions than humans. They literally do exactly what you tell them.
- Easily integrate deployment of database with CI tools – with a computer program applying your changes, you are a simple wrapper script away from completely automating the deployment of your database
- Reduce release documentation – your sql changes become their own release documentation.
- Learning curve – you and every developer on your team (and probably DBA’s) will have to learn to use a new tool.
- Convincing DBA’s to go along with it – DBA’s, in my experience, are paranoid and want to know what you are doing to “their” database. This might make the adoption of one of these frameworks an uphill battle.
I had a professor that said “if your application doesn’t use a database, it is a toy.” While that is obviously an over-generalization, he has a point. How many enterprise applications have you worked on that didn’t revolve heavily around a database? If the database is so integral to your application then why does it feel so separate? Database migration tools bring the management and deployment of sql into the developer’s hands.