We have talked about what it takes in general to use a database migration framework to manage and apply your database changes. In this post I will talk about one particular framework: Liquibase. Please note that this is an introduction to Liquibase in general. If you want a working example please see this post instead.
Liquibase is a framework written in Java used to manage and apply your sql files. Their website sports the tagline “source control for your database”, which might be a bit misleading. Liquibase is not a version control system like Git or Subversion. In fact it is meant to be used in tandem with a version control system. When you use Liquibase you will have a project, just like any old Java project, that contains your sql files. When you run this project Liquibase will install your changes to the database. You can also embed Liquibase (and your sql files) into an existing project, allowing your application to manage its own database. Liquibase is meant to bring the management and deployment of your sql files into the familiar developer realms of IDE’s, version control, and continuous integration.
In order for Liquibase to work its magic, you have to give it some hints. These hints come from markups to your SQL files, or by writing your SQL changes in XML, YAML, or JSON. This is the biggest pain point I have found for developers. Even if you stick with normal .sql files, you still have to add flags (in the form of comments) telling Liquibase how to work with the file. It is worth it. Once you get in the habit it’s no harder than writing normal sql.
Changesets and Changelogs
Changesets are units of work for Liquibase to apply. It is basically the sql you want applied to the database. Each changeset should be a single, independent unit of work. You should never have one changeset applying multiple changes unless it is absolutely necessary. Here is why:
Liquibase attempts to execute each changeSet in a transaction that is committed at the end, or rolled back if there is an error. Some databases will auto-commit statements which interferes with this transaction setup and could lead to an unexpected database state. Therefore, it is usually best to have just one change per changeSet unless there is a group of non-auto-committing changes that you want applied as a transaction such as inserting data.
Changelogs are how you tell Liquibase what changesets to apply and in what order. The order in which the changesets appear in the changelog is the order in which they will be executed. Think of them as your release documentation. They are composite-able, so you can include changelogs inside of changelogs. This is advisable so you dont end up with one massive master changelog. A pattern I like to follow is to group like changesets, for instances all changes for a given user story, into a single changelog file. This changelog is then included in the master changelog file for a release. This gives you the benefit of independently applicable changesets as well as better developer accessibility from grouping changes together in a single file.
<?xml version="1.0" encoding="UTF-8"?> <databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd"> <changeSet id="storyA_change_1" author="mstratto"> <createTable tableName="testTable"> </changeSet> <changeSet id="storyA_change_2" author="mstratto"> <dropTable tableName="testTable"/> </changeSet> </databaseChangeLog>
Determining changes to be applied
So you have a bunch of changesets defined and a master changelog that ties them all together. But Liquibase doesn’t wildly go reapplying every change to the database every time. To determine which changes should be applied, Liquibase uses a combination of metadata it gathers and stores in the database, changeset id’s and md5 checksum hashing.
The first time you run Liquibase it creates two tables in the database: DATABASECHANGELOG and DATABASECHANGELOGLOCK. The lock table is a simple mutex table used to prevent concurrent Liquibase runs from stepping on each other. DATABASECHANGELOG is used to keep track of what changesets have been applied to a database. The initial run will obviously be against an empty table, so all changesets will be applied. On subsequent runs, however, Liquibase will only apply changesets that aren’t in the database or have the
runOnChange attribute set.
When Liquibase applies a changeset, it computes an md5 hash of the contents of that change and stores it in the DATABASECHANGELOG table. If you change a changeset after it has been applied, the md5 hash will be different than what is in the database and Liquibase will halt execution because the database may be in a state different than what the changelog is expecting. Liquibase doesnt want to shoot you in the foot that easily. If you really want a changeset to be applied even when it changes there is a
runOnChange attribute you can use. This can be useful for managing your views and stored procedures whose state is more easily tracked with a version control system than Liquibase.
The id of a changeset is actually a little more than its
id attribute, confusingly enough. Liquibase uses a combination of id/author/filepath to create a changeset id. This can be important if you try to change the author or file path of a changelog after it has been applied. If you really need to move a file around, there is a
logicalFilePath attribute of the
databaseChangelog tag which will override the physical file path when Liquibase determines the changeset id. So you can move the files and set the logical file path to whatever it was before, and Liquibase will be happy.
Liquibase comes in the form of a jar file. So you can simply run it from the command line:
liquibase --driver=com.mysql.jdbc.Driver \ --classpath=/path/to/classes \ --changeLogFile=com/example/db.changelog.xml \ --url="jdbc:mysql://localhost/example" \ --username=user \ --password=asdf \ migrate
There is also an excellent Maven build plugin and some very user friendly Spring integration. You can also use Ant, a servlet listener or a CDI environment, but I have no experience with these last three.
Liquibase has a rollback mode. You can roll back to a specific date, a number of changesets, or to a database tag (a tag is an attribute of a changeset and is often used to group changesets into a releases or application version). For simple changes, such as
create table or
add column, Liquibase can automatically generate the rollback sql for you. For more complicated changes, you can specify exactly how to rollback the change with the
rollback tag. If you use an empty
rollback tag, Liquibase will not try to rollback the change.
I haven’t been able to find a comprehensive list of changes Liquibase can auto-rollback, when in doubt specify rollback procedures. Rolling back changes is a tricky thing. If you truly rely on Liquibase’s ability to rollback changes I suggest to test it heavily before trying it in a production environment.
Preconditions are used to conditionally execute changelogs or changesets. There are many built in precondition tags such as
columnExists. You can also specify
<sqlCheck expectedResult="1">SELECT COUNT(1) FROM pg_tables WHERE TABLENAME = 'myRequiredTable'</sqlCheck>
You can also create custom precondition classes and reference them with the
<customPrecondition className="com.example.CustomTableCheck"> <param name="tableName" value="our_table"/> <param name="count" value="42"/> </customPrecondition>
Different preconditions can be chained together with AND, OR, NOT logic. Multiple tags in your precondition block are implicitly combined with the AND operator, meaning all preconditions must pass for the changeset to run. You can also specify what you want to do when a precondition fails with the
Preconditions can be a powerful tool for hardening your sql against errors. If you are dropping a column or table, it is tempting to write a simple drop command in a changeset and execute it. This will work, however, if the database object doesnt exist the first time, or if the changeset is ran again for some reason, Liquibase will break. Writing a simple precondition which checks for the object’s existence before attempting the drop will save you from headaches.
Contexts are a way to conditionally execute changesets based on what contexts have been set at runtime:
<changeSet id="prod_only_hotfix" author="mstratto" context="prod"> ... </changeSet>
You can pass any number of contexts to Liquibase at run time. You can also specify contexts as a logical expression with AND, OR, ! and parentheses. If you do not specify a context, the changeset will be ran regardless of what contexts are set. Contexts are a powerful tool when you need to do environment specific changes or for managing multiple applications/databases from the same Liquibase project.
Liquibase is a great tool for managing and applying your database changes. It has many features that can be used to harden your sql and make the application of your sql more intelligent. By using Liquibase with an automated deployment framework such as Puppet, you can bring your database into the 21st century.