Tracking Database Schema And Schema Migrations In Git Version Control

By Ben Nadel

Published 2022-03-05 in ColdFusion, Docker, SQL — Comments (1)

A decade ago, I attended a presentation from Tim Cunningham titled "Version Control on the Database: the Final Frontier". At the time, I was relatively new to version control. At InVision, we had just started using git and GitHub; and, before that, I had only dabbled briefly with SVN (Subversion). I was still struggling to wrap my head around branching and merging and rebasing. So, when Tim suggested that we should also be storing our "database" in version control, I simply wasn't ready to hear it. I mean, it sounded awesome; but, due to my lack of experience, I couldn't connect - at a practical level - with anything he was saying. Cut to 2022 and I can't imagine not tracking our database schema and schema migrations in version control!

Tim Cunningham seemed to be ahead of his time, at least in the ColdFusion community. I had never seen a presentation like that before; and, now that I think about it, I don't believe that I've seen a presentation like that since. Apparently, tracking schema migrations isn't a sexy thing to talk about. Though, I will say that Chris Toomey and Stephanie Viccari often refer to Active Record Migrations (Ruby on Rails) on the Bike Shed podcast - one of my favorite podcasts.

When I was sitting in on Tim's presentation, the missing piece for me at the time was the fact that the state of database can be expressed in code. There are two general categories for this code:

DDL (Data Definition Language): This code defines the structure of your database and the tables within it. Character sets, collation, columns, default values, indexes, foreign keys, etc - this is everything about the database, less the actual data.
DML (Data Manipulation Language): This code provides the data for the database that has been defined (and altered) using the DDLs. These are the INSERT, UPDATE, and DELETE statements that capture the state - and the changes to the that state - over time.

But, don't let the fancy terms fool you, it's all just SQL statements, nothing more. It's just a series of SQL statements, executed in a predictable order, that leaves you with the most current state of your application database.

Running these SQL statements in a predictable order can get fairly fancy-pants - at work, we use a product called Liquibase to keep all of our environments in sync. But, it doesn't have to be fancy and automated to be valuable. In my local development environment, I hand-roll my SQL statements and put them in the docker-entrypoint-initdb.d directory of my MySQL Docker container. Then, when the MySQL Docker container is created, it simply executes all of the files in this directory in alphabetical order, applying them in turn, and leaving me with a predictable database schema.

For my blog's local development environment, if I list (ls) the contents of the initdb folder, here's what I get:

% ls -l build/mysql/initdb 
total 203776
2020-01-11-001-initial-structure.sql
2020-01-11-007-display_status.sql
2020-01-11-008-member.sql
2020-01-11-009-people.sql
2020-01-11-012-site_photo.sql
2020-01-11-013-tag.sql
2020-03-21-002-blog_entry_blog_entry_jn.sql
2020-03-21-003-blog_entry_tag_jn.sql
2020-03-22-001-blog_entry.sql
2021-11-18-001-drop-old-tables.sql
2021-11-19-001-drop-exercise-list.sql
2021-11-19-002-drop-kinky-calendar.sql
2021-11-19-003-drop-is-initialized-column.sql
2021-11-19-004-drop-old-admin-table.sql
2021-11-22-001-comment-edit-token-table.sql
2021-12-05-001-change-default-charset.sql
2021-12-05-002-comment-moderation.sql
2021-12-15-001-dropping-extra-indices.sql
2022-01-06-001-drop-comment-author-columns.sql
2022-01-22-001-recreate-comment-edit-token.sql
2022-02-28-001-mysql8-charset-defaults.sql

As you can see, all of these files follow the same format:

{ yyyy-mm-dd }-{ counter }-{ description }

Since the MySQL Docker container runs these SQL files in alphabetical order, I use the date-prefix with a counter in order to make sure that these are executed in a top-down manner. Every time I teardown my local development environment, the database is destroyed. And, every time I bring it back up again, these SQL files run against the empty database and leave me with a "known good state".

ASIDE: You don't actually have to destroy your database every time you shutdown your Docker development environment. You can configure the Docker container to store the MySQL data outside of the container, which will persist it across down/up cycles.

Now, you may notice that my first DDL (Data Definition Language) file is from January 2020. Of course, if you look at my blog posts, you'll see that I started blogging in 2006. That means it took me 14 years to start properly tracking my blog's schema changes.

Yikes! That's a bit embarrassing.

But, the good news is, it means that it's never too late to start tracking database schema migrations, even in a brown-field application like mine. All I did to get started was go into my MySQL Database GUI (Graphical User Interface) of choice (Navicat for MySQL), and dump the entire structure of the database to a SQL file:

Navicat for MySQL showing a 'Structure Only' SQL dump about to be executed as the basis of the first DDL (Data Definition Language) migration file.

This feature in Navicat for MySQL generates a .sql file with a series of CREATE TABLE statements, one for each table that currently exists in the application. I took that .sql file, renamed it 2020-01-11-001-initial-structure.sql and then dropped it in my build directory.

My first DDL (Data Definition Language) file! Woot woot!

Then, I selectively went to each table and dumped its data out to a separate SQL file using the same GUI feature (selecting the Structure + Data option rather than the Structure Only option). This generated SQL files with a bunch of INSERT INTO statements. I renamed those files and dropped them into my build directory.

My first DML (Data Migration Language) files! Hecks to the yeah!

At that point, I had a set of .sql files that allowed me to recreate the current state of my database. Files that were checked into my docker repository; files that would be executed from a blank slate; files that would leave me with database structure that exactly matched my production database no matter where or when I ran them.

ASIDE: While I was committing all the table "structures" to git, I wasn't committing all the table "data" to the schema migrations. For example, I never committed the comment table data as a DML - it would be too much data. If I want the comments to be loaded locally, I'll run that as a special SQL file after the fact.

From that point on, it just became a matter of generating new database schema migration files locally first before running migrations in production. This way, I make sure to always keep my local development database in sync with my production database. In fact, my local development database has become my "source of truth" for the database schema. And, I have to manually apply migrations to production (I don't have any fancy build tooling for my blog).

A decade ago, databases felt a lot more like a black-box. So, when Tim talked about persisting them to version control, I didn't have the mental model I needed to understand how something like that would even work. But, once I realized that the database could be created and migrated with a series of .sql files, suddenly it all clicked. It all became "just code". And SQL files, like ColdFusion, JavaScript, HTML, and CSS files, can be committed to version control all the same.

Epilogue on my MySQL Docker Container

While this isn't entirely relevant in terms of tracking database schema migrations, I did refer to my "build" directory in the article. As such, it might be helpful to see how my MySQL Docker container is configured in my docker-compose.yml file locally:

version: "2.4"

services:
  # ... truncated for demo ...

  # MySQL server.
  mysql:
    image: "mysql:8.0.28"
    ports:
      - "3306:3306"
    environment:
      MYSQL_DATABASE: "bennadel"
      MYSQL_ROOT_PASSWORD: "password" # LOCAL DEVELOPMENT ONLY!!!!!!!!
    volumes:
      # Configuration scripts - the SQL files in this directory will be run when the
      # container is first created. Files will be executed in alphabetical order.
      - "./build/mysql/initdb:/docker-entrypoint-initdb.d"
      # Data persistence directory - the underlying database data will be stored on the
      # host machine in this directory. But, this directory is managed transparently
      # by Docker Desktop for Mac - I have no idea where it actually is.
      - "bennadel_mysql_data:/var/lib/mysql"
    healthcheck:
      test: "echo hello"

volumes:
  bennadel_mysql_data:

What this is doing is pulling the common mysql:8.0.28 image from Docker Hub. Then, using the environment variables, it creates a database called bennadel. In the volumes configuration, I'm mounting my local directory (on my computer), initdb, to the docker-entrypoint-initdb.d directory within the MySQL container. This is the directory that contains my .sql database schema migration files that we discussed in this blog post. And, it's these files that get executed whenever the container has to build the database (ie, when the database hasn't been persisted to the named volume).

Honestly, I'm not very good at Docker yet; so, I won't try to explain this in any more depth. But, this seems to be good enough for my personal development needs. As I stated on the Working Code podcast, episode 57, one of my big goals for 2022 is to become better at "doing containers" and deploying them to production. But for now, hopefully this is helpful enough.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4221

Reader Comments

Ben Nadel Mar 6, 2022 at 12:24 PM

16,204 Comments

One thing that I wanted to point out was that when you run a .sql file, the ALTER statements that it contains will likely lock up the target table while the modifications are being performed. Some modifications can be performed "online" (ie, without any performance implications); but, most meaningful migration have to lock the table in order to apply the changes.

This is why, in the past, I've scripted the creation of "shadow tables" and then used ColdFusion to iterate over a table and "migrate rows" from the old table to the new shadow table:

www.bennadel.com/blog/2865-performing-online-alter-table-sql-migrations-without-any-downtime.htm

That said, migration tool like Liquibase, which uses Percona Toolkit under the hood, do this kind of stuff for you. They perform all the shadow table calculations and make sure not to overwhelm the CPU or generate too much Replica lag. For huge database, this can be a massive help, especially when you need to migrate data without downtime.

So, getting started with tracking migrations is super easy, as I've outlined in this post. But, the concept can scale-up to meet your demands when vanilla .sql files are not cutting it.

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.