I Wish My Relational Database Tables Were Narrower
Naming things is one of the hardest parts of computer science. Which I believe is a big part of why data modeling is so hard: it's challenging to see how one concept can be decomposed into multiple concepts when you don't know what those smaller concepts are called. This is why my relational database tables tend to get wider over time (ie, gain additional columns). But, after many years of building data-driven application using SQL and ColdFusion, I've come to appreciate how powerful it can be to keep database tables narrower.
To be clear, I am a big fan of relational databases. I love Redis; I use MongoDB when I need to; but, when it comes to building out a new feature, my persistence mechanism of choice is a powerful, flexible, battle-tested relational database. Very few things give me as much pleasure as thinking about table schemas, considering high-performance index design, and writing SQL.
That said, I've made many mistakes in my relational database table design; many of which revolve around building tables that were too wide. I've been building applications for a long time; but, I've only started to think critically about data-access patterns and more complex index structures in the last few years.
Ideally, I'd love to have the structure of my relational database tables have a higher correlation with my data access patterns, not necessarily with my broad - and often over-generalized - conceptual models. Meaning, I would prefer if most of the columns in a given table were actually consumed whenever a given row was read out of the database.
This isn't a hard rule-of-thumb - databases are meant to be flexible and consumable in a variety of user interfaces (UI). But, if some set of table columns is rarely used, this is a good indication that those columns should have been split out into their own table and own conceptual model.
Narrower tables - tables with fewer columns - have a variety of benefits:
Narrow tables mean less data needs to be read off of disk.
Narrow tables mean less data needs to be kept in the database's "working memory".
Migrating data is generally easier with narrow tables since it is typically faster and safer to create new tables when compared to adding new columns to an existing table.
Narrow tables force you to think more critically about your data-model; and about elevating related columns into a more official entity concept.
Narrow tables may lead to fewer transactional locks.
Narrow tables may make it easier to create covering indexes for better performance.
Narrow tables may make it easier to design unique indexes that prevent dirty-data through idempotency without then need for transactional locks.
Narrow tables may lead to shorter tables (ie, tables with fewer rows) which means your indexes require less processing overhead to maintain.
Narrow tables make it easier to avoid using
NULL
values - andNULL
values should be avoided as much as possible (though not always).
Of course, narrow tables aren't all sunshine and roses. Narrow tables also mean that some of your queries become more complex with JOIN
clauses; or that your application code becomes more complex with "application joins". But, if narrow tables also map more closely to your data access patterns, then this isn't as big a downside as you might, at first, consider it.
Going forward, when I need to keep track of a new data-point in my relational database system, my first questions are going to be: Is this a new concept? And, what benefits would I get out of codifying this data in a new database table? And, only once I rule out those possibilities will I consider adding the data to an existing table.
Reader Comments
You need to look at https://dgraph.io/ - you'll never want to touch a traditional relational database again.
@Advocate,
I don't have any hands-on experience with GraphQL-oriented approaches. That said, my understanding is that GraphQL is just a layer between the client and the database; so, you still need to have a database in there. Maybe DGraph is doing something magical (I have never heard of it before); but, it doesn't seem like it obviates the needs for a database itself.
Have you looked into CQRS and possibly eventsourcing. It frees you up to build tables/documents per web view without the need for joining multiple tables. It can mean data duplication but storage is cheap and your pages become faster as you remove joins.
@Stevef,
I've read about event-sourcing, but I don't have any hands-on experience with it. In theory, I love the concept of materialized views and building denormalized tables that implicitly represent joined data and data that has been pre-manipulated for a given view's use-case. But, I've never actually done that myself.
The closest that I come to that today is creating lots of SQL queries that are view-specific. So, I still do all the
JOIN
stuff myself; but, I don't try to shoe-horn the view-specific queries into the main data-access layer for the "entity" CRUD stuff.What I mean is, that for "User Interface X", I'll have some data-access method like:
userInterfaceXGateway.getData()
... which could - in theory - be pulling data from some materialized view; but is - in reality - just pulling from the main database tables with all the
JOIN
s and whatnot.So, I love the idea. I just haven't actually ever built it myself.
@Ben, DGraph is basically a graph database which you access directly via GraphQL, and it ticks all the boxes you mention up above. Define a schema for your data and it gives you a GraphQL API that supports many kinds of search, aggregation and filtering. Relationships can be 1- or 2-way and are linked at insert time, so no joins needed - just query for the data you need through the links! They have a nice docker image (dgraph/standalone) which can be used to quickly check it out (note that it dials home). All you need to query a GraphQL API is a HTTP client like curl, so no fancy client setup needed either, but I would highly recommend graphql-explorer for the purpose!
@Advocate,
I know other people at work rave about GraphQL. I'll have to try and find time to look into it more deeply. I've tried a few times, but was turned-off by the amount of up-front work I seemed to have to do with all the schema definition. It sounds like perhaps DGraph removes some of that boiler-plate work.
It is quite hard though to work out which columns will be used in some circumstances and after sometimes how much they are used.
Once they are in that table and you think to remove them your left wondering should I be spending my time doing something else, perhaps more worthwhile. When the gain from narrowing perhaps isn't that clear.
I'm still working on being able to predict the future.
Also narrower tables means more tables which can get out of control as well. Naming can get more difficult.
@Adam,
1000% on naming stuff becoming harder when you have smaller tables with fewer columns. That, along with not being able to predict the future (I, too, am working on that) means that we have a lot of tables with names that make life a lot harder. And, the idea of renaming tables is a massive effort due to the amount of SQL that gathers around these tables over time.
My early tables were all too generic in name. In the later years of my application, we started to prefix tables with a sort of name-space so that you can see which tables are tightly coupled. But, that also leads to some rather long table-names.
It's definitely not an easy problem to solve.