Naming things is one of the hardest parts of computer science. Which I believe is a big part of why data modeling is so hard: it's challenging to see how one concept can be decomposed into multiple concepts when you don't know what those smaller concepts are called. This is why my relational database tables tend to get wider over time (ie, gain additional columns). But, after many years of building data-driven applications using SQL and ColdFusion, I've come to appreciate how powerful it can be to keep database tables narrower.
To be clear, I am a big fan of relational databases. I love Redis; I use MongoDB when I need to; but, when it comes to building out a new feature, my persistence mechanism of choice is a powerful, flexible, battle-tested relational database. Very few things give me as much pleasure as thinking about table schemas, considering high-performance index design, and writing SQL.
That said, I've made many mistakes in my relational database table design; many of which revolve around building tables that were too wide. I've been building applications for a long time; but, I've only started to think critically about data-access patterns and more complex index structures in the last few years.
Ideally, I'd love the structure of my relational database tables to correlate more closely with my data access patterns, not necessarily with my broad - and often over-generalized - conceptual models. Meaning, I would prefer that most of the columns in a given table were actually consumed whenever a given row was read out of the database.
This isn't a hard rule-of-thumb - databases are meant to be flexible and consumable by a variety of user interfaces (UIs). But, if some set of table columns is rarely used, this is a good indication that those columns should have been split out into their own table and own conceptual model.
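To make this concrete, here's a minimal sketch using SQLite (all table and column names are hypothetical): instead of one wide "user" table that mixes data read on every request with billing details read only on one screen, the rarely-used columns get their own table and their own concept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    /* Read on every request: keep it narrow. */
    CREATE TABLE user (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        password_hash TEXT NOT NULL
    );

    /* Read only on the billing screen: its own table, its own concept. */
    CREATE TABLE user_billing_address (
        user_id INTEGER PRIMARY KEY REFERENCES user ( id ),
        street TEXT NOT NULL,
        city TEXT NOT NULL,
        postal_code TEXT NOT NULL
    );
""")

conn.execute(
    "INSERT INTO user ( email, password_hash ) VALUES ( ?, ? )",
    ("jane@example.com", "..."),
)
conn.execute(
    """INSERT INTO user_billing_address
        ( user_id, street, city, postal_code ) VALUES ( ?, ?, ?, ? )""",
    (1, "1 Main St", "Springfield", "12345"),
)

# The hot-path query touches only the narrow table.
row = conn.execute("SELECT email FROM user WHERE id = ?", (1,)).fetchone()
print(row[0])  # → jane@example.com
```

The point of the split is that the query every request runs never has to read (or cache) the billing columns at all.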
Narrower tables - tables with fewer columns - have a variety of benefits:
Narrow tables mean less data needs to be read off of disk.
Narrow tables mean less data needs to be kept in the database's "working memory".
Migrating data is generally easier with narrow tables since it is typically faster and safer to create new tables when compared to adding new columns to an existing table.
Narrow tables force you to think more critically about your data-model; and about elevating related columns into a more official entity concept.
Narrow tables may lead to fewer transactional locks.
Narrow tables may make it easier to create covering indexes for better performance.
Narrow tables may make it easier to design unique indexes that prevent dirty-data through idempotency without the need for transactional locks.
Narrow tables may lead to shorter tables (ie, tables with fewer rows) which means your indexes require less processing overhead to maintain.
Narrow tables make it easier to avoid using NULL values - and NULL values should be avoided as much as possible (though not always).
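Two of those benefits can be sketched in a few lines of SQLite (table names here are hypothetical): a narrow audit table whose two relevant columns make a covering index trivial, and a narrow join table whose compound primary key makes inserts idempotent without a transactional lock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login_audit (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        logged_in_at TEXT NOT NULL
    );
    CREATE INDEX ix_login_audit_user
        ON login_audit ( user_id, logged_in_at );

    CREATE TABLE team_membership (
        team_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        PRIMARY KEY ( team_id, user_id )
    );
""")

# 1. The index covers the query entirely - SQLite can answer it without
# ever touching the table rows, and its query plan says so.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT logged_in_at FROM login_audit WHERE user_id = ?",
    (1,),
).fetchone()
print(plan[-1])  # plan detail mentions "USING COVERING INDEX"

# 2. Re-running the same "add member" insert is a no-op, not dirty data -
# the unique key enforces idempotency; no lock required.
for _ in range(2):
    conn.execute(
        "INSERT OR IGNORE INTO team_membership ( team_id, user_id ) VALUES ( ?, ? )",
        (1, 42),
    )
count = conn.execute("SELECT COUNT(*) FROM team_membership").fetchone()[0]
print(count)  # → 1
```

The `INSERT OR IGNORE` syntax is SQLite-specific; other databases offer equivalents (such as `ON CONFLICT DO NOTHING` or `ON DUPLICATE KEY`), but the underlying idea - let a unique index on a narrow table enforce correctness - is the same.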
Of course, narrow tables aren't all sunshine and roses. Narrow tables also mean that some of your queries become more complex with JOIN clauses; or that your application code becomes more complex with "application joins". But, if narrow tables also map more closely to your data access patterns, then this isn't as big a downside as you might, at first, consider it.
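Both flavors of that trade-off look something like this sketch (again with hypothetical table names): once user and billing data live in separate narrow tables, the billing screen needs either a database JOIN or two simple queries stitched together in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user ( id INTEGER PRIMARY KEY, email TEXT NOT NULL );
    CREATE TABLE user_billing_address (
        user_id INTEGER PRIMARY KEY REFERENCES user ( id ),
        city TEXT NOT NULL
    );
    INSERT INTO user VALUES ( 1, 'jane@example.com' );
    INSERT INTO user_billing_address VALUES ( 1, 'Springfield' );
""")

# Database join: one round-trip, slightly more complex SQL.
row = conn.execute("""
    SELECT u.email, b.city
    FROM user u
    INNER JOIN user_billing_address b ON b.user_id = u.id
    WHERE u.id = ?
""", (1,)).fetchone()
print(row)  # → ('jane@example.com', 'Springfield')

# Application join: two trivial queries, stitched together in code.
email = conn.execute(
    "SELECT email FROM user WHERE id = ?", (1,)
).fetchone()[0]
city = conn.execute(
    "SELECT city FROM user_billing_address WHERE user_id = ?", (1,)
).fetchone()[0]
print((email, city))  # → ('jane@example.com', 'Springfield')
```

Either way, the cost only shows up on the screens that actually need both concepts; the hot-path queries stay simple and narrow.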
Going forward, when I need to keep track of a new data-point in my relational database system, my first questions are going to be: Is this a new concept? And, what benefits would I get out of codifying this data in a new database table? And, only once I rule out those possibilities will I consider adding the data to an existing table.