SQL "Join Tables" Are Just "Entity Tables" With Hard-To-Name Concepts
Over the past few months, I've been incrementally modernizing the public-facing portion of this ColdFusion blog. And, all in all, it's been going quite smoothly! But, when I started to work on modernizing the administrative portion of the blog, things got a little hairy. Not only is the administrative code even older and jankier than the public code, it requires CRUD (Create, Read, Update, Delete) operations on "join tables" in the MySQL database. And, up until now, I haven't had to think about modeling these join tables in a data access layer (DAL).
My current database modeling contains two "join tables". For the moment, let's consider these "join tables" to be tables that do nothing more than relate entities in a many-to-many relationship. My two join tables are:
blog_entry_blog_entry_jn- this allows two blog entries to be related to each other. In my UI (user interface), I list out the related entries at the bottom of a blog detail page.
blog_entry_tag_jn- this allows me to categorize a blog entry as belonging to an arbitrary set of tags (ex, "ColdFusion", "SQL", "Work"). In my UI, I list out the tags associated with each blog post. And, I also allow the user to look up all the blog entries under a given tag.
As I started to flesh-out an official data access layer in my administrative module, I thought that starting with "tags" would be a "quick win" since it's so simple. But, I had forgotten about this
blog_entry_tag_jn table. And suddenly, I was mentally blocked. I wasn't sure if the "tag" DAL should be responsible for it? Or, if the "blog entry" DAL should be responsible for it?
At work, I brought this dilemma up in the Architecture Office Hours meeting (my favorite meeting of the week), and Alex Treyger - one of our Principal Engineers - pushed back against the very notion of a "join table". He argued (in as much as I understood) that such an idea was nothing more than an implementation detail of the data persistence technology.
At the time, I didn't entirely followed what he was saying. But, with some time to reflect on it, what I think what he meant was that in the application code there is no "join table" - there's just an architectural relationship between entities. And, that it's merely incidental that, in a relational database system, I happen to have chosen to serialize that concept in a separate table. And that, if I was using something other than MySQL, it's quite possible that I wouldn't even need a separate table to model the relationship (I think he brought up Object Oriented Databases as one such example).
As I was thinking about my database structure, and thinking about what Alex said, it reminded me of something I touched on a few years ago in a thought experiment wherein I split a wide a data table up into two tables for high-writes and high-reads. In that post, I said:
Another general pattern that I am seeing here is the elevation of "domain concepts". What I mean by that is the promotion of an "entity property" into a full-on "entity". In this case, the "User" property
lastRequestAtis being promoted to the domain-level concept of "Presence".
The more complex my applications become, the more I find myself doing - or wishing I had done - this (elevating entity properties). Creating a domain-level concept makes it easier to think about how data can be consumed and evolved within an application.
The saddest part about this is that I actually had a very similar thought 14 years ago when I wrote about SQL "Join Tables" vs. SQL "Entity Tables". And, in the comments to that post, Mark Bugno pushed back in way that feels reminiscent of what Alex was saying:
I disagree with you on this (and agree with your initial train of thought). There is no fundamental difference between a relationship table and entity table, in much the same way there is no difference between a circle and an ellipse - one is simply a specific instance of the other. In the geometric example, a circle is an ellipse where the foci just happen to be collocated; in the database example, a relationship table is an entity table that just happens to have no additional columns.
Every relationship (be they one-to-one, one-to-many or many-to-many) can be garnished with additional details if one so chooses. In the case of one-to-one and one-to-many relationships, we might create a table between the existing two and add the details there. The only difference with a many-to-many is that the table already exists.
Alex and Mark aren't saying the exact same thing. But, I think both of them are alluding to the idea that a "join table" isn't a "natural concept" - that it is merely a construct that I am using because of how I've organized the data in my head.
And, going back to my earlier thoughts on the "elevation of domain concept", when I split the wide table up into two narrower tables, it wasn't so much a technical shift as it was a mental shift: changing the way that I was thinking about the data that I was persisting to the database. The change in perspective precipitated the change in the schema.
In an attempt to synthesize all of these ideas into a spicy take, I'm now thinking that a "join table" is nothing more than a "entity" that is lacking a proper name. Naming stuff is hard (Martin Fowler) - we all know this. And, I think this is why "join tables" happen: because I couldn't conjure a name that clearly "meant that thing".
So, what if instead of
blog_entry_tag_jn, I had a "Tagging" entity. And, what if instead of
blog_entry_blog_entry_jn, I had a "Suggested Reading" entity. These names aren't great; but, they are - at least - an attempt to articulate the "thing" that the join tables were trying to represent.
Is it too much to conclude here that a "join table" is actually just a "code smell"? I'm gonna wear that hat for a while and see how it fits.
I always called them "xref" or "cross" tables (naming is hard, lol) but my rule of thumb was they didn't deserve to be a separate entity unless they contained at least one column that wasn't just the two foreign keys being associated. (excepting "standard" columns like
blog_category_xreftable that only contained
wouldn't be a separate entity.
employee_company_xreftable that contained
would actually deserve to be some sort of entity of its own since it tracked its own unique properties that only existed within that relationship.
As far as worrying about which DAL to put logic in, I try to be more pragmatic and do what makes sense, collapsing small, related DAL's together since I dislike hundreds of nearly empty classes in a code base. Yeah, I know-- Uncle Bob probably wouldn't agree, but I favor maintainability sometimes where it feels appropriate :)
I would hesitate to categorize join tables as a "code smell" insofar that there's nothing off or wrong about them inherently. The ability to shift your perspective and see other facets of this stone is the paramount takeaway of the thought experiment. What you do with that ability and when you use it is where the nuance lives.
I mean -- the whole purpose of a join table (aside from eliminating cross-joins) is to instantiate a conceptual relationship as an entity. A relationship doesn't need to have additional labels or attributes for it to exist as its own entity, it simply is by existing. What you want to call it or how you want to define it is up to you.
( ... why do I feel like this took a tonal shift at the end there? )
Re: where in the DAL it would go, I was trying to think back an old book that I read years ago called Streamlined Object Modeling. While I don't remember if it talked about persistence, it did talk about relationships. And I think it talked about putting the relationship into the "more specific thing". So, for something like "blog post" and "tag", I think the "more specific thing" is the blog post. As such, I think (just using a rule of thumb like that) the blog-tag join table would be owned by the "blog data access object".
But, like you're saying, just trying to be pragmatic in the thinking there. At the end of the day, a decision that allows you to move forward is better than standing still for too long.
Definitely the shift in perspective is probably the point of most personal growth here. I was so taken-aback that I didn't have a good instinct for this. So, at the very least, I'm hoping this has started to form some better mental models in my head that I can leverage next time.
Post A Comment — ❤️ I'd Love To Hear From You! ❤️
Post a Comment →