A Database Column For "Date Updated" Has No Semantic Meaning, Nor Should It

By Ben Nadel

Published 2022-10-27 in ColdFusion, SQL, User Experiernce (UX), Work — Comments (13)

When I create a new relational database table in my ColdFusion applications, my default operation is to add three columns: the primary key, a date/time for when the row is created, and a date/time for when the row is updated. Not all entities fit into this model (such as rows that can never be updated); but, for the most part, this core set of columns makes sense. The "updated" column has no semantic meaning within the application - it is simply a mechanical recording of when any part of a row is updated. The biggest mistake that I've made with this column is allowing the customers to attach meaning to this column. This never works out well.

In the data-access layer of my ColdFusion applications, the "updated" column is always included at the end of my UPDATE SQL query. For example, here's a made-up method for updating "Widgets":

component {

	/**
	* I update a widget with the given ID.
	*/
	public void function updateWidget(
		required numeric id,
		string name,
		string sku,
		numeric priceInCents,
		) {

		```
		<cfquery name="local.results" result="local.metaResults">
			/* DEBUG: widgetGateway.updateWidget(). */
			UPDATE
				widget w
			SET
				<cfif arguments.keyExists( "name" )>
					w.name = <cfqueryparam value="#name#" sqltype="varchar" />,
				</cfif>

				<cfif arguments.keyExists( "sku" )>
					w.sku = <cfqueryparam value="#sku#" sqltype="varchar" />,
				</cfif>

				<cfif arguments.keyExists( "priceInCents" )>
					w.priceInCents = <cfqueryparam value="#priceInCents#" sqltype="bigint" />,
				</cfif>

				-- The UPDATEDAT column is always updated anytime any of the other
				-- columns are updated. There is no semantic meaning to this column
				-- within the application itself.
				w.updatedAt = UTC_TIMESTAMP()
			WHERE
				w.id = <cfqueryparam value="#id#" sqltype="bigint" />
			;
		</cfquery>
		```

	}

}

As you can see, the updatedAt column is always modified with the current date/time whenever the updateWidget() method is called. It doesn't matter which of the columns are being updated. It doesn't even ensure that the column values are actually changing - it's just a mechanical update.

In retrospect, it's probably my mistake to ever include this "updated" date in the user interface (UI) of an application. I often feel pressure to make UIs "rich"; and, adding an "updated" date to the UI seems like an easy way to flesh out an interface.

But, the problem is, even if the "updated" value has no semantic meaning, by including it in the UI, the users will attach meaning to it automatically. And since I'm not educating them as to what this value means, they'll just make up whatever story they want, logical or not.

And, the moment they do that, things go wrong. Suddenly, there are complaints that the value is changing when it "shouldn't be". Of course, since each user made up their own story, each user has a different mental model for when the value should change.

One user might say:

The "updated" date should only change when the price of the widget changes.

And, another user might ask:

Can you make sure the "updated" date is only changed when one of the widgets is sold to a customer?

Both of these things can't be true. At least, not for the same database column. And, that's the root of the issue: neither of these concepts should be attached to the "updated" column - they should manifest as separate data structures with their own consistent semantic meaning.

For the user who wants to know when the "price" was changed, we might have to add a column for priceUpdatedAt. Or, perhaps it makes sense to have some sort of a "price history" associated with a given widget. Or, maybe even have both approaches wherein the history is "eventually materialized" into the priceUpdatedAt column.

And, for the user who wants to know when a widget was sold, we might have to add a column for mostRecentPurchaseAt. Or, keep some sort of a order history, etc. etc. etc.

Ultimately, you should never show data to your users if the data has no meaning to them. Because, they'll attach meaning to it whether you like it or not. And then, things are out of your control. The "updated" column in a database table has no semantic meaning - don't start treating it like it does. I've made this mistake too many times.

Why Bother Having an "Updated" Column?

If the "updated" column is just a mechanical recording of when the row was updated; and it has no semantic meaning within the application; and it probably shouldn't be shown to the user (for all the reasons outlined above); then, you might be wondering why even bother having that column in the first place?

Honestly, I don't think I can articulate a good reason. I do think that it's a helpful data-point that sheds light on the life-cycle of a row; and, I do think that it sheds some light on how your customers might be interacting with row; but, that's just a feeling in my tum-tum - I don't think I can put a solid, coherent argument together.

Mostly, I just have a sense that it is good; and, I think I might need that column some day.

Not the strongest argument.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4344

Reader Comments

JC Oct 27, 2022 at 12:55 PM

19 Comments

I can articulate a good reason:

Although an "updated_at" column may hold no sematic meaning for your users, it does hold sematic meaning for your system. Those columns are important metadata for any kind of information and/or content management strategy you may want to implement -- like auditing or archival.

Want to know if a part of your code is being used? Check when the tables it uses were last accessed? Ditto if you want to deprecate a feature. If you turn off the feature flags for it and the records are still being accessed by updates, then something else is actively touching that stuff. Keeps you from stepping on someone else's toes.

To forestall the argument -- yes, this information could be obtained from the transaction logs, but that's more cumbersome and doesn't afford itself to ingestion to other tools. Better to expose the updated_at date through some form of API you can control vice having a program come in and peek around at your xlog, IMHO.

Edit: Depending upon the kind of data your working with, holding on to records any longer than absolutely necessary could be a legal liability. Automated disposition based upon last access is not just helpful in that case -- it could be compulsory.

Ben Nadel Oct 27, 2022 at 4:38 PM

16,124 Comments

@JC,

Excellent articulation of the value-add of the column. I love the point about being able to see if a feature is still being used - that's such a great idea. I think when it's an internally-facing value, it's much safer to use. I've been most burned by this when our customers see it, and then start to tell stories (in their mind) about what it means.

Andrew Kretzer Oct 28, 2022 at 12:47 AM

17 Comments

Out of curiosity, is there a reason that you don't let the database generate that field? E.g. "CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP" in MySQL.

Ben Nadel Oct 28, 2022 at 9:47 AM

16,124 Comments

@Andrew,

For starters, I didn't know that existed 🤪 that said, I tend to shy-away from things that happen automatically in the database layer (like Triggers and cascading deletes). I tend to like to have things be visible "in the code" so there's no magic.

That said, since the roles of the "updated at" timestamp is so focused, I could see this being helpful. I'll take a closer look at the documentation, thanks!

Andrew Kretzer Oct 28, 2022 at 12:34 PM

17 Comments

@Ben,

Right on. Like you, I too have the same 3 default fields on each table, but both createdOn and updatedOn are auto populated by MySQL on INSERT and UPDATE respectively.

Ian Turton Oct 28, 2022 at 4:48 PM

14 Comments

@Andrew,

Same!

I note in passing Ben's code will set updatedAt even if nothing is updated. UPDATE CURRENT_TIMESTAMP doesn't do that.

(I'd also make the obvious-in-case-it's-not-to-some point that CURRENT_TIMESTAMP saves you having to remember to update that date anywhere else you update the table)

For Ben's sake, I use;

for date created;

NOT NULL DEFAULT CURRENT_TIMESTAMP

for last update

NULL ON UPDATE CURRENT_TIMESTAMP

Ben Nadel Oct 29, 2022 at 9:52 AM

16,124 Comments

@Ian,

Do you have any idea if CURRENT_TIMESTAMP and UTC_TIMESTAMP are doing the same thing? When I run these both in my local dev container, they return the same thing; but, my dev container timezone is UTC. I've tried to Google for the answer, but not getting much luck. I'm used to use UTC_TIMESTAMP for my queries.

SCOTT BENNETT Nov 15, 2022 at 10:48 AM

37 Comments

I used to always add dtUpdated columns in all my tables too, but did not usually show that to users. However, in recent years, I don't add that column automatically to tables anymore, because I noticed that just having an update date was not very helpful to me. Now if a table is going to hold records with data that requires me to keep track of updates, then I typically break that into an audit log table so that I can display something meaningful to the user like:

Jon Doe updated widget name from "oldwidgetname" to "newwidgetname" at 11/14/2022 1:15:12PM from IP 123.123.123.123

When a customer calls complaining that something changed and they swear they didn't do it, it's nice to have a log like that... and I typically put the audit log right there on the record so that the user can see it and not call for support at all. 😃

Ben Nadel Nov 15, 2022 at 10:53 AM

16,124 Comments

@Scott,

I've dabbled a bit with an "audit log" in an application many years ago; but, I haven't tried it since. That said, I have added many piece-meal log tables to track some "delete actions", mostly - to your point - so the Support team can point to something and say, "Actually, Mr. Pink, on your team, deleted that record yesterday."

It is a feature-set that I would be keen to try again. I did a bunch of Googling about it maybe 2-3 years ago, and found surprisingly little written on the matter. Though, maybe I was just searching for the wrong terms. Or, maybe there's just no "best practice" on the matter since there are so many different approaches.

Ian Turton Nov 15, 2022 at 10:59 AM

14 Comments

@SCOTT,

I also do both, and like you only audit a few 'major' tables in any given system. I built a component that can take an insert/update SQL statement, run it, check for triggers and create a table-specific audit record with just the changed columns.

I do find 'lastUpdated' occassionally useful (like 3 or 4 times a year), but it's not user-facing thing

Ian Turton Nov 15, 2022 at 11:03 AM

14 Comments

@Ben,

Ironically (is it, I don't know!) the component I mentioned doesn't bother with deletes because I don't do them. Can't think of the last time I did an 'app based' hard delete on a table (certainly not one anyone would care about me auditing)

Ben Nadel Nov 15, 2022 at 11:28 AM

16,124 Comments

@Ian,

While I love the idea of not doing a "hard delete", I do still want to move the "deleted" row out of the table. What I have found is that with "soft deletes", it's harder to make indexes that can enforce uniqueness. What I would love to do is create "archive table", that receive a copy of the deleted row. This way, the archive table can have fewer indices; and, the active table can have indices that have tighter constraints.

It is possible to have unique constraints with soft deletes; but, I feel like I always have to have a nullable column somewhere, like isActive (that is cleared in the delete action). And, I like to avoid nullable columns whenever possible. They always feel so 'meh' to me.

Ian Turton Nov 15, 2022 at 11:48 AM

14 Comments

@Ben,

"I like to avoid nullable columns wherever possible".

Me too, absolutely. Hate the bloody things. For var/chars I never use them, because I've never come across (others will have I suppose) the need to know whether an empty string is 'really empty' or 'don't know yet'.

But they're hard to avoid for numerics (the above ambiguity moved to zero vs. null is a real concern) and a complete bugger to avoid with dates where there's no non-fugly representation of 'not set'. To me, with dates nulls become actively useful, and don't tend to have the usual downsides when it comes to e.g. aggregate functions (cos I hardly ever aggregate them). In other words, with dates I never(ish) have to reach for the IFNULL() function.

Non-rhetorically, how do you store a date with no current value?

If you want to avoid nulls, an active flag could just be a non-nullable boolean (0 or 1). I prefer a nullable 'dateDeleted', which both acts as a flag and a useful piece of quick-see audit data (backed up on important tables as per previous post, with an audit of who shifted it from null to set)

I get the 'uniqueness' issue, but the main reason I have soft deletes is to preserve the integrity of foreign keys, and that nearly always involves auto increment type ids on the linked to table.

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.