It's OK To Be Super Explicit In Your Data Access Method Names

By Ben Nadel

Published 2022-06-14 in ColdFusion — Comments (14)

This post is mostly a note to myself - a sort of "permission slip" to relax the previous constraints of my mind. The other day, I was writing an Update method in a data access object (DAO) within my ColdFusion application. This update method was updating an optional relationship within my domain model. Meaning, the target row wasn't guaranteed to exist. As such, I decided to author the underlying SQL statement as an Upsert method that would use the ON DUPLICATE KEY UPDATE semantics. This would allow me to either update the existing row or create a new row on-the-fly. Technically, this worked like a charm; but, emotionally, I felt deeply conflicted.

ASIDE: In the following discussion, I am using "Thing" as a placeholder for any arbitrary domain concept.

The problem I was "feeling" was that the method was called updateThing(). Now, there's nothing wrong with calling a method updateThing() - I use that form of method name all the time in my data access layer. However, with all my other "update" methods, most of the arguments are optional; and, a given column is only updated if the relevant argument is provided. As such, my CFQuery UPDATE SQL statements usually look a lot like this (truncated):

component {

	public void function updateThing(
		required numeric id,
		string valueOne,
		string valueTwo,
		string valueThree
		) {

		```
		<cfquery name="local.results" result="local.metaResults">
			UPDATE
				thing
			SET
				<cfif arguments.keyExists( "valueOne" )>
					valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
				</cfif>

				<cfif arguments.keyExists( "valueTwo" )>
					valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
				</cfif>

				<cfif arguments.keyExists( "valueThree" )>
					valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />,
				</cfif>

				/* No-op assignment here to make the trailing commas consistent. */
				id = <cfqueryparam value="#id#" sqltype="bigint" />
			WHERE
				id = <cfqueryparam value="#id#" sqltype="bigint" />
		</cfquery>
		```

	}

}

With an Upsert statement, however, the arguments can't be optional because they need to be able to create a new, fully-formed row if the target row doesn't exist. As such, all Upsert method arguments have to be marked as required.

And that's where my brain started to explode. I couldn't, in good conscience, author an update() method that completely broke from the standard pattern of optional arguments. It was too much of a deviation from any reasonable expectation that another developer would have when consuming this method.

The name of the method wasn't telegraphing the intent of underlying query.

So my brain finally said to me: THEN NAME IT SOMETHING ELSE you big doofus!

And so, I changed it to upsertThing(). And my brain said, Yay! Because, now, the method was doing the thing that it said it was doing and it became clear as to why all the arguments were marked required.

component {

	public void function upsertThing(
		required numeric ownerID,
		required string valueOne,
		required string valueTwo,
		required string valueThree
		) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT INTO
				thing
			SET
				/* !!! Owner ID column might cause a key-conflict. !!! */
				ownerID = <cfqueryparam value="#ownerID#" sqltype="bigint" />,
				valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
				valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
				valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />
			ON DUPLICATE KEY UPDATE
				valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
				valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
				valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />
		</cfquery>
		```

	}

}

This was an important moment of clarity for me because, normally, my brain craves consistency. As such, I wanted this data access method to be named the same thing as other data access methods. But, it's not the same thing. And that's OK.

What I'm seeing now is that data access methods that mutate data should never be surprising. They should say exactly what they are going to do; and then do only what's printed on the tin.

So, for example, in my previous post on handling MySQL key conflicts in Lucee CFML, I had a method that would either INSERT a new row or return the id (primary key) of an existing row:

component {

	public numeric function createThing( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			ON DUPLICATE KEY UPDATE
				id = LAST_INSERT_ID( id ) -- Store existing ID into "generatedKey".
			;
		</cfquery>
		```

		return( metaResults.generatedKey );

	}

}

The problem with this method, as I now see it, is that it's not telling me what it does. It's not "creating a Thing" - it's "creating a Thing or returning the id of the existing Thing". As such, it should be renamed to literally say as much:

createThingOrReturnExistingID()

It's a bit of a mouth-full, but so what? I already firmly believe that your data access layer should have as many methods as you find helpful. So, there's nothing inherently wrong with having a few more "create" methods that are more explicit in how they differentiate from one another.

This is a personal revelation, but it's certainly not a new idea. If you look in the MongoDB Java API, for example, they have all sorts of methods that perform very specific actions: updateOne(), updateMany(), findAndModify(), findOneAndUpdate(), findOneAndReplace(), bulkWrite(), etc. So, there's much precedence for being explicit; and, my brain is finally getting on the band-wagon.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4281

Reader Comments

Dave Quested Jun 14, 2022 at 8:40 PM

8 Comments

I've added upsert to my internal dictionary. Brilliant. I completely agree with your strategy here. Say what it does. Always need to think about someone else reading your code. Naming is the hardest part of coding!

Ben Nadel Jun 15, 2022 at 9:53 AM

16,255 Comments

@Dave,

Agreed, Upsert is a great word - I learned that from the MongoDB API. Naming things is hard. And, the funny thing is, sometimes it's deceptively hard! Like, you'll name something and feel good about it; and then, only over time do you realize that you failed to capture what it actually was and why working with that "thing" feels overly complicated. Such a journey!

Dave Quested Jun 17, 2022 at 9:51 AM

8 Comments

Totally! I have many "What was I thinking calling it that!" moments a few years later. And then there's trying to find a Font Awesome icon to match the piece of functionality ;)

On aside, loving the "Working Code" podcast. I listen to it whilst running in the hills here in New Zealand. Keep it up!

Ben Nadel Jun 17, 2022 at 9:55 AM

16,255 Comments

@Dave,

Don't even get me started about icons 😜 I spent like 3-hours the other day trying to make an icon in Sketch. At work, all our icons have a 1.5px stroke, which makes it super hard to design since all the paths needs to start on a .5px location. I feel like that, alone, is a decision somebody probably regrets at this point!

Re: Working Code, thank you for the kinds words! It's been a lot of fun. Glad it's making it all the way over there to the other side of the world. Running up "hills" reminds me of t-shirt that I used to see somewhere that said, Hill Yis, which was - I think - supposed to be someone with a New Zealand accent saying "Hell Yes". But, that has nothing to do with anything. Happy Friday! (or maybe it's already Saturday for you).

Ian Turton Jun 18, 2022 at 9:36 AM

17 Comments

Assuming this is MySQL, I prefer to use the VALUES function on the update part.

Apart from being more readable, it removes the possibility that I make a change to the insert part and forget to change the update (or vice versa)

so e.g.

ON DUPLICATE KEY UPDATE
    valueOne = VALUES(valueOne),
    valueTwo = VALUES(valueTwo),
    ...

Ben Nadel Jun 18, 2022 at 9:45 AM

16,255 Comments

@Ian,

Yes, this is MySQL. It's funny, so I've actually read through the MySQL documentation on the ON DUPLICATE KEY UPDATE, and I've seen the VALUES() stuff, and I swear I've just never really understood what it was doing. Even just now, as I was about to write this comment, I went and I looked at the documentation again, and for whatever reason, my mind just glazes over 😨

I think maybe this is most useful in the multi-row insert. Here's an example in the docs -- and forgive me, I'm mostly talking to myself at this point :

INSERT INTO t1
	( a, b, c )
VALUES
	( 1, 2, 3 ),
	( 4, 5, 6 )
ON DUPLICATE KEY UPDATE
	c = VALUES( a ) + VALUES( b )
;

... but, I guess this would not work if you just did c=a+b? I'm not sure. I honestly only ever really insert one row at a time. I don't have a good mental model for multi-row inserts.

Anyway, apologies for the stream-of-conscience; just trying to wrap my head around it.

Ian Turton Jun 18, 2022 at 11:06 AM

17 Comments

@Ben,

you can use the VALUES(col_name) function to refer to column values from the INSERT portion

That's it right there. I don't know why they've gone immediately to an example of a non-obvious use for this (e.g. the single insert version of your statement I posted), but I suspect that immediate dive in to the edgy/hard to grasp is the problem you're having. I've never used them in any other way than as 'little bit of DRY plus legibility' improvement.

Just seems to me like an obvious requirement of the ON DUPLICATE KEY clause to prevent unnecessary/mistaken duplication of values e.g. Postgres has it with ON CONFLICT/EXCLUDED

It's a bit less of an improvement for me as I nearly always use queryExecute, so I could just repeat the same param name in the insert and update clauses. But I think it's optimal to defer to the db language options when building queries.

Ben Nadel Jun 20, 2022 at 9:43 AM

16,255 Comments

@Ian,

Sorry, I promise that I'm not being purposefully obtuse here. But, I appear to be able to refer to the existing row values even without that function. Meaning, in my example in the post, I have:

ON DUPLICATE KEY UPDATE
	id = LAST_INSERT_ID( id )

... where the id is from the existing row. So, I am not sure what is different if I just add VALUES() around it:

ON DUPLICATE KEY UPDATE
	id = LAST_INSERT_ID( VALUES( id ) )

Maybe it's just that the VALUES() is what is recommended by the language? And, not necessarily an actual semantic difference? That's what I mean - I am not sure what it is technically doing.

Ian Turton Jun 20, 2022 at 12:07 PM

17 Comments

@Ben,

I reckon the disconnect here is you think I'm trying to 'improve' your final query, the one that gets the id in the event that the upserts ends up being an update. I'm not - it's about the first upsert

I'm trying to fix the issue that, using your original upsert example, you have to ensure you have exactly matching pairs of cfqueryparams - one for the INSERT portion, one for the UPDATE. Not particularly onerous for tables with a small number of columns, but for larger tables there is a danger you may mismatch columns, or later on update the INSERT part of you query and forget the UPDATE etc.

The VALUES() function allows me to simply say, for any column

update the column with whatever I tried to insert into it

It has the added benefit (for me) of getting me closer to a 'truer'/more portable/easier to read SQL statement, with minimal CF footprint.

So, in other words, I much prefer this version of your original upsert statement;

public void function upsertThing(
	required numeric ownerID,
	required string valueOne,
	required string valueTwo,
	required string valueThree
	) {

	```
	<cfquery name="local.results" result="local.metaResults">
		INSERT INTO
			thing
		SET
			/* !!! Owner ID column might cause a key-conflict. !!! */
			ownerID = <cfqueryparam value="#ownerID#" sqltype="bigint" />,
			valueOne = <cfqueryparam value="#valueOne#" sqltype="varchar" />,
			valueTwo = <cfqueryparam value="#valueTwo#" sqltype="varchar" />,
			valueThree = <cfqueryparam value="#valueThree#" sqltype="varchar" />
		ON DUPLICATE KEY UPDATE
			valueOne = VALUES(valueOne),
			valueTwo = VALUES(valueTwo),
			valueThree = VALUES(valueThree)
	</cfquery>
	```

}

Ben Nadel Jun 20, 2022 at 12:09 PM

16,255 Comments

@Ian,

Ooooooh! I see what you're saying now - the VALUES() function is for the value I was going to insert, not the value that was already in the row. That's the part that wasn't clicking for me. For whatever reason, my brain just was not getting that from the documentation.

Ian Turton Jun 20, 2022 at 12:28 PM

17 Comments

@Ben,

Yes!

...and the reason you're not getting it from the docs (IMO) is they've gone straight to a slightly obfuscatory, non-obvious example. Whereas way back when I found VALUES() I came to it from a 'there must be a way for me to stop repeating all these params cos it's dangerous/tedious', so could gloss over the esoteric stuff. TBF they're entirely in the 'RDBS' domain, and not concerned with param based queries being run from outside MySQL

I hope now my comments about DRY/readability/portability/deference to underlying db language make sense!

Ben Nadel Jun 20, 2022 at 1:25 PM

16,255 Comments

@Ian,

Yes, thank you so much for your patience in hand-holding through to this realization. Much appreciated 🙌

Ben Nadel Jul 1, 2022 at 12:12 PM

16,255 Comments

@Ian,

I just used the VALUES(col) in a production query. Woot woot 💥

Ian Turton Jul 3, 2022 at 8:57 PM

17 Comments

@Ben,