Data Type Validation vs. Data Value Validation
Posted November 19, 2007 at 3:59 PM by Ben Nadel
This is mostly a consolidation of thoughts from some previous entries, but I thought it would be a good, solid topic unto itself. As I have been working on my Exercise List project, I have been doing a tremendous amount of thinking about data validation, and I've had a bit of a revelation: there is a very distinct difference between data TYPE validation and data VALUE validation.
A long time ago, when I posted about an improvement that I would like to see in ColdFusion's CFParam tag, I hadn't yet understood this divide. As such, I was not able to articulate my thoughts very well or defend my idea against nay-sayers. But now, I feel like things have become much more clear. Data type validation and data value validation are separate concerns and should be handled by different parts of the codebase.
When you look at this, you start to see how powerful and useful ColdFusion's CFParam tags really are. You can use them to run all of your data TYPE validation. My new goal, and so far I am not seeing a problem with this, is to establish that once my CFParams have finished executing (for a form or URL submission), all data in that form should be of the correct TYPE. This does not mean that it will contain valid data in terms of the business logic, but rather, if I was expecting a numeric value, I have a numeric value, or if I was expecting a list, I have a list.
When I think about the data that comes across in a FORM or URL submission, I really only see three kinds:
- Numeric data, such as the ID selected in a select box or the boolean (1/0) sent over by a checkbox.
- String data, such as the text submitted with text inputs and textareas.
- List data, which is the result of multiple fields with the same name being submitted.
I can't really see anything else that falls outside of one of these three categories. Even binary data, such as uploaded file information, still corresponds to a string value in the form (not to the resultant binary upload).
So where does CFParam come into play here? CFParam should be doing all of the data TYPE validation before the form is really even processed. Theoretically, data types should never be an issue unless a user has tampered with the form submission process. Meaning, all text input should submit strings; all select boxes (that have ID values) should return numbers. It is only through user tomfoolery that any of these data types should ever fail type validation. As such, the argument against "expensive" exception handling is moot. Data type exceptions will be presented only to the users that invite them through their own mischief.
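As a rough sketch of what this might look like for the three kinds of data listed above, assuming hypothetical field names (category_id, is_active, title, tag_ids) and assuming that a regex-typed CFParam is an acceptable way to pin down a list of IDs:

```cfml
<!--- Hypothetical form fields, one per kind of data. --->

<!--- Numeric: an ID chosen in a select box. --->
<cfparam name="FORM.category_id" type="numeric" default="0" />

<!--- Numeric: a checkbox that submits 1 when checked. --->
<cfparam name="FORM.is_active" type="numeric" default="0" />

<!--- String: free text from a text input or textarea. --->
<cfparam name="FORM.title" type="string" default="" />

<!---
	List: several checkboxes sharing one name arrive as a
	comma-delimited list. The pattern (an assumption) allows
	an empty value or digits separated by commas.
--->
<cfparam name="FORM.tag_ids" type="regex" pattern="^(\d+(,\d+)*)?$" default="" />
```

Since a list is really just a string, the regex check is only one option; a plain type="string" CFParam would also cover the list case.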
While no standard user should ever trigger a data type exception, that does not mean that we should not handle them gracefully. As such, all CFParams that can throw data type exceptions should either be handled by the application error handler or, even better (IMO), be wrapped in an individual CFTry / CFCatch block:
<!--- Catch TYPE validation error. --->
<cfset FORM.user_id = 0 />
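A minimal sketch of the complete block, assuming a numeric CFParam with a default of 0 wrapped in the CFTry / CFCatch described above (the type and default attributes are assumptions inferred from the surrounding text):

```cfml
<cftry>

	<!--- TYPE validation: FORM.user_id must be numeric (assumed attributes). --->
	<cfparam name="FORM.user_id" type="numeric" default="0" />

	<!--- Catch TYPE validation error. --->
	<cfcatch type="any">

		<!--- Fall back to zero, treated as "no selection made". --->
		<cfset FORM.user_id = 0 />

	</cfcatch>

</cftry>
```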
Here, we are enforcing that an ID (probably from some sort of select box) must be numeric; and, if it is not numeric, then we are forcing it to zero (which, in my systems, is usually the same as if no selection was made in the first place). This has nothing to do with whether or not the user ID is valid in the system (i.e., that it corresponds to a record in our database); it is only saying that the ID value must be of type numeric.
Here's where I think a lot of people get confused (myself included): text inputs that require numeric data. Take, for example, the price of an eCommerce product. This is entered in a text input box, but from a business standpoint, it should only be a numeric value. People see this and might think that they need a CFParam that types the price as numeric.
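A minimal sketch of that kind of tag, assuming the field is named FORM.price with a default of 0 (both are assumptions for illustration):

```cfml
<!--- Treats a free-text field as if its TYPE were numeric. --->
<cfparam name="FORM.price" type="numeric" default="0" />
```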
The problem here is that the price might not contain a valid number. What if a user leaves it blank? What if a user entered alpha characters? This would cause the form field to fail Type validation and to throw an Exception. This is why, I believe, people think that CFParam is a poor choice for validation and that Exception handling in this case is very expensive, and all that good stuff. The problem is, they are basing this feeling off of a scenario that is not valid.
See, the price field is NOT numeric. It's a string. Text inputs and textareas can ONLY submit string values (although I guess it could be argued that ALL form fields can only submit string values, but let's not go there). There is nothing about a text input interface (outside of client side scripting) that confines the user to a given set of characters. As such, there is no way we should ever think of validating its type as Numeric. It can only, and will only, ever be a string value.
Once we get the price field into the server side environment, it is only our business logic that determines what the valid VALUES of that field are. It is only our business logic that states that price must contain a numeric value. It is only the job of the business logic to validate the VALUE of a variable.
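To make the split concrete, here is a rough sketch under a few assumptions: the field is named FORM.price, errors are collected in a hypothetical errors array, and "no negative prices" stands in for whatever the real business rules are.

```cfml
<!--- TYPE validation: a text input can only ever submit a string. --->
<cfparam name="FORM.price" type="string" default="" />

<!--- VALUE validation: business rules (hypothetical errors array). --->
<cfset errors = arrayNew( 1 ) />
<cfset price = trim( FORM.price ) />

<cfif NOT len( price )>

	<!--- The user left the field blank. --->
	<cfset arrayAppend( errors, "Please enter a price." ) />

<cfelseif NOT isNumeric( price )>

	<!--- The user entered something that is not a number. --->
	<cfset arrayAppend( errors, "Price must be a number." ) />

<cfelseif price LT 0>

	<!--- Assumed business rule: prices cannot be negative. --->
	<cfset arrayAppend( errors, "Price cannot be negative." ) />

</cfif>
```

Nothing here throws an exception for a blank or non-numeric price; bad values simply end up as messages that can be shown back to the user.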
Ok, so you might look at this and ask, How can we ever say that a form value is anything but a string? It has to do with the user interface. Select boxes, hidden values, check boxes, and radio boxes don't allow the user to enter a value; they only allow the user to select which value will be submitted with the form. As such, the TYPE of data submitted by these form fields is locked down. Similarly, if a user were to select multiple checkboxes which would submit a comma delimited list of values, the list type is still set in stone as the user should not be messing with the submitted values but rather only selecting which of the values will be submitted.
I hope that some of this is making sense. I am not the most articulate person - I am a code monkey. But, I hope that maybe you are seeing a bit of the revelation that I saw: that TYPE and VALUE validation are two very different concerns. And, I hope that by seeing this separation, you are seeing that ColdFusion's CFParam tag is a wonderful tag. But, you might wonder, if the business logic is what determines the valid values of a variable, why bother doing type checking with CFParam at all? The simple, powerful answer: environmental confidence. You will never have to worry about the type of data that you are dealing with. If you are expecting a numeric ID from a select, you will be confident that you are dealing with a numeric value. Additionally, I hope that you will also see that data type exceptions will only be the edge case and will only happen to users who invite them with acts of naughtiness.
Good to see this follow-up post. A lot of folks forget to post updates when they've been challenged on an issue arising from a previous blog post (or mailing list post). If you don't post an update, everyone assumes your position hasn't changed :)
I only use cfparam to type check when the type is "known" and an invalidly typed value can only be caused by "user tomfoolery" but mostly I just let the exception be handled by the site wide handler (that's what you get, Mr Naughty User, for trying to mess with the URL/form post!).
Regardless of cfparam, you still have to check known numerics for range (for example if you use a numeric URL variable as a key to read data from the DB, you still have to check you actually got a matching record back).
Where I use cfparam most is defaulting form data for display where I use session variables to hold user data between a form post and a redisplay with errors. So the pattern is:
In the display page:
cfparam name="session.formdata.somefield" default="#someDefault#"
cfinput ... value="#session.formdata.somefield#"
In the action page:
... do validation ...
if errors, session.formdata = form (and redirect to display the form)
else redirect to acknowledgement page
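A rough sketch of that pattern in tag form, assuming session management is enabled for the application and using hypothetical file names (form.cfm, action.cfm, thanks.cfm) and a single hypothetical field:

```cfml
<!--- ===== form.cfm (display page) ===== --->

<!--- Default the saved form data when nothing was carried over. --->
<cfparam name="session.formdata" type="struct" default="#structNew()#" />
<cfparam name="session.formdata.somefield" type="string" default="" />

<cfoutput>
	<form action="action.cfm" method="post">
		<input type="text" name="somefield"
			value="#htmlEditFormat( session.formdata.somefield )#" />
		<input type="submit" value="Save" />
	</form>
</cfoutput>

<!--- The carried-over data has been shown once, so clear it. --->
<cfset structDelete( session, "formdata" ) />


<!--- ===== action.cfm (action page) ===== --->

<cfparam name="FORM.somefield" type="string" default="" />

<!--- Hypothetical validation: the field is required. --->
<cfset errors = arrayNew( 1 ) />
<cfif NOT len( trim( FORM.somefield ) )>
	<cfset arrayAppend( errors, "Please fill in the field." ) />
</cfif>

<cfif arrayLen( errors )>
	<!--- Save the submission and redirect back to the form. --->
	<cfset session.formdata = duplicate( FORM ) />
	<cflocation url="form.cfm" addtoken="false" />
<cfelse>
	<cflocation url="thanks.cfm" addtoken="false" />
</cfif>
```

The two pages are shown in one listing only for brevity; in practice each section would live in its own template.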
I think my position has changed slightly from yesterday. When I posted this entry, I stated that it was more graceful to handle the malformed variables in a CFTry / CFCatch as opposed to letting the application error handler take care of it. I am wavering on this?!? As you are saying, "that's what you get, Mr Naughty", why do I have this strong feeling like I have to protect people from their own evil? If someone's gonna mess with my application in a malicious way, there's no way that I should be trying to allow that page to recover - they broke it, they get no free rides.
I think I am gonna take on more of your mindset - just let the site-wide handler deal with malicious behavior.
I am curious as to why you are storing your form data into the session scope and then just deleting it after the page has finished processing. I am not sure what benefit the session scope gives you. Are you persisting the form data across different page calls?
In the shower this morning, as I fought the urge to lie down and sleep, I came up with a great analogy for my CFParaming mindset; all you have to do is think of the form action page like it's a User Defined Function. In a way, it sort of is (bear with me); you submit data to it, it encapsulates flow logic and processing, and it returns data (via the page flow). Looking at it that way, you can imagine a ColdFusion template as such:
... <!--- Define arguments. --->
... <cfparam name="FORM.fname" type="string" default="" />
... <cfparam name="FORM.lname" type="string" default="" />
... <cfparam name="FORM.email" type="string" default="" />
Obviously, there are no CFFunction tags on the page, but I am including them here as a visual aid. When we create user defined functions, we have arguments and those arguments are traditionally typed (but not always). And, if you can visualize the action page as a kind of UDF, then it follows that your CFParams are like the UDF's CFArgument tags and should be typed.
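For comparison, here is a sketch of what the same three fields might look like as an actual UDF with typed CFArgument tags. The function name (processContact), the returned errors array, and the single validation rule are all hypothetical:

```cfml
<cffunction name="processContact" returntype="array" output="false">

	<!--- Define arguments (the UDF counterpart to the CFParams). --->
	<cfargument name="fname" type="string" required="false" default="" />
	<cfargument name="lname" type="string" required="false" default="" />
	<cfargument name="email" type="string" required="false" default="" />

	<cfset var errors = arrayNew( 1 ) />

	<!--- VALUE validation happens in the body, once types are settled. --->
	<cfif NOT len( trim( arguments.email ) )>
		<cfset arrayAppend( errors, "Please enter an email address." ) />
	</cfif>

	<cfreturn errors />

</cffunction>

<!--- Usage: a blank email yields one error message. --->
<cfset errors = processContact( "Jane", "Doe", "" ) />
```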
Maybe that's crazy talk, but it made a lot of sense at 6am this morning :)
Yes, I post the form and if there are errors I save the form data to session scope and *redirect* to the form display page. I use session scope here as a 'flash' scope (a la Ruby - and also as Model-Glue does and maybe Mach-II?).