Exercise List: More Thoughts On Domain Data Validation
Click here to view the current Online Demo
Click here to view / download the current Code Base
Previously, I've talked about doing all data "value" validation in the Save() method of my Service objects and then just returning a collection of error flags. I now have one more reason to think that this is a good tactic - SQL errors. See, data validation happens on so many levels: client validation, data type validation, data value validation, and then lastly, SQL constraint validation. I might have code that says a value is valid, but then the SQL server throws an error because the insert or update fails to meet some constraint (such as duplicate primary keys).
This type of error will only ever be thrown if the Service object has invalid data but thinks that the data it is dealing with is valid. Therefore, this type of data validation flag will have to be handled by the CFTry / CFCatch tags that surround (or will surround) my SQL interaction.
I'm not explaining my thoughts very well, but what I'm trying to say is that if some of the data validation can only happen at the database interaction level, we might as well limit the data validation to only the method that interacts with the database. If we spread our validation out across various API methods, then we might provide false positives (of valid data) to the programmer. Meaning, if the programmer depended on ServiceObject.Validate( Bean ) to test for valid data, there is no way that that validation could be 100% accurate since it does not take into account SQL server level data validation.
Long story-short, the only way to provide fully accurate validation is to do it in the ServiceObject.Save() method.
And, while writing this, I am actually reminded of another excellent point that is quite relevant. A while back, Frank Yang, CTO / CIO of RecycleBank, was discussing duplicate data validation with me; specifically, we were discussing the problem of not allowing duplicate email addresses in a given table. To handle the situation, I would check for duplicates as part of the form validation. If no duplicates were found, I would then proceed to process the form. He argued that this was a gross misuse of processing power - why waste time programmatically checking for duplicates when the SQL server is highly optimized to handle such checks. My counter argument to this was that if we relied on the SQL server to do the duplicate checking, we would have to handle form validation in two places: once before processing and a second time to catch any constraint violation errors that were thrown by the SQL server.
At the time, I felt like it would be a pain to handle multiple bouts of validation (especially since it was not easy within the framework of the existing application); however, now, within the context of object oriented programming and the validation of data at the point of SQL interaction, I can really see where Frank Yank's suggestion would make a lot of sense. Think about it - dealing with duplicate emails is far more the "edge case" than the inserting of unique email addresses. As such, dealing with that validation should also be the "edge case" (ie. catching an exception) and should not be allotted consistent processing time.
That all being said, I think I am going to move ahead with a Richard Kroll style of data validation (as commented on in my previous post). I like the way that his validation flags relate the Service Object to the controller code, but not in so tightly a way that it would break if the internal rules of the Service Object changed (it wouldn't work, of course, but no data would be corrupted and no errors would be thrown).
Reader Comments
Hi,
Besides whether Save() is the right place to make your validations, I'm not comfortable with the point you make about issues detected at the DB level. Your DB constraints are part of your business logic. Take the case of the primary key. Your model "knows" about the object's id, so it (or the service layer) must also know that this id must be unique. You have to provide the mechanism to ensure that *before* it becomes a DB constraint violation issue. If you're using something like an Oracle sequence to assign the PK at insertion time, you're pretty much safe; but if you're not, you should reserve the next available id value yourself, or if it was provided by the preceding layer, verify that it isn't already in use in the table. Whichever mechanism you use, at the point you get to execute your SQL statement / procedure call, you have to be pretty confident that everything should work. This means that you have already made your validations. When any of these validations fail, a user-friendly message should be ultimately produced, since these are "normal" paths of your application, situations that you recover from. This doesn't mean that the actual DB constraints aren't necessary or that nothing will fail anyway. Yes, it's good to have the constraints, and yes, things can still go wrong when you get to the DB. HOWEVER. The application has already guarded itself from anything that can reasonably be expected to happen. Anything beyond that is pretty much considered a crash. You don't want to rely on SQL errors as something to gracefully recover from. They usually imply there is a bug or you didn't validate enough. So in that case let some DAO exception propagate all the way up. At the top level you should have a handler for all yet-unhandled exceptions that might have been thrown, usually with a standard error page informing the user that the app blew up, possibly with exception info (of course you should log the stack trace, and possibly launch some failure alert of some sort).
So, in conclusion, you must differentiate validations that are part of your normal business safeguards, and situations that imply that something is not working. The difference is usually about the user screwing up and your app screwing up. Your approach to validating in Save() seems to be to concentrate in one place the things that can "go wrong" without making this distinction; however, these two types of failures need not be handled the same way. Or rather, they should not.
Let me know what you think.
@German,
I think we also have to remember that these are edge cases I am referring to with the DB exploding. And, also, keep in mind that I am not talking about doing all validation as part of the SQL integration. Within the Save() method, no SQL integration would take place if the data is not *expected* to be valid first. Something like this:
--- Save() ---
Errors = THIS.Validate( bean )
if (NOT Errors.HasErrors())
.. try
.... SQL stuff
.. catch
.... Errors.AddError( "Uknow error" )
end if
return Errors
Obviously, this is not 100% thought out yet, but I didn't want you to get the idea that SQL was always being executed in order to validate the data. I can also make this Validate() method public , so the programmer can utilize before calling the Save() method... however, I am not sure what benefit this adds. 99.9% of the time, if the form data is valid, I will call Save() immediately after that.
From a general flow standpoint, I get no benefit from splitting the Validate() from the Save().
And, as far as allowing the SQL exception to bubble up to be a general application exception, I am not sure how I feel about that. Yes, there needs to be SOMETHING in place to allow SQL exceptions to be tracked so that we can narrow down and remove bugs. And to be honest, I am not even sure HOW a SQL contraint error comes back and how I will know what kind it is.... that stuff I am figuring out as I go, but again, these are edge cases.
And, also remember that potentially thousands of pages can be access simultaneously. Even if I did validate pre-Save(), I might have 10 users at the same time trying to select the same email address. Chances are, several of them would come back as valid, only to then fail constraint requirements at the SQL level.... so, it has to be handled, and it STILL needs to be a friendly, "Email address is not available".
Also, I am new to this, so none of this is set in stone.
I would imagine that something like this is better as you will catch any db errors then rollback any changes?
<cftransaction action="begin">
<cftry>
<cfquery>
INSERT INTO table (col1)
VALUES (<cfqueryparam cfsqltype="cf_sql_integer" value1" />
);
</cfquery>
<!--- if any database errors occur don't continue --->
<cfcatch type="database">
<cftransaction action="rollback" />
<cfset commit = false>
<cfthrow object="#CFCATCH#" />
</cfcatch>
</cftry>
<!--- if no errors have ocurred then commit the database transaction --->
<cfif commit>
<cftransaction action="commit" />
</cfif>
</cftransaction>
@Shuns,
Something like that will be in place, but I want to see what it's like to handle certain database errors and then reThrow other ones. I might find out that it's a bad idea, but I want to explore.
I'm actually a Java guy, so this code looks a bit weird to me :)
Anyway, I believe it's common practice to catch DB exceptions, react, and rethrow them as DAOException, PersistanceException or something like that, so that the app needs not know about specific exceptions thrown by the data access library in use. Maybe a DAO base class can take care of this kind of stuff.
On the email already in use example, well, it all boils down to drawing the line to consider the situations you intend to recover from. I would probably not include that case. To do it, it would really help if you could get a meaningful exception to recognize what happened. This is not usually the case. At least with certain technologies you get a generic SQL exception with a somewhat descriptive message inside (returned by the DB engine itself).
@German,
So, you are saying that in the case of email uniqueness, you would run some sort of duplicate check via a Validate() method first, then pass stuff off to Save(). Save() itself would then try to insert the email, but if it violates the email constraint, it throws an error that we trap as an application error, not as a validation error?
I can be more comfortable with that idea. Something seems a little wrong about it, but this is all new to me, so all of it feels a little bit wrong :)
What are your thoughts on performing validation in the Save() method and then returning the Errors collection object? I have gotten push-back on the idea by others.
Ben,
To your first question: yes, that's how I'd do it. That's where I'd draw the line. This decision takes into account the facts that the situation is extremely unlikely, and that it can be very tricky trying to sort out what exactly happened at the SQL level. You can still make an effort to handle these unhandled exceptions the nicest possible way from the UI point of view, like returning to home or previous page and displaying a message like "oops, the operation failed, please try again". And of course, logging stuff behind the scenes and firing alerts so you can trace whatever went wrong.
Validation in Save(): I'm not sure to buy it. I found this blog while searching for the "right" way to handle model validation, which is tricky and I don't think I have the final answer yet, if there's any. Right now I'm inclined to having a service layer (with business logic, including validation) and a model which includes basic validation that is common to all business needs. I don't quite like the idea of having save functionality in the model itself, but don't take this as an authoritative opinion. However I suggest a good read of the following blog post (and 2 followups) which I've found interesting:
http://uri.jteam.nl/?p=6
Cheers
@German,
Interesting blog post. What it makes me realize is that whatever form error collection object that I finally come up with needs to be able to have messages that aren't just service-layer validation related. Right now, my error collection can really only add a message if it corresponds to a validation error that was found by the service layer; the two-password field case is an excellent example of why this will never work.
Other than that, I am not sure what else to take away from it. It seems like they are saying that some of the validation should go in the Domain Model itself. I guess to have something like that, you would have your controller that would validate, then call validate on the service layer, which would validate and then turn around and call validate on the domain object.... so much validation :)
@German,
Got some good points out of that link you gave me:
www.bennadel.com/index.cfm?dax=blog:1042.view
Thanks.