Exercise List: Thinking About Data Validation - Who, What, And Where?

By Ben Nadel on November 13, 2007

At this point in my journey of learning object oriented programming in ColdFusion, I have a lot of the domain logic encapsulated into our Domain and Service objects, but our validation logic is still hanging out in the controller code. Because form data validation is essential for the application to work properly, I am 100% sure that this is something that should be encapsulated to a large degree. Think about it this way - if you needed to create an exercise record from two different places in the application, you would only have one set of objects, but you would have to duplicate the data validation code. One of the largest benefits of object oriented programming is ensuring that processes maintain their integrity no matter where they are called from within the application. If validation logic has to be duplicated in each area that uses the model, then clearly, we cannot ensure anything except large maintenance costs.

As such, I think it is essential that validation logic be encapsulated somehow. But how?!? And where?!? And, even more importantly, what? There are two types of data validation that need to occur and I am not sure that both types should be in the same place. When it comes to a piece of data, we need to think both about the type of data that it is and then about its value. Take for instance, the Read() method. The Read() method takes an integer. On one hand, we need to ensure that we pass in an Integer value otherwise the ColdFusion CFArgument tag will throw a validation error. Then, on the other hand, we need to make sure that the ID is a proper value.

If I wanted to move all the data validation logic into the Domain Models then I would have to basically get rid of all the type checking that I have in my CFArgument tags within my setter / mutator methods. Otherwise, if I tried to pass an empty string into a method that expected an ID, ColdFusion would throw an argument validation error. Somehow, I feel like taking away all parameter validation is NOT the answer. As such, I must assume that DATA TYPE validation must remain in the business logic but outside of the domain model.

Assuming that the data type validation will remain external to the domain model, that means that it is the responsibility of the calling code to pass in data of the correct type. This makes sense. The domain model API creates a contract between the programmer and the domain model object. If a method says that it requires a numeric ID, then it is up to the programmer to pass in a numeric ID. Think about an Automated Teller Machine (ATM). The "api" states that I am supposed to put a card of some type into the machine. That is the TYPE of data that it requires. If, however, I try to pour my tasty seltzer into the machine's card slot, then, only bad things can happen. Is that the fault of the ATM? I say, No. It's my fault because I violated the agreement that I made with the ATM to put only items of type CARD into the slot. This would be the equivalent of passing string data into a method that expected numeric data.

Now, had I put my Vitamin Shoppe card into the ATM, it is the ATM's job to reject it because, while it is the right TYPE of data (a card of some sort), its value is not correct. This would be the equivalent of passing a negative number into a method that required positive integer values.
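
To make the split concrete, here is a minimal sketch, assuming a hypothetical, trimmed-down Exercise.cfc with a SetID() mutator (the Instance struct and the method names are illustrative, not taken from the actual application). The CFArgument tag handles the TYPE check; nothing in the object checks the VALUE:

```cfml
<!--- Exercise.cfc: hypothetical, trimmed-down domain object. --->
<cfcomponent output="false">

	<!--- Default instance data (illustrative). --->
	<cfset VARIABLES.Instance = StructNew() />
	<cfset VARIABLES.Instance.ID = 0 />

	<cffunction name="SetID" access="public" returntype="void" output="false">
		<!--- TYPE check: ColdFusion throws if a non-numeric value is passed. --->
		<cfargument name="ID" type="numeric" required="true" />

		<!--- No VALUE check here; -5 is numeric and would be accepted. --->
		<cfset VARIABLES.Instance.ID = ARGUMENTS.ID />
		<cfreturn />
	</cffunction>

	<cffunction name="GetID" access="public" returntype="numeric" output="false">
		<cfreturn VARIABLES.Instance.ID />
	</cffunction>

</cfcomponent>
```

With something like this in place, SetID( "" ) would fail ColdFusion's argument validation, while SetID( -5 ) would go through untouched, which is exactly the kind of value problem that still needs an owner.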

I am very comfortable with this concept of separating the two different data validation responsibilities. But, whose responsibility is it to validate those data values? Right now, I'm really feeling like this is a "Service" to be provided to the programmer. As such, it should probably be part of the Service objects for the related domain model objects. So, for example, if I was dealing with an Exercise.cfc ColdFusion component, I would expect that the ExerciseService.cfc ColdFusion component would have a method called Validate() that would take one argument which was an Exercise.cfc instance. Validate() would then return some sort of array or "error collection" object which could then be used in the XHTML page view.

This raises another question then; if the domain model validation is a service that is called by the programmer, there is nothing to ensure that only valid data is passed to the Save() methods which persist the domain data in the database. Just because the error collection is returned to the user, there is nothing to stop the user from ignoring the errors and still passing the domain object instance to the Save() method. At this point, the Validate() method doesn't really guarantee anything. So, how do I make sure that no invalid data will be stored in a way that doesn't depend on the end user to pass in only valid data?

My first thought was to have the Save() method also call the Validate() method to make sure there were no data value errors before the data was committed to the database. But, then, I would be calling the Validate() method twice for every page request (potentially), once by the user and once by the Service object itself. Again, one of the main benefits of object oriented programming is that we don't have to duplicate effort. Remember the whole DRY principle - Don't Repeat Yourself. By calling Validate() twice, we are going against everything that we are aiming for.

So, how can I make sure that Validate() is only called once? Right now, the only thing that I can think of is that Validate() is a private method of the Service object and is called only from within the Save() method. The Save() method would then return the collection of data validation errors if there were any. The calling code in the controller would then look something like this:

<cfset objErrors = objExerciseService.Save( objExercise ) />

Here, the controller code is calling the Save() method and then collecting any errors that the Save() method might generate. This approach makes sure that the Validate() method is both called when it is essential and only once such that we are not repeating database calls and validation effort.
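
A minimal sketch of what that might look like inside the Service object, assuming a hypothetical GetName() accessor on Exercise.cfc and leaving the actual persistence as a comment (none of these method bodies come from the real application):

```cfml
<!--- ExerciseService.cfc: hypothetical sketch; the persistence step is not shown. --->
<cfcomponent output="false">

	<cffunction name="Save" access="public" returntype="array" output="false"
		hint="Validates the exercise and persists it only when there are no errors.">
		<cfargument name="Exercise" type="any" required="true" />

		<!--- Validation runs exactly once, here. --->
		<cfset var arrErrors = Validate( ARGUMENTS.Exercise ) />

		<cfif NOT ArrayLen( arrErrors )>
			<!--- Persist the exercise here, e.g. through a DAO (not shown). --->
		</cfif>

		<!--- An empty array means the save went through. --->
		<cfreturn arrErrors />
	</cffunction>

	<cffunction name="Validate" access="private" returntype="array" output="false"
		hint="Returns the names of any invalid properties.">
		<cfargument name="Exercise" type="any" required="true" />

		<cfset var arrErrors = ArrayNew( 1 ) />

		<!--- VALUE check: the name cannot be blank. --->
		<cfif NOT Len( Trim( ARGUMENTS.Exercise.GetName() ) )>
			<cfset ArrayAppend( arrErrors, "Name" ) />
		</cfif>

		<cfreturn arrErrors />
	</cffunction>

</cfcomponent>
```

Because Validate() is private, the controller has no way to save without validating, and no reason to validate separately.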

This seems pretty good, but then we run into another problem: how do we translate the validation errors into errors that are meaningful to the user. For instance, the Validate() method might tell us that the "Name" property of the Exercise.cfc instance is not valid, but how do we translate that into the message "Please enter an exercise name" which would be displayed to the end user. This translation cannot be done in the ExerciseService.cfc or even in the Exercise.cfc because neither of them can have any real sense of the view. In fact, the names might not even line up; the Exercise.cfc might have a "Name" property, but on the view, we are calling it an "Exercise Label". As such, readable messages cannot be generated by the domain model or the service objects.

Before we even figure that problem out, it raises yet another point - the "error collection" returned from the Save() method cannot return messages as the messages are view-related, not domain-related. As such, the "error collection" can really only be some collection of invalid property names. These are the only things that can really translate from the domain model to the view without issue.

But now, this raises even another issue (oh dear)! If the error collection is just a list of properties, how would we alert something like "That exercise name is already in use"? We could return "Name" as being invalid in the Exercise.cfc, but at that point, we cannot differentiate between "Please enter an exercise name" and "That exercise name is already in use". So, it seems that returning a list of invalid properties is not sufficient; we need to return both properties and some sort of reason as to why they are invalid. But, of course, at the same time, we need to be able to translate this into something that is usable in the view!
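
One possible shape for this, offered only as a sketch and not as a settled answer: the error collection maps each invalid property to a reason code, and the view keeps its own lookup of readable messages. The reason codes ("Required", "Conflict") and the variable names below are made up for illustration:

```cfml
<!--- Hypothetical result from Save(): property name mapped to a reason code. --->
<cfset objErrors = StructNew() />
<cfset objErrors[ "Name" ] = "Conflict" />

<!--- The view owns the wording, keyed by "Property.Reason". --->
<cfset objMessages = StructNew() />
<cfset objMessages[ "Name.Required" ] = "Please enter an exercise name" />
<cfset objMessages[ "Name.Conflict" ] = "That exercise name is already in use" />

<cfoutput>
	<cfloop item="strProperty" collection="#objErrors#">

		<!--- Build the lookup key from the property and its reason. --->
		<cfset strKey = strProperty & "." & objErrors[ strProperty ] />

		<cfif StructKeyExists( objMessages, strKey )>
			<p>#objMessages[ strKey ]#</p>
		</cfif>

	</cfloop>
</cfoutput>
```

The domain side only ever says which property failed and why; the view decides what to call the field and how to phrase the complaint.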

This is starting to make my head spin. Data validation seems to be harder than the leap it took me to understand the domain model itself. I don't think that this is something that I can just figure out on my own. Time to hit the books and blogs looking for tips.

Please, if anyone can point me in the right direction here, that would be awesome! I would totally make it worth your while ;)

Short link: https://bennadel.com/1034

Reader Comments

Glen Lipka Nov 13, 2007 at 11:42 AM

What about client-side validation too? Maybe with Ajax on blur of a field?

Some interesting plugins for the client-side validation:
http://digitalbush.com/projects/masked-input-plugin
http://bassistance.de/jquery-plugins/jquery-plugin-validation/

Glen

Rich Kroll Nov 13, 2007 at 11:53 AM

Ben,
You really are diving in with both feet! The conclusion you came to is what I utilize (service layer validation) for server side validation. If there is an error on the server side, I return an error collection containing the messages to return to the user. Each of these messages uses a resource bundle key to get the actual text (exercise.nameInUse = "That exercise name is already in use"). I place the message directly into the error collection prior to returning it to the view under the "key" of the property. I push the responsibility of message display onto the view (this gives the ability to provide contextually relevant error messages).

Henry Nov 13, 2007 at 1:35 PM

I'm surprised Exception is not mentioned throughout the article. Mr. Sean A Corfield himself stated that "...conceptually CF has less overhead with exceptions that even Java has." - http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:120487

I would add a .validate() method in the Bean/Business Object (BO). The validate() method returns void but throws exceptions. The UI is responsible for catching the exception and displaying the cfthrow.message or a more friendly message. You may choose to add a validateBean() method or something of that nature in the service layer if you want your service layer to act as a facade.

What do you think?

Ben Nadel Nov 13, 2007 at 2:23 PM

@Glen,

Client side scripting for form validation adds a user-friendly layer, especially for the stuff that does not need to be validated on the server (ie. numeric fields, checking length, date formatting, etc.). However, at this point, I would want to avoid any client-server communication in terms of validation since that raises the whole AJAX framework issues, which I am not ready to tackle. But certainly, it couldn't hurt to add some client side stuff as jQuery plug-ins make it easy.

@Rich,

Interesting approach. I like that. It's almost like you provide custom "exception" types as the struct keys. What do you think of this idea - I make the possible exception types constants of the Service object. Something like:

<cfset THIS.Exceptions.InvalidLength = 1 />
<cfset THIS.Exceptions.InvalidValue = 2 />
<cfset THIS.Exceptions.ConflictingValue = 3 />
... etc. ...

Then, the validation method can return a Struct that has the property values set to these exception values:

This way, the view is given fairly good information about what properties errored out and why they may have done so. Of course, you could come up with a bunch of exceptions that are very generic that could be inherited from the base CFC or something and then the individual Service objects could add additional specific error types as needed.

I think this might take your idea, which is very good, and slightly decouple the data from the message. I am not sure if this should / needs to be de-coupled, but I feel like they should be.

@Henry,

I am not sure I follow exactly. I think we have to be careful. Validation is not really an exception case. Imagine if the call IsXML() always threw an exception if it was false - seems like it would be very confusing. Plus, if we throw errors on invalid data validation then we could only collect one validation rule at a time (since each exception would essentially exit out of the validation routine).
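
A small sketch of that last point, using made-up exception types, messages, and length limits: once the first CFThrow fires, control jumps to the CFCatch and the second rule is never checked.

```cfml
<!--- Hypothetical input that breaks two rules at once. --->
<cfset strName = "" />
<cfset strDescription = RepeatString( "x", 500 ) />

<cfset arrMessages = ArrayNew( 1 ) />

<cftry>

	<cfif NOT Len( strName )>
		<cfthrow type="Exercise.InvalidName" message="Name is required." />
	</cfif>

	<!--- Never reached when the name rule has already thrown. --->
	<cfif Len( strDescription ) GT 250>
		<cfthrow type="Exercise.InvalidDescription" message="Description is too long." />
	</cfif>

	<cfcatch type="any">
		<cfset ArrayAppend( arrMessages, CFCATCH.Message ) />
	</cfcatch>

</cftry>

<!--- arrMessages now holds a single entry, even though two rules failed. --->
```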

Rich Kroll Nov 13, 2007 at 2:47 PM

@ben:

That could work, the only problem I have is the feeling that you have now tightly bound your view to the implementation of the constants in your service layer. Here is a code snippet of what I was talking about, which may give you a clearer insight into what I am talking about:

Then in the view:
<input name="exerciseName" value="#Exercise.getName()#">
<cfif Errors.hasError("nameLength")>* #Errors.getError("nameLength")#</cfif>

This can be done as I've shown with an error object, or can be implemented via structures and structKeyExists for the view check. This decouples the view from the service. To take it a step further, you can swap out:

with:

this will allow you to utilize multiple text strings (localization usually) within the view tier just by adding additional resource bundles such as:

exercise_US -
exercise.nameLength=A Name must be provided for the exercise.

exercise_GB -
exercise.nameLength=A Name must be provided, you bloody fool!

Ben Nadel Nov 13, 2007 at 2:56 PM

@Rich,

Ok, I see what you are saying. I am sold. You are still slightly tied to the variable "naming", but at least when you go through your intermediary errors object, you are accessing it via string keys, rather than variables that need to exist. This is nice, very nice. Thanks.

Ben Nadel Nov 13, 2007 at 2:57 PM

@Rich,

Plus, just because the service layer provides a default message, there is nothing to say the view layer has to use it; it can still use a different message based on the error "key" if it sees fit.

Kurt Bonnet Nov 14, 2007 at 2:40 PM

I'm with you Ben. I like to keep my validation outside my domain objects AND my controllers. I've taken to the approach that you mentioned where there's a validateXxx method in the service layer that accepts an instance of the CFC you want to validate. I also perform a validation check on the CFC instance before I send it off to my DAO to be saved. This works pretty well for making sure only valid CFC instances get persisted, but this isn't the most helpful validation at the client level.

Since you're a pretty creative guy who likes to try things out I'd like to share an idea I've been kicking around with you. I've been thinking about developing a validation and form-processing "framework/library/whatever" that can perform validations at all levels of the request lifecycle. What I mean is I want to declare my validations for my domain objects in one place and be able to have those validations translated to work in all layers of an application. For instance, translating the validations to ActionScript so if I'm using a Flex client the Flex client doesn't need to do a server request to validate the data; or to translate them to JavaScript so if an HTML client is being used client side validation can be performed automatically. I'd also like the framework to perform server-side validation by validating FORM submissions and CFC instances. I'll explain the difference between these validation types more in a little.

Before I go any further, SOME of this can already be accomplished by using the Validat framework developed by Jeff Chastain of Alagad. See web site here: http://trac.alagad.com/Validat

And I'd also like to mention that some of the concepts mentioned above have already been implemented in other validation engines like the Apache Commons Validator framework.

What I'd like to do is expand on the implementations and concepts provided by these frameworks. First I'd like to differentiate between two types of validations: FORM validations and DOMAIN object validations.

1.) Domain object validations are pretty straightforward. You're validating stuff like "first name can't be blank, and can't be more than 30 characters in length", and that an email address is really a validly formatted email address. You can also validate much more complex stuff that can only be done on the server.

2.) Form validation is usually a pretty custom thing. Sure you can re-use routines for checking data types, lengths, etc... but the fields on a form may not map directly to the properties of a domain object, causing you to not be able to re-use your domain object validation for your form fields. Or in the case of very complex and/or long forms, a single form may only represent a segment of a domain object and not the entire thing.

I'll get to some examples in a minute, but what I'm getting at here is it would be beneficial to have a validation framework/library/whatever that:

a.) Allows you to declaratively define what needs to be validated
b.) Allows you to easily re-use existing/common validation rules and add new custom rules
c.) Allows you to specify if a validation can only be done on the server side
d.) Allows you to specify rule implementations/translations in various languages (ActionScript, JavaScript, heck any language)
e.) Allows you to create validation/rule inheritance hierarchies so you can over-ride and add additional validations to a validation/rule set definition.
f.) Allows you to apply only the validations for a domain object that you need to apply (for instance only run the validations for the first, middle, and last name fields for a Contact object; ignore the address, address2, city, state, zip fields for now.)

So let's take the example of a Customer CFC. The CFC has the following properties:

FirstName: String
LastName: String
MiddleInitial: String
Address: String
City: String
State: String
Zip: String
Phone: String
Email: String
FavoriteNumber: Numeric

The declarative validation for the Customer.cfc could be stored in an XML file like so:

</rules>

</validationSet>

Assuming we had a validation engine that could parse the above XML and run the validations on an instance of the Customer.cfc we could simply call: validator.validate(myCustomerInstance) to perform validations.

Now for the client side validation. There needs to be a way to bind the client-side form to the various parts of the domain object so we can determine which validations can be performed on the client side. To define the FORM validation for the client, we'd also do that declaratively.

By specifying the markup above you would basically be saying, "my form has all the fields included in the domain object for Customer and they are named the exact same, and all possible client-side validations for the Customer object should be applied to the form "someForm" "

Your HTML form would look something like:

And the ColdFusion script processing the form submission could make a call like so to validate the form contents on the server-side as well:

validator.validate("customerForm", form) where "form" is the form scope or some other structure.

Now let's say you have an "Enter Customer Record Wizard", you could potentially have something like:

Your HTML for customerForm1 would look something like:

When you submit the form, only the validations for the first, middle, and last names would get applied because the FORM validation has limited the validation rules to just the rules for those 3 properties.

Your HTML for customerForm2 would look something like:

Now let's say you create a new CFC (SpecialCustomer.cfc) that extends your Customer.cfc. You wouldn't want to re-define all the validations for Customer.cfc for SpecialCustomer.cfc so the validation engine should be able to look at the validation definition below and perform introspection on the SpecialCustomer CFC to see that it extends the Customer.cfc and have it automatically merge the two validation sets together so that the SpecialCustomer CFC has all the same validations as the Customer.cfc and its "somethingSpecial" validation as well.

There are TONS of ways that this could be enhanced and implemented. For instance, you could add mappings in the formValidations section to provide html field aliases to the business properties in case the form field names don't match up with the business object properties. Plus there could be ways to consolidate the validation definitions so they're not as bulky and are included in the actual metadata for a CFC so that it could be extracted via introspection. There could also be ways to create the XML via code generation of some sort, etc.... You could adapt this to allow for internationalization/customization of validation error messages....

This is all pretty rough, and I hope it helps get across the ideas that I have. Does anybody else have opinions on the above?

Ben Nadel Nov 14, 2007 at 6:26 PM

@Kurt,

That looks very cool. Clearly, you have really thought that out extremely well. It goes a little bit above my head in terms of being able to wrap it all conceptually, but I think that I get the gist of what you are trying to do. It looks like the challenging thing would be to provide all the different "context" validators such as the Javascript form validator or the Flash validator or the ColdFusion validator. I suppose though, if you just build to some sort of interface, then anyone would be able to build a plug-in to extend this functionality in the given context.... I am merely throwing out some buzz words there, but I think I see what you are doing.

I am interested to see what others have to say.

Steve Savage Dec 17, 2007 at 4:10 PM

My approach is similar to what Kurt suggested, except instead of an XML file I configure the error checking rules directly in my database.

For my home baked Content Management System I've been slowly working on, I have a table called "t_entity_attributes" that lists all possible attributes that can be associated with the entities described within the CMS.

Example entities are "user", "image", "movie", "entry", "message", "item" etc.
Example attributes are "title", "description", "width", "firstname", "price" etc.

The main fields in the t_entity_attributes table are:
attribute_name, attribute_name_hashed, attribute_type, attribute_pattern, attribute_error_id

When a form is submitted to add/update an entity I basically do the following:

Get the instance of my form validation cfc (exists as a singleton in my application scope) and pass it the entity name and the FORM structure.

e.g. o_result = getInstance("validator").validateForm("resource",FORM)

Inside the validator I check each field to see if it represents a valid attribute:

query SELECT * FROM t_entity_attributes WHERE attribute_name_hashed = '#hash(FORM.fieldname)#'
(I SELECT using the hashed value to prevent SQL injection)

if I get a match, I then do the validation checks
- attribute_name isValid for the entity
- fieldvalue isValid attribute_type (e.g. string, credit card number etc.)
- fieldvalue doesMatch attribute_pattern (e.g. a regular expression)

If there is an error/no match:
- I create an error object and pass it the attribute_error_id, or default error_id + fieldname, to get the correct error message from my messages table.
- The collection of error objects is returned as part of the result

e.g. o_result.errors

If there is no error:
- the attribute's value is added to an instance of the object that represents the entity:
e.g. o_entity.addAttribute(attribute_name,attribute_value);

- if all the attributes validate correctly, I can pass the entity object to the next step in the process, whatever that may be (e.g. write to database, file, log screen, email etc)

Why did I do all of this?

This allows me to have an admin interface that lets me add new attributes (and in some cases entities) to the system without modifying my ColdFusion code.

Though I still typically have to tweak the CSS and FORM HTML for the new fields.

Ben Nadel Dec 18, 2007 at 9:16 AM

@Steve,

That is very interesting stuff. I like the idea of allowing the "structure" of your content management system to be more data row dependent than ColdFusion code dependent (and I don't see a problem with having to update forms to accommodate these changes). I am not sure if this kind of liquidity is something I would want to see as a best practice, or if it is something that happens to work nicely for Content Management Systems. My gut tells me that this is more CMS related.

I am curious as to why you Hash() the field names? What benefit does that supply?

Steve Savage Dec 18, 2007 at 10:51 AM

@Ben

I've used the same datamodel for an inventory system; think of the entities as store items, with attributes such as colour, price, model, manufacturer, etc.

This approach has worked for me, BUT has a heavy reliance on components (entity, iterator, DAO) to actually manipulate the data, so your performance may vary.

I forgot to mention before, in association with the entity_attributes table, I have an entity_attributes_allowed_values table that is used to limit attributes to a predefined list of values e.g. allowed colours.

For the hash.

I'm just very wary of taking any string sent by a user and using it directly in an SQL statement (even when using CFQUERYPARAM), especially when I'm accepting the string through a page available to the general public, and when I plan to use the string as part of a lookup statement.

So my quick method in the past for protecting my database is hashing incoming strings used for lookups, e.g. user name (for log in), or in this case a field name, and having a hashed version of the string I'm looking for in my database to search against.

Since I'm now comparing two hashed strings in my SQL statement, and a hashed string can only be numbers and letters, the ability of hackers to exploit my code using SQL injection or some ColdFusion bug I'm not aware of "should" be greatly decreased.

Again there are other ways to protect against these problems (cfqueryparam etc.) So I may drop this approach in the future.
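
For comparison, a minimal sketch of the same lookup with CFQueryParam binding the hashed value, assuming a hypothetical APPLICATION.DSN datasource name and a strFieldName variable holding the submitted field name (the column names are the ones listed for t_entity_attributes above):

```cfml
<!--- Look up the attribute by its hashed name; the value is bound, not concatenated. --->
<cfquery name="qAttribute" datasource="#APPLICATION.DSN#">
	SELECT
		attribute_name,
		attribute_type,
		attribute_pattern,
		attribute_error_id
	FROM
		t_entity_attributes
	WHERE
		attribute_name_hashed = <cfqueryparam value="#Hash( strFieldName )#" cfsqltype="cf_sql_varchar" />
</cfquery>

<!--- A match means the submitted field is a known attribute. --->
<cfset blnIsKnownAttribute = (qAttribute.RecordCount GT 0) />
```

With the parameter bound this way, the database treats the value purely as data, whether or not it has been hashed first.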

Ben Nadel Dec 18, 2007 at 11:38 AM

@Steve,

Cool. That hashing makes sense. Thanks for the explanation.
