When I was sitting in on Steve Nelson's Google Gears presentation at CFUNITED, we started talking about ways to track audits to table data. One of the ideas discussed was to keep an audit table for each table we wish to track. I have never done it this way; I generally used a centralized audit table with an XML diff-field that other tables then join to. As such, I wanted to take a few minutes to explore this idea.
We are running this on a very basic contact table:
- id [pkey]
- first_name [VARCHAR]
- last_name [VARCHAR]
- title [VARCHAR]
- company [VARCHAR]
Then, we have our contact_audit table which stores all the updates to the above contact table:
- pkey [PKEY]
- date_audited [DATETIME]
- id [int]
- first_name [VARCHAR]
- last_name [VARCHAR]
- title [VARCHAR]
- company [VARCHAR]
Every time we make an insert or an update to the contact table, we are inserting that data into the contact_audit as well. This way, we keep a snapshot of every form submission for a given contact record. Let's take a look at the code; the demo here is fairly straightforward - we have a contact form that either inserts or updates a record and then creates an audit.
<!--- Param form data. ---> <cfparam name="FORM.id" type="numeric" default="0" /> <cfparam name="FORM.first_name" type="string" default="" /> <cfparam name="FORM.last_name" type="string" default="" /> <cfparam name="FORM.title" type="string" default="" /> <cfparam name="FORM.company" type="string" default="" /> <cfparam name="FORM.submitted" type="boolean" default="false" /> <!--- Check to see if the form as been submitted. ---> <cfif FORM.submitted> <!--- Normally, this is where we would validate data, but for this demo, we are not going to worry about valid data. ---> <!--- Check to see if are inserting or updating data. ---> <cfif FORM.id> <!--- Update the record. ---> <cfquery name="qUpdate" datasource="#REQUEST.DSN#"> UPDATE contact SET first_name = <cfqueryparam value="#FORM.first_name#" cfsqltype="cf_sql_varchar" />, last_name = <cfqueryparam value="#FORM.last_name#" cfsqltype="cf_sql_varchar" />, title = <cfqueryparam value="#FORM.title#" cfsqltype="cf_sql_varchar" />, company = <cfqueryparam value="#FORM.company#" cfsqltype="cf_sql_varchar" /> WHERE id = <cfqueryparam value="#FORM.id#" cfsqltype="cf_sql_integer" /> </cfquery> <cfelse> <!--- Insert the new record. ---> <cfquery name="qInsert" datasource="#REQUEST.DSN#"> INSERT INTO contact ( first_name, last_name, title, company ) VALUES ( <cfqueryparam value="#FORM.first_name#" cfsqltype="cf_sql_varchar" />, <cfqueryparam value="#FORM.last_name#" cfsqltype="cf_sql_varchar" />, <cfqueryparam value="#FORM.title#" cfsqltype="cf_sql_varchar" />, <cfqueryparam value="#FORM.company#" cfsqltype="cf_sql_varchar" /> ); <!--- Return the new ID. ---> SELECT ( LAST_INSERT_ID() ) AS id ; </cfquery> <!--- Store the new id into our form variable. ---> <cfset FORM.id = qInsert.id /> </cfif> <!--- ASSERT: At this point, we have either created or updated the given record and the ID of that record is stored in FORM.id. ---> <!--- Insert the new audit record. Rather than inserting the row data manually, we are going to select the insert data directly out of our contact table. ---> <cfquery name="qInsertAudit" datasource="#REQUEST.DSN#"> INSERT INTO contact_audit ( date_audited, id, first_name, last_name, title, company )( SELECT <cfqueryparam value="#Now()#" cfsqltype="cf_sql_timestamp" />, c.id, c.first_name, c.last_name, c.title, c.company FROM contact c WHERE c.id = <cfqueryparam value="#FORM.id#" cfsqltype="cf_sql_integer" /> ); </cfquery> </cfif> <!--- Query for all record audits for this contact. ---> <cfquery name="qAudit" datasource="#REQUEST.DSN#"> SELECT a.pkey, a.date_audited, a.id, a.first_name, a.last_name, a.title, a.company FROM contact_audit a WHERE a.id = <cfqueryparam value="#FORM.id#" cfsqltype="cf_sql_integer" /> ORDER BY a.pkey DESC </cfquery> <cfoutput> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head> <title>ColdFusion Audit Table Demo</title> </head> <body> <h1> ColdFusion Audit Table Demo </h1> <form action="#CGI.script_name#" method="post"> <!--- Send submission flag. ---> <input type="hidden" name="submitted" value="true" /> <!--- Send back the ID of the record we are working with. This will be zero the first time. ---> <input type="hidden" name="id" value="#FORM.id#" /> <p> Name:<br /> <input type="text" name="first_name" value="#FORM.first_name#" size="16" maxlength="20" /> <input type="text" name="last_name" value="#FORM.last_name#" size="19" maxlength="30" /> </p> <p> Title:<br /> <input type="text" name="title" value="#FORM.title#" size="40" maxlength="50" /> </p> <p> Company:<br /> <input type="text" name="company" value="#FORM.company#" size="40" maxlength="50" /> </p> <p> <input type="submit" value="Submit" /> </p> </form> <br /> <h2> Contacts Audits </h2> <table cellspacing="2" cellpadding="4" border="1"> <thead> <tr> <th> First Name </th> <th> Last Name </th> <th> Title </th> <th> Company </th> <th> Date </th> </tr> </thead> <tbody> <!--- Loop over audit records. ---> <cfloop query="qAudit"> <tr> <td> #qAudit.first_name#<br /> </td> <td> #qAudit.last_name#<br /> </td> <td> #qAudit.title#<br /> </td> <td> #qAudit.company#<br /> </td> <td> #DateFormat( qAudit.date_audited, "mm/dd/yy" )# #TimeFormat( qAudit.date_audited, "hh:mm TT" )# </td> </tr> </cfloop> </tbody> </table> </body> </html> </cfoutput>
NOTE: I don't usually break my CFQueryParam tags like that; this formatting is just to fit inside of the content area of the blog.
As you can see, not a whole lot going on here; I either insert or update the contact record. Then, I simply move the contact record over into the audit table. This is a really easy way to keep track of a data table over time. But, to me, there are some down sides. Firstly, by looking at a single row, you have no sense of what was updated since every column is stored - did a column value get created or did it stay the same? You need to look at the current audit record as well as the previous audit record to see this kind of data transformation. Secondly, we have to store a lot of duplicate information - we are storing field values that haven't changed. Thirdly, we have no way to deal with field name changes over time. Imagine if a column was added or removed from the contact table - by adding or removing a column from our audit table, we are essentially corrupting our audit data.
That being said, this is an extremely easy way to keep track of data changes over time. Perhaps in my next post, I will discuss the way that I keep track of data audits. It has more overhead but I believe that it is more in the spirit of an audit.
Here we usually put a trigger on our tables to insert an audit record into an audit or history table. The trigger just handles if it was an Insert, update, or delete etc. This way I dont have to physically write the insert code in my Coldfusion, the DB handles all of it. It basically inserts the whole row into that table, so yes you have the drawback of not really knowing what changed without looking at the previous record, but I dont find that as too big of an issue.
We have the need to do the same for many of our tables as well - but chose to go the database route (let it do what its good at, right?). For any table that we wish to track changes to, we create a complimentary table (using a standard prefix, and then the name of the table being tracked - i.e. changes to Contact would be tracked in zLogContact). This table has all of the same fields as the table being tracked, along with a new primary key id field, a LogDate field, and a IsDeleted bit field.
Besides creating the log table for any table that we want to track, a trigger is also added. Basically, this allows you to say "for any committed update or delete action, do ___". This solution keeps us from worrying too much about transaction states, or remembering to grab the data before we do an insert - and it lets us keep a separation of concerns (our app doesn't need to care if we are auditing changes to a given table or not).
Just some food for thought...
I have never used a trigger, so I am not familiar with this technique. My SQL understanding is fairly basic - I have not gotten into things like triggers and asserts and stored procedures. I only know how to rock CRUD and indexes (and even my indexing skills are pretty poor).
While I like the concept of separation of concerns, I am not sure that it is/should be the database's responsibility to log data changes. I believe this to be part of the business logic, not the persistence logic. What if the audits were going to be stored to flat files so as not to take up DB space or something?
Remember, the primary goal and concern of the database is to persist and to retrieve data. To me, it seems that the more responsibility that we give the database, the more fuzzy that lines becomes.
And what if our back end wasn't even a database? What if it was TXT files or XML files? What it if was just objects in RAM that never got persisted? Then we would have no triggers at all.
Now, I am not saying that triggers are bad - I frankly have no experience with them and cannot say. I am playing devil's advocate and all I am saying is that you might want to consider what is actually the concern of the database and what is the concern of the business logic.
I would say that I am going to play devil's advocate with you, but since you called that role, I will go for the guy in white on your left shoulder :)
What are we talking about here? Your solution mentions storing the data that used to be in a given record, so that the record itself can be updated with new data (or deleted, which - unless I am missing something - your example doesn't handle). So you are storing data you say? Sounds like exactly what you just said should be stored in the db :)
No, my example does not handle deletion because there are no deletion flags. I usually use a logical delete (is_deleted = 1), which would be tracked via the audit record as well. I don't store any special delete data. Using a different technique, depending on the application requirements, I do store a name and reason for deletion.
Actually that brings me to an interesting point - I have an application right now that prompts the user for a reason for deleting a record. This is required for security reasons. This data is then converted into an XML string and stored in the audit record. This data is separate from the original record; as such, it can not be automatically copied over from the original record into the audit table.
To me, this is clearly part of the business logic, not of the persistence logic. Yes, they are all getting persisted eventually, as you point out, but the logic around their persistence is the concern of the application - the method of persistence is the concern of the database.
NOTE: It should be stated that I am fairly new to data audit tracking, which is why this blog post discussed something I had not heard before. As such, please take my comments as an exploration, not as an assertion.
I have done the audit persistence both ways. The downside I have encountered when creating a log table to mirror each audited table is that in large applications I have been faced with creating 30-40 (or more) extra tables just for auditing. At that point, making changes, like the column name change you mentioned, Ben, can become even more of a chore, especially if faced with a refactoring effort that may impact more than 1 table.
The other benefit I see to keeping a single table for all auditing speaks to your concern about tracking all that duplicate data. Not a huge deal in an app with few records and/or few changes, but otherwise your audit SELECT datasets start to get fairly large, especially if there are 1 or more NTEXT fields in a given table ... now duplicated x N number of record changes, whether or not the NTEXT data was ever changed. I can think of a project mgmt engine, with some large descriptions of the project and/or some client information, and as the status gets changed (Discovery -> Documentation -> Design -> Development -> Maintenance, for example), a whole new audit record is duplicated even though only data in your smallest 15 or 20 char Status column was changed.
Going back to the single table which only tracks Table, ID, Column, Old Value, and New Value, there is certainly a different set of overhead in terms of finding the audits that match any given record, but you've substantially slimmed down the data getting captured and you know exactly what changed, how it changed, and when.
Just my 2c
You really should read up on triggers, esp. if there is any chance ever that something or someone other than your code will ever alter the database. It's the default answer to this problem.
Even if there is no chance, what if you update the table from more than one place (yeah, bad practise, but maybe you need to write a quick fix template) in you code ?
Without a doubt, I should definitely read up on triggers as well as a number of other more advanced things in database management. It's just been a slow journey.
That's why I like the idea of storing only what changed since the last update, not all the data. I think this makes the audit more accurate and more useful from an application stand point. More often people what to see "what changed" rather than the data at any given point in time.
"storing only what changed since the last update, not all the data"
It depends what your use case for the audit trail is.
Storing the while row each times makes figuring out 'what was the record at date X' quick, where as only storing what changed makes 'what changed last' quicker.
Agreed. And, although slightly longer to reconstruct from a table of chnages, it is still possible to create that "point in time" snapshot. As you point out, that's going to be more rare than the "record of changes" view for most users.
The other thing I like about a single audit table is that I can create a single widget (Audit.cfc or whatever) and then just call that for any object that ever needs to persist data changes ... no extra database work to ever add or remove a table from auditing. Either the data manager calls the Audit.log() or it doesn't, which I find to be a simple and flexible encapsulation. Same then with retrieval: one interface to show changes / change history, regardless of the data of origin.
I think it always comes down to the particular requirements for given application.
I normally package my audit logic in my CF code, but if high performance is a goal, I would absolutely move the auditing to where it can be handled the fastest. The suggested trigger method would be extremely fast, but now you're tied to that particular db engine. You could also throw cfthread into the CF code and give a faster perceived response time, but this method still places a higher load on the application server.
I find your idea of creating a complete audit record interesting though. I've always used a single audit table per db and only store the data element that actually changed. Basically table_name, column_name, before_data, after_data, record_id, and action.
Dumping the entire record is a lot less code.
I think this is a good point - since Auditing is so much behind the scenes, the way in which it is implemented is not in any way a "one size fits all" kind of glove.
I think it is often times easy to lose sight of this point, especially when you are new to something and want to know what the "right" answer is.
" trigger method would be extremely fast, but now you're tied to that particular db engine "
Your particular SQL to create the trigger might be, but it's easy enough to rewrite for another engine.
The actual table made by the trigger is of course perfectly portable.
"I believe this to be part of the business logic, not the persistence logic."
You raise an interesting design question, Ben. Separating business logic from persistence is clearly a winning strategy, but does that necessarily imply that the database should be a dumb container?
I've worked on a few ginormous Oracle and SQL Server projects with hardcore dedicated DBAs. The "best practices" they recommend always seem to involve building business rules into the database. For example, using check constaints to ensure that column X only holds a certain set of values.
I found these systems very reliable, self-documenting, and easy to maintain. Has anyone else run into (non-theoretical) problems with this approach?
Yeah - the errors that percolate back up are often fairly generic and unhelpful.
That said, the GUI shouldn't be letting those invalid values through anyway so it only really occured during testing.
I think we are running into the same problems we had with designing the "right" Project management protocols at work. The size of the project does / should influence the way that things are done.
Admittedly, I have really only works on relatively small projects, so that is the world that I know and am used to. This will affect what I feel is "right".
I use the audit tables for different reports. One report is used as more of a historical report. I can look and see everything that went on by date, month, year for a given record or set of records. We also use the audit tables to determine which applications a user has logged into during a session. (Ben logged into application A, then C, then B...)
A situation ive run into with an application I maintain is when an admin mistakenly deletes a record, I then can go into the db and put that record back how it was. There are other things I have in place to handle these situations (backups etc) as well.
As several people have already mentioned, triggers are usually a default choice for auditing. An alternative approach will be to use an Aspect (AOP) to record changes to a business object over time.
I've used the single-audit table method with a decent amount of success as well. I actually wrote a CF script to look up the column names in the information_schema for each table and automatically write the trigger for you...much easier that way. Let me know if you're interested in seeing the code.
I agree that it's not immediately obvious what's going on when the error says that check constraint CHK_13245 has been violated. :) But I do appreciate the lack of ambiguity - there is only one constraint with that particular unhelpful name.
I am not trying to prescribing the best approach for anything. (Way too much of that in the blogosphere already!) But I am interested in what people think about the fundamental question.
Newbies are told to abstract their persistence mechanism. The reasoning behind this is beyond reproach, and I take no issue with it. But ought we to ignore the many advantages that can be gained from codifying business logic within the database itself?
Sorry, I didn't mean to imply you were saying that one approach was absolutely better than another; I was just relating this conversation back to other trials we have been through together :)
Ain't no thang, maestro. :) I learned much through our trials!
I'm with the "trigger" method. I hadn't really used them with any regularity until my previous company, but certainly saw the use for them having had to audit several tables. My thought about using a trigger vs. using CF was that once the data had been transferred over to the dB, that it should handle what should be done with backing up the data, rather than having to take the previous data from the table, update/insert with the new data, then insert the old data to the audit table. A trigger keeps track of the old data and what was done to it (CUD), and then you can take that old data and push it into the audit table (even adding what was done if you like).
As to inserting what was edited vs. inserting the whole row, I'm not so sure about that one. I've only done it entering a whole record, but I can definitely see the advantages of just entering the single record.
Hi Ben. Do you know about SQLAudit? SQL Audit(http://sourceforge.net/projects/sqlaudit) is a very very nice application to build SQL table trail audit. We're using in our CF application and we're very happy. With a very easy UI you can choose your SQL table and SQL Audit will build the "twin" audit table, triggers and "principal" table . SQL Audit is free and open source.
My experience with auditing also involved needing to track who made the change, in addition to when the change occurred. This is one of the problem areas currently with Middle-tiered based applications vice Client-Server based applications. Typically, in the database, one could use the Current User in a trigger to log this information. Unfortunately, with Middle-tiered applications, one uses a common account to connect. Therefore, the middle tier has to pass the end-user information for audit purposes.
What would be interesting is to hear what others are doing. Working in Health Care, one of the HIPAA requirements I run into is having to know who accessed patient related data. Ideally, this would be built into the database connection, like using Oracle's Proxy Authentication.
Any smarter way found yet, or still the old school which mentioned in this article 5 years ago?
I am in a need to pass the enduser information for audit purpose.