Ask Ben: Averaging Date/Time Stamps In SQL
How can I average dates in a SQL call?
There are several types of date/time averaging in SQL. You can either average the date/time stamp as a whole, the date part of the date/time stamp, or the time portion of the date/time stamp. While I don't necessarily feel like all of those are worth discussing (due to usefulness), since I am not sure about what part you are asking, I will cover all three.
The secret to date/time averaging is that date/time stamps can be represented as a floating point number. I have covered this a number of times on this blog so I won't go into too much detail, but the idea is that as a floating point number, the integer part represents the number of days since the beginning of time (as the SQL server defines it) and the decimal part represents the time or rather, the fraction of days. SQL does not make this conversion for you; you have to CAST the date/time stamp as a FLOAT value:
CAST( date_time_value AS FLOAT )
Once you do this, it's easy to see how averaging can take place. You are probably used to running aggregates on numeric values. The FLOAT produced by a date/time cast is no different.
For the examples below, let's assume you have a web statistics database and you have a table that has user sessions. The field "date_created" is the date/time that a user has started a session on your web site. For this first example, let's just average the date/time session stamps as a whole:
<cfquery name="qFullAverage" datasource="..." username="..." password="..."> SELECT CAST( AVG( CAST( date_created AS FLOAT ) ) AS DATETIME ) FROM web_stats_session </cfquery>
As you can see, first we cast the date_created field to a FLOAT. Then, we run an average on the float fields. Keep in mind, this converts every date_created value to a FLOAT in the entire table and then runs the average on it. Once we have the average, we have to cast the FLOAT value back to DATETIME so that we can see it as a user-friendly date.
Averaging all date/time objects, to me, has little practical value. Averaging the date portion of a date/time stamp has a little more value, but not much more. However, if you want to average just the date part, it is basically the same operation. The only difference is that we have to strip out the time part, which as you will recall, is the decimal part of the floating point number. Now, here's a mathematical question: do you need to strip out the decimal part prior to averaging? My gut instinct tells me that it doesn't matter, but I am not a math whiz. Regardless, we definitely have to do it after the AVG:
<cfquery name="qDateOnlyAverage" datasource="..." username="..." password="..."> SELECT CAST( FLOOR( AVG( -- This inner FLOOR() call is optional (I think). -- The averaging of the integer parts should -- not be affected. Put it in if you want. -- FLOOR( CAST( date_created AS FLOAT ) --) ) ) AS DATETIME ) FROM web_stats_session </cfquery>
As you can see, we do pretty much the same thing as in the first example. The difference here is that once we are done averaging the FLOAT values, we FLOOR() the number to remove the time portion before we CAST it back to a DATETIME value.
Averaging times to me seems like the most practical of all the possible scenarios. Averaging time gives you more insight into trending in an informative manner. When averaging the times there is no doubt like above: we have to remove the integer part of the date/time FLOAT value BEFORE performing the average. To do this, we have to subtract the integer part from the full floating point value. After that, it's pretty much the same:
<cfquery name="qTimeOnlyAverage" datasource="..." username="..." password="..."> SELECT CAST( AVG( -- Get full floating point value. CAST( date_created AS FLOAT ) - -- Subtract the integer part. FLOOR( CAST( date_created AS FLOAT ) ) ) AS DATETIME ) FROM web_stats_session </cfquery>
Notice that in this example, we did not FLOOR() the resultant average. We want to keep the time part.
The trick to handling date/time stamps in SQL fast and effectively is remembering that you can represent them as FLOAT values. Numbers are faster to work with and make people more comfortable. After all, you're all used to dealing with numbers in SQL. Just don't forget to CAST things back to date/time.
Want to use code from this post? Check out the license.
I really appreciated your info on this because it was through using your CAST to FLOAT (or DECIMAL) and back out again as TIMESTAMP that I realized CF has this built in to the AVG function. Just AVG a list of dates and use DATEFORMAT on the output side and you'll get the same thing. :)
Word up. The more I mess around with Dates and Times, the more I realize how powerful they are and how manipulation of them can be really awesome. Good stuff all around.
Hey Ben, Just wanna say thanks for this article, it helped me a lot today, in fact, i am bookmarking this site right now hehe.
Frank, just a note, I ate the dust completedly when I read your comment like 3 times, and I still didn't get a word. Like, what does CF mean? Also, DATEPART() can be used as output too?
Glad to help. Hope you enjoy the site and thanks for bookmarking :)
Frank was refering to ColdFusion (CF). It's a server-side scripting language and is what most of my blog enteries are about. ColdFusion can average columns in a query once they come back from the database. If you are just using SQL, don't worry about what he said.
From Argentina, Thank you very much for your tips, very useful
This is a nice article. Got what i was looking for in an instant. Thks!
Thanks for th tip. Saved my day worth of work.
Excellent article - crystal clear
I spent whole day to find out average of date time and failed... and at last, I got answer from your site...
Very Useful.. Thanks a Ton
Cool guys, glad you liked this approach :)
Ben, Thanks for this, I've been trying to get a quick and easy way to avg my times and this was easy to understand and very concise
This was the number 1 result on google when searching for "sql average date"... and rightly so! Question answered, well done Ben!
I had issues with the code running against data that overlapped midnight. I adjusted the code which seemed to give more accurate numbers. For instance, if the data was 11PM,1AM,2AM,2AM, you would expect an average around midnight or 1AM. Your math returns 28/4=7 or 7AM. My code breaks when avg across noon, so use accordingly.