I have to admit, I was never formally trained as a database administrator. Other than an introductory database course that I took about 7 years ago, most of what I know about databases I have learned on my own or by talking to other people. As such, I am not the best database schema developer nor do I know much about SQL database optimization. As such, it shouldn't be that surprising that yesterday I found myself trying to optimize a page that was taking 4-5 seconds to process each record (for an estimated total of 45-minutes... if it didn't keep timing out).
Normally, I might look at this page and its fairly long and complex SQL algorithms and think to myself, 45 minutes is long, but maybe that's as good as its gonna get.
Well, Thank God I attended Nate Nelson's CFUNITED sessions on Database Performance Tuning and Beyond Basic SQL. Nate spent about a quarter of the time driving home just how important indexes are when it comes to database performance. He also repeatedly stressing the fact that databases are really amazing and that they should be super fast; that you should rarely expect a well built database to perform slowly. It was this message that I kept hearing in my head as I kept pouring over my huge ass SQL queries.
Instead of settling for less than even mediocre database performance, I took one of his [Nate's] optimization tips and I went through the various tables involved and I indexed all of the foreign keys. This is the first time I have ever indexed a foreign key. Heck, it's the first time that I've ever indexed anything but the primary key fields.
Re-running the page, I actually got giddy with excitement and drew many sideways glances with my uncontrollable school-girl giggling. The page, which had been taking an estimated 45 minutes to process had just finished processing in 16 seconds.
Let me make that as clear as possible: Taking Nate Nelson's SQL optimization advice and adding foreign key indexes to my tables, the page went from taking 45-minutes down to 16 seconds.
And so it was that on July 11th in the year of 2007, Ben Nadel saw the power of indexes and it was good.
| || || |
| || |
| || || |
Looking For A New Job?
- Full Stack Angular/Rails Engineer @ newly funded Health-IT startup at Dorsata
- Part-time remote DBA needed at Bridge
- Senior Developer Wanted for International Cloud Company at Giva, Inc.
- ColdFusion Developer - jQuery, PrimeUI, ORACLE DB at ComSpec International, Inc.
- Jr. ColdFusion Developer - Full Time / Telecommuting is ok. at InterCoastal Net Designs
That is insane, I cannot believe just how much the increase is. I also have never bothered to index foreign keys either so obviously should do so :D
BTW nice pic :P
Yeah, its pretty crazy! It went from 2700 (estimated) seconds down to 16 seconds. That's .006% of the original time. It's funny, when I was listening to Nate speak, he really was pushing the indexing which got a little bit repetitive, but in hindsight, it was exactly the right move.
Now, I need to learn more about indexes. Apparently they come with costs (both for upkeep time and for storage I assume). I gotta figure out where the good balance is.
Umm also do what exactly do you mean by indexing the foreign keys? Does that mean that you are adding a new "foreign key" or a new "index" that sorts the foreign key's column?
Yeah I have used secondary indexes before and the upkeep time I think is each new record will mean that the index will have to be recalculated.
I have a primary key, which is usually an auto-incrementer. Then I have other row-related columns. Then, I usually have one or two (or more) foreign keys, which are just IDs that correspond to other primary keys of other tables.
So, for example, if I had an phone record:
ID (primary key)
Number (the phone number)
Type (type - cell, home, work, etc)
ContactID (the primary key of the related contact)
The ID, which is a primary key, is automatically indexed. What I did was then put an index on the ContactID column. That way, if / when I join the Phone table to the Contact table (on the Phone.ContactID = Contact.ID columns), the Database does super fast look ups on both columns.... rather than just a super fast look up on Contact.ID and a full table scan on the Phone.ContactID table.
Does that clear it up at all? I had more complicated joins and several tables, but same idea.
Yeah I get it, I have done that before on occasions. I got confused I thought that you were refering to creating a foreign key constraint son foreign keys which I haven't done
Good info mate :)
I have never set up constraints... I always do that in my business logic / code, which I am told is a no-no.
Yeah I agree :)
This book really blew my mind as far as SQL goes, I don't know how I ever got a long without it!
Holy crap! :)
First, it's amazing that you had the patience to let something run for 45 minutes, so congrats on that, heheh. But cutting the execution time down to 16 seconds just by putting indexes on your FK's is amazing... I never thought a bunch of unindexed queries would run that badly, let alone optimize so well.
First congratulations. Indexing is very important.
In SQL Server 2005, the functionality of non-clustered indexes is extended by adding non-key columns to the leaf level of the non-clustered index. Non-key columns, can help to create cover indexes.By including non-key columns, you can create non-clustered indexes that cover more queries. The Database Engine does not consider non-key columns when calculating the number of index key columns or index key size. Non-key columns can be included in non-clustered index to avoid exceeding the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes. Another advantage is that using non-key column in index we can have index data types not allowed as index key columns generally.
That's cool! I love it when things like that happen. I was wondering how you got that much improvement on a foreign key index, but then in your comments section it made sense. The more unique the values are in a column, the more an index will be beneficial. There is a cost associated with maintaining indexes though, so don't get carried away. But, joining or frequent searches on a column make them good candidates to consider for an index.
Oh yeah, I was gonna also say I used to have abs like that a couple of years ago. Working on gettin em back. Maybe another month. Sweet pic!
Welcome grasshopper, we've been waiting for you.
Now you need to learn the value of clustered indexes and indexed views and you can squeeze even more out of your database.
Just be careful when using indexes that you don't go ape and start indexing even single column in your table.
A good and very basic rule of thumb is this:
If a column is used in a WHERE or and ON clause it should be indexed.
This is a strange coincidence, that on my birthday you also become enlightened with the power of SQL... maybe something more to add to my Synchronicity post? http://www.markdrew.co.uk/blog/index.cfm/2007/7/9/Eclipse-ColdFusion-and-Mark-Drew-Synchronised
There are ways also to see what indexes may be needed or what indexed are not really needed. Because often too many indexes is same bad as no indexes at all.
I Think you may be interested in this:
Thanks for the book recommendation (and one that's not too expensive ;)). I will try to check that out.
I didn't have a chance to wait 45 minutes. The page was set to timeout after 10 minutes. However, it was taking 4-5 seconds to process each record from an imported text file. Taking that, and extrapolating out, I estimate that, based on a known number of records, it would take 45 minutes at 4-5 seconds a record to finish processing.
I hardly understand what you just said :) But, I just found your site through Ben Forta's blog, so I am sure I will be reading up on all of your SQL insight.
Yeah, I know there are costs, but I know nothing about the trade-offs at this point. I can tell you that this page was run many times a day so it needs to be super fast.
Glad to finally have arrived ;) Yeah, I am starting to see the whole ON clause optimization. I always just assumed that since these were numeric fields, they would be really fast anyway... live and learn.
Happy (belated) Birthday!
I will check out those links. This stuff looks very exciting (in a really geeky, not exciting to most people sort of way) :)
Agreed on indexes I remember 3 years back one of the application was taking atleast 5 minutes to show the results from one of the query and once my senior db admin added couple of indexes it took seconds to run that query ....always use them now but as mentioned too many of them can do more harm than good so you need to be sure what to index :).