Why My Queries Hate Application Service Layers
Posted June 18, 2009 at 10:02 AM by Ben Nadel
Last week, I was trying to take an old, procedural style ColdFusion application and factor out queries into some sort of Service layer. I wasn't trying to create an Object Oriented ColdFusion application - I was merely trying to centralize some business logic such that it wasn't so distributed throughout the application. At a glance, this seems like a rather simple concept; and, when I've applied it to simple applications like OOPhoto, adding a Service layer was a snap. But, this is quite a complex application that I'm working with and I quickly found adding a Service layer to be an extremely painful task.
The first thing that tripped me up was figuring out what the "gesture" of the query was. Meaning, what was the query doing and how could I name the Service layer method to reflect this intention. Furthermore, how could I define the method in such a way that made it reusable in different situations since it's only in the reusable nature of centralized queries that make them worthwhile (otherwise, you might as well just have them inline).
It was this "reuse" part that really slammed me into a wall. As I examined my queries, I realized that they were almost all different in some way. And I don't mean that they had slightly different WHERE clauses - I mean that they had completely different JOIN clauses and calculated columns based on the page in which they were used. I was really stumped. The queries seemed to be right, but they all seemed to be a bit too different to reuse in any way.
This vexed me greatly as I started to become concerned that this same issue would challenge me on future projects. Then, after staring blankly at a query for no less than 20 minutes, it finally occurred to me why my queries were so distinct and non-reusable:
The queries themselves were used for a combination of both data retrieval and data validation in a contextual way.
It's the contextual data retrieval and validation aspects that was making them so page specific and non-reusable. Let me give you an example: Suppose I am on a page where a user get's to download a Asset with the given ID for a given Project. Well, rather than just getting the Asset at the given ID, I get the Asset at the given ID with it associated project details (project name) in the context of the given Project (untested code):
- ( p.name ) AS project_name
- asset a
- INNER JOIN
- project p
- a.id = #asset_id#
- p.id = #project_id#
- a.project_id = p.id
Yes, I am simply getting back the given Asset at the given ID with this query. But, the query doesn't just hit the Asset table; rather, it hits the asset table in the context of a Project-based JOIN on the project ID and returns project information as well. Meaning, it uses the query not only to get the Asset and the Project, but also to enforce the fact the given Asset belongs to the given project. In the end, I know that if the query doesn't return any rows, any of the following might be true:
- Asset ID was incorrect.
- Project ID was incorrect.
- Given Asset was not part of the given Project.
This was really useful because it made my post-query validation a snap. But, it is this type of contextual query that is making my Service layer a nightmare to create (I have, in fact, stopped for the time being). Because the query performs several actions (two explicit and one implicit), it really ties each query quite tightly to the page in which it is used, making it almost entirely non-reusable across the application.
So, how would I fix this going forward in future applications? I am not 100% sure that this is correct, but I suppose that I would have to separate multi-table data retrieval and pull the context validation out of the query and perform validation in a secondary step. For example (pseudo code):
- qProject >>
- project p
- p.id = #project_id#
- qAsset >>
- asset a
- a.id = #asset_id#
- Validation >>
- (NOT qProject.RecordCount) OR
- (NOT qAsset.RecordCount) OR
- (qProject.id NEQ qAsset.project_id)
- // Throw error that data was not valid.
Now that the asset query is just getting back the asset information and the project query is just getting back the project information and the context validation is performed after both queries have run, suddenly the two separated, cohesive queries become much more easily definable and reusable across the application.
Thinking back now, I have a serious habit of making my SQL queries complex with all kinds of JOINs to build data retrieval and data validation into a single step. Understanding this, I now realize that upgrading my existing applications to be less "spaghettified" is going to not only require factoring out queries, but also completely changing the way in which they were used. That's probably not going to happen; looks like a forward-only type of change.
As a final note, I just want to say that I really like the complex queries. They seemed to have so much more "intent". These little, table-access queries just seem so... I don't know... not exciting. But, I guess it's just that thinking that has come back to bite me.
Ben, I've been struggling with similar issues in my current application: a large number of my queries aren't reusable because they are specific to single use/display. I'm hopeful that this post might spark some discussion, but I really don't agree with your closing design. Breaking a single query with a join into two or more queries plus a CF validation bit is horribly inefficient, and that will come back to haunt you.
In my current application I started with lots of flexible general use queries and would do things to reuse them like:
x = getSites()
y = getSections(ValueList(x.siteID))
And in almost every case I've had to go back and convert these into joins for efficiency. in other words, while it might be a prettier solution to be able to reuse chunks, it can't come at the cost of performance.
But, Ben, if your function or view needs data from more than one table, a well-written query with the proper joins gets you the needed data with one call to the database rather than two or three separate calls. Avoiding multi-table queries (which is what you seem to be contemplating here) is only going to increase the number of database transactions in your application and slow down its performance.
Yeah, I am hoping there is some good discussion here. Hopefully others can either sympathize with this or tell me why it's junk :)
I agree with you 100%. The problem, however, that I was experiencing is that this "efficiency" made my queries almost completely non-reusable. So, perhaps its just a trade off?
I have struggled with this one a bit myself in creating an OO application and trying to keep the queries as reusable as possible. But my biggest hurdle has been more of a query performance issue.
For example if I have an article table and at some point I want to retrieve articles based on date, and at another point I want to retrieve articles based on author, I could write a single simple query that retrieves all the data I need and then use the application to filter out the data I want. But the database and systems guys compel me to write these as two separate queries for performance sake since over all the queries would be quicker if those clauses were defined within the sql syntax.
This is a simplified example of much more complex data retrieval requirements that actually cause us to regard performance highly and unfortunately also causes us to write many queries for thing like "articles".
I understand completely. If you want articles by author, depending on your structure, you could simply JOIN article to author table on a given author id. That's why I think sometimes methods like:
... made with the appropriate JOIN. Of course, whether this can be reused is another issue. But then again, not every query can be reused. However, that said, I think sometimes, the complex queries can be made more reusable if they were split up more.... but at a trade off.
Yep, and so I end up with many getArticlesBy*() functions. And yes I realize this is pretty much an inherent issue we must live with, but I am for sure looking forward to the day where query performance on a database server makes this issue much more moot.
For your example, I would suggest writing dynamic queries where the filter is determined at runtime. So, for example you could have a getByfilter() method that determines whether to return articles based on author or date (or whatever) depending on what arguments you pass in. So, for instance, articleManager.getByFilter(date="< #now()#") or articleManager.getByFilter(author="Brian").
The issue with that, as I am sure Brian will attest to, is that a query that filters on "Date" might have a completely different structure than one that is optimized to filter on "Author". For example, the Date-based filter might not bother using a JOIN on authors.
@Tony seems to have hit it on the head! In .NET development, there is an add-on called nHybernate. Once you set up some XML mapping in your application, you just make calls to a DAO and pass in parameters that are used by the DAO to automatically run just the right query and return just the right data.
I don't see anything wrong with writing CFC's that do basically the same thing--execute any of a number of possible optimized queries based on params/filters passed in. Then, it's the METHOD and/or CFC that is reusable and we don't have to focus so much on the SQL until we find that one or more of the queries the METHOD runs has performance issues. Then we just update the SQL but all our calls to the METHOD or CFC don't have to be changed when the queries do.
Why don't you wrap that 1st query in a function and return a struct with the values needed and handle the logic in there?
I was thinking about this same subject recently when working on one of my projects.
It reminds me of some of the early comments about procedural vs oo programming. Procedural is generally faster because oo has more overhead that causes it to be inefficient. For example: instantiating a whole object just to access one property is pretty inefficient.
Anyway, I think it's a question of balance and of thinking about how much we want to optimize. As things stand, there are inline queries throughout the code. Bringing all the queries into a central location is better, imo, because it's easier to make changes when all the code is one place. Next, if some of the queries can be reused as they are, that's an extra bonus. And if some can be generalized to better handle reuse, then that's a third benefit. But I think it's a case by case situation. For some very complex queries where performance is important, it will make sense to keep them very specific. Kind of like how game programming is much more low level than other programming because it needs to be very efficient in its execution. In my experience, abstaction leads to performance penalties. We've accepted this with programming but we're not so used to encountering this with queries.
Anyone have examples of the contrary?
You could also use something like DataMGR by Steve Bryant and write little wrapper functions for your queries. His project is a Data Access Layer, I've never used it, but i've looked at it quite a few times. I had built something similar myself prior to learning about his. His project does a lot more than what i need, otherwise I would use it myself.
Something to think about!
Yes Ben this is exactly the case.
Is this now where we enter into ORM territory? And speaking of nHybernate, is this similar to the Hibernate that is slated for cf9?
I'm not sure about CF9 but nHybernate is definitely all about ORM.
At my current office I work with a guy who has been writing Java (no CF, just Java) for about 10 years. We have talked about this quite often.
His explanation is that we often centralize our apps around the database instead of the other way around. If we are programmers, not dba's, then why do we try to have the db do so much?
In other words- use your strengths. If your strength is CF, then use CF to handle the logic. Let the application become huge, not the queries.
It will become so much more maintainable that way.
Sure, and I would love to do it this way but since I work for an organization where not only i am told to do it in this certain manner, but it makes sense to write very specific queries because of the sheer traffic our sites get to where we need all the database performance we can sqeeze out of our db cluster is very important. I doubt the average java program has database traffic/performace issues to the degree web applications do.
I don't think the issue is whether or not we can wrap queries up in a service layer. You can always take code written in one place and move it behind a method call. But, there is a question of naming. For example, @Hatem, what would I name the first query method? It gets the asset AND the project using the Asset ID and the Project ID. The only thing fitting would be something like:
... but that's a bit crazy :) You can't go with GetByFilter() because AssetID and ProjectID are not optional filter items - they are essential to the intent of the query.
... not really sure how to even move this into a query gracefully.
I like the concept... but even in that mindset, I think we run into similar issues. For example, see my above comment about naming. How can we name the intent of such a query?
It would look more like this:
That method could, ultimately, handle multiple signatures. You might create a case for:
and another for:
Basically, write the Method so that you can get what you are asking for via whatever you MIGHT pass in.
@Ben, I assume that there is more to the app and that the assets and the projects is not where it stops. I would probably create several services or cfcs and be specific as to what I would put in each one. In this case, if i placed this in a cfc specific to assets, maybe I would name it getAssetDetail(), assuming that I don't have to return assets any other way. I would imagine the project info is required to return an asset in this case.
Long method names are totally fine, the one you mentioned was a bit longer than what I would like, but in some cases you have to have a descriptive method name!
My concern there is that the intent of the method becomes a bit "surprising". I would prefer explicit methods over multi-signature methods. Plus, I think with that kind of approach, optimizing becomes awkward; everytime you add another method signature, you have to start looking at existing, internal solutions to see if they can be updated to use new signature or if a completely new query needs to be run internally.
I would definitely be all for creating various CFCs to handle the application "verticals". However, I am not sure the intent of the method is really to simply get the Asset Detail as it does get project information as well. I think we could easily think of a use to get asset details that are not related to projects (ex. a page that list 5 most recent asset uploads with download link). This would not be in the context of a project, simply for asset retrieval.
Maybe its a silly example... maybe the long, wordy name is really the only way to express the true intent of the underlying request. I am OK with that - it's just that it doesn't make the query reusable (much).
Put the complexity into the Method, rather than the flow control/logic.
The method name should be simple:
The params passed in should be a set of optional params--obviously, SOMETHING should be passed in--but the signature of what is passed should be used by the method to determine which query to run.
This isn't supported very well in CF but I still think it's worth pursuing in many cases. It makes for less maintenance of the business logic.
I do a combination of what Hatem and David suggest. Actually this is what I am developing at the moment, and feel free everyone to comment on this. For each "object" of data i need, for instance, user or article, I have a separate package named as such, with each containing a CRUD, and a Gateway object. If it isn't obvious, the CRUD object handles creates,reads,updates,and writes to single rows. The Gateway object handles reads of entire recordsets.
Now, withing my gateway since it already inherently belongs to "article" my method looks like so:
qryByAuthor( authorid ); or
qryByDate( date ) ;
So in your case Ben instead of:
The method could be
In this case could the ID portion be assumed?
I disagree with you. A method should have only one purpose. As soon as you do as you suggest your method now has multiple purposes. In other words the method should used the passed in data to perform a task, rather than use the passed in data to decide which task to perform. That logic belongs outside of the method call, deciding which method to call.
I think that makes sense. However, even with this, all we've done is moved the query into a method. We have done nothing to make the method more general and therefore more reusable, which I think is still a point of contention.
What I don't want to end up with is a server layer or gateway layer that has 40 methods, each of which only gets used once in the application. I mean, if that's the way it needs to be, that's the way it needs to be; but, is that how it is because my original query did *too* much?
Definitely an interesting problem to solve because, of course, there are several reasonable ways to implement it.
I would probably go with getAssetDetail() but have optional named arguments for withProject=true etc so the intent would be readable through the names of the arguments.
This would still allow reuse of code and expressive code.
I often have data layer methods that conditionally JOIN across different tables.
I am pretty sure that java apps get plenty of traffic. Think LinkedIn. Any site that has the extension .do is a java app behind the scenes.
It is always much harder to come to a complete app and refactor, so in some ways, you are in a difficult place. It seems like the app was designed around a database instead of the other way around. So, while it would be great to think that there is a lot that you can do, you are in a difficult place and might have to keep a lot of logic in the query.
But going forward, it is great to use an ORM, such as Hibernate, that way you are thinking in terms of OBJECTS not in terms of queries or databases. This allows the application to flow a lot more smoothly.
That sounds pretty good. Conditional joins can be nice; but, the one thing I don't like about them is that filter logic then usually needs to be duplicated to some respect. For instance, if I have a JOIN, naturally, I would move as much filter logic into the ON clause of the JOIN. However, if I don't have a JOIN, some of that filter logic needs to be moved into a WHERE clause. So, there's just a lot of checking to see what should go where and whether or not different clauses need to be defined. Not the end of the world, but something that has always seems error-prone.
It's interesting; I don't think I think in terms of the database - I think in terms of data "points". For example, I don't need a "project record" - I need a "project name". Maybe its the same thing, maybe it's a slightly different take on it.
Remind me again: what exactly is so great about reusable queries?
That question must be answered with another question, which is I think the driving force behind the first:
What so wrong with inline queries?
I think the answer to your question has its foundation in the answer to this question. I am still a journeyman and not one who can truly answer such a question.
My gut tells me that these two go hand in hand. That an inline query would not be *bad* unless it involved duplication. Therefore, to refactor, I assume would require something to be reusable (otherwise, there would be no point to refactoring).
Perhaps there is an assumption somewhere that's wrong - a straw man or something.
It limits the number of times that you have to write the same query with different params, lol!
Being a proponent of maintainable code you must agree that if you only ever had one Select, one Update, one Insert and one Delete query you have reached maintainability nirvana. The SQL may be littered with 50 conditionals, but who cares? Heck you could even send in an action argument and use a cfswitch with four cases: Select, Update, Insert, Delete. Now you're down to one cfquery tag for your entire app! Sweet!
@Ben, yes, the JOIN ON / WHERE logic can get a bit hairy if you have lots of options in a method. It's a fine line to walk figuring out what should go in a generic method vs when to add a custom method for a complex query.
@Hal, that's a good question. If a query really only used in one place and there's no similar query elsewhere in the app, it doesn't need to be shoehorned into reuse. If you have several similar queries sprinkled all over the application, don't you think it makes maintenance easier to centralize those?
I don't think that methods should necessarily have one purpose (as Brian stated). Even in Java, you have method overloading, where the method's purpose is determined by its arguments. In CF we can do this with duck typing.
I've lately gotten into using base classes for my data access and use an approach similar to what Sean suggests where the intent of the method is reflected in its arguments. If you want to have more "expressive" methods, you can add in some syntactic sugar by having facade methods. So getByAuthor(id) calls getByFilter(authorID=id) behind the scenes. As you abstract more of this out, you do end up with more "standardized" queries. But for me the need for really complex un-reusable queries is more of an edge case. Ben -- in response to my last comment, you said "the Date-based filter might not bother using a JOIN on authors." That's actually something else that can be parametrized. My base getByFilter method can take an optional argument which is a comma-delimited list of fields to return (even those joined by a foreign key). So getArticleByAuthor(id) calls getByFilter(authorid=id) and getArticleByDate(date) calls getByFilter(date=date, fields="articleID,date,summary,body"). There you go -- no superfluous author field. These kinds of dynamic queries can get pretty complex, but the beauty of it is that you only have to write them once.
@Matt, I hope no one actually tries that! You illustrate well the trap that folks can fall into if they take the drive for reuse too far.
Any given application contains an inherent amount of complexity. You can put it all in one file/method (and have the ultimate procedure app) or distribute every single little piece of it into separate objects (and have the ultimate OO app). The complexity is still there. In the former, it's explicit in the logic. In the latter, it's more implicit in the relationships between all those objects (and that's something folks can have a really hard time with when they're relatively new to OO). The trick - as always - is striking a balance and finding the best set of trade offs for your application and your team's skill set.
I'd like to submit my vote for the "straw man" theory!
@Matt I'm glad we see eye-to-eye on this. In fact, I've been secretly working on a major framework that will make all others obsolete. It's the ultimate in reusability, having a single command: do(). Now, granted, there are a LOT of arguments for this, but one can (hopefully) see the brilliance of my plan. I suspect that you had your tongue firmly in cheek when you wrote your comment. ;-)
In fact, I think that code with conditionals scattered through it is a sign of UNmaintainability. Over the years, I've learned to be extremely wary of conditionals creeping into my code: they're almost always a sure sign that I'm on the wrong track.
SQL is all about taking encapsulated, separate tables (forgive me for using bad terminology, Dr. Codd) and "flattening" them. It's wonderfully clever but the price paid is reusability of atomic pieces of code. It's a price I'm well willing to pay: I want optimized queries that run fast. I can get reusability by having separate queries rather than a join, but THAT price -- slow execution -- is too great.
Yes, if multiple queries really have the same (or very similar) functionality, aggregating them into one or a few makes good sense to me.
I've found it very helpful when making optimization implementations to ask myself the question, "Are these things that seem similar NECESSARILY the same or merely ACCIDENTALLY the same?" Answering that question correctly has saved me a lot of grief.
Then, I suppose, beauty truly is in the eye of the beholder. It is perhaps due to my initial experience of being a craftsman, but I find no appeal to these big "do-it-all" approaches, whether in methods or queries. My father used to say, of certain things, "It's too clever -- by half." That pretty well expresses my feelings about superqueries.
What's so wrong with inline queries? That IS an excellent question, Ben. I think the main reason against having them is that, IF that same query is needed elsewhere, you have duplication of code -- never good. But if the queries really have different purposes and only accidentally have some of the same SQL in both, factoring that commonality out provides extremely marginal benefits but comes, as I said, at a very high price vis-a-vis performance.
I've definitely been following these comments and I have to say this is all very interesting stuff to see everyone's different approach to accomplishing the same basic tasks. Getting data from the database in the easiest way and fastest way possible is really is what this is all about.
Trying to get back your blog post, I would like to ask you what is the purpose of your service layer? If all you are trying to do is pull your queries out from inline code and get them all in the same easily accessible area, does it really need to be any more than that in the case of this procedural app?
Wouldn't another good reason to pull that query out be to have your queries in a central location for easier editing in many circumstances? Obviously small applications it wouldn't matter, but the larger they get, the more buried your queries my become.
Good point, Brian.
Is it possible you're focused on solving the less valuable part of the reuse problem?
The service method doesn't need to be used to return different things, it could be used to return exactly the same thing you are returning now and you would still potentially have reuse.
For example, if you ever want to make a new view for the data then you would be better off with the data call in the service layer rather than inline.
Also, potentially important, the UI layer of your application can't be reused because there's an inline query sitting in it.
Admittedly, you probably won't be reusing your UI logic somewhere else, although once you start pulling out all the inline stuff it's amazing how similar data views are. Still, this is the same sort of stuff I think about when I'm refactoring my service layer.
If I wanted to get more complex, after moving the inline call to the service layer, the first thing I would probably do is to move it again to a data layer.
Then when I create the service I pass in some sort of information about the project. Depending on that information the service would load a different data access object. Since the service already knows about my project from when I instantiate the service now the only parameter I need in my service method is assetId and the method GetAsset(assetId) is likely a pretty reusable service call.
Many posts later and I that I'm still in the same boat at least. I think the discussion has basically said that putting all queries together is good for mainainability, but reuse is only important if it doesn't impact performance. Thats pretty much the rule I follow. I try to make my queries flexible (customizable column returns, order bys, and limits) but just make a new query when I need really different stuff.
Well, actually the main purpose of my approach as outlined here is to deal with the more repetitive SQL -- the boilerplate CRUD stuff I don't want to cut-and-paste everywhere and change the field and table names. Do I try to use my generalized methods as much as I can? Yes, because it makes me more efficient. But I'm not naive to think that it's always practical. I have nothing against highly specific methods or queries when they're called for. So I wouldn't characterize my approach in general as a "do all" approach.
I have to agree with Hal. Normalization of a database means that it will be spread over multiple tables and each query is going to join on these tables differently. This is what makes OLTP efficient.
@Hal, I love your wording of "accidentally similar" query definitions.
One thing to keep in mind is database coding is different to procedural and OO coding; and attempting to apply their best practices to it may not work.
My personal belief is to encapsulate queries in logical CFCs (not table based), for example: Client, Administrator, Manager; instead of User. This approach allows for reuse of small queries (like filling lists) and maintainability of code. If there is a need for multiple queries, then there is a need.
I would like to point out that SQL it self is an extracted layer. You don't have to know how it selects/updates/inserts/deletes your data to use it. ;)
Ben, i'm not sure if Hal explicitly answered your question, "What's so wrong with inline queries?". But I think what he's saying is that if they're inline, you *prevent* reuse. If they're not inline, you have the *potential* for reuse.
I can think of one other reason to keep them separate, and I've been running into this one with increasing frequency the more unit testing I do: if your queries are completely contained in functions, then you can mock them in unit tests. This pattern has served me very well..... I have a function that "does things", i.e. contains "business logic". It gets its data, and operates on that data. For example, let's say I have a function named "preventRollbackOnOutstandingOrders" or some such thing. Now, the purpose of this function is to "do stuff" when there are outstanding orders. Even in this case, rather than have the query to retrieve outstanding orders inside of that function, i'll pull it out: "getOutstandingOrders". and then my preventRollback.... function will call getOutstandingOrders() and then do whatever it needs to do with that result.
this might seem stupid. but here's what it gets me. in my unit tests, I can now easily test all teh logic in preventRollback... because I can mock the getOutstandingOrders query to return all the different kinds of resultsets i need to prepare for in the real world. (shameless plug: I addressed this at cfobjective in the "writing easy-to-test code presentation. You can download it here: http://mxunit.org/doc/index.cfm#cfmeetup).
In a nutshell, for me, keeping the queries separate gives me *freedom*
You've got abstraction into layers vs. inline queries, and then you have reusability vs. non-reusability. They are two different things.
My vote would be to put stuff where it "belongs", but don't worry so much if you have a method that contains your complex query, may or may not have a longer name, and may only be used in one place.
Reuse is not the holy grail. If you can't find the reuse in a resonable time frame, it's probably not a candidate. Otherwise, you risk spend an eon chasing it down. Just because a query SEEMS similar to others and it SEEMS like you might be repeating yourself, chances are if you've had to put this much thought into what to do with it, it's probably a one-off query that runs great, and is used for a single purpose. Put it inside it's own function, grouped with similar functions, in whatever layer you think it belongs, and leave it alone! You can always refactor if you discover a means of reuse later, and as long as you are continually thinking about your app, refactoring should always be on the table, at each step.
The idea of splitting it up into multiple queries is nonsense... and it's the type of stuff that makes for really piss-poor performance. You'd be abstracting away just about everything that is great about your database tier. I get seriously irked when I see that in an app... probably more than any other poor programming practice I've had to recode. And I've been playing code janitor for a long time.
Everyone worries so much about having "dumb objects" (or whatever the term is nowadays), but I think it's worse to treat your database tier as if *IT* were dumb. It's certainly not. And it's not at all bad to have a few single purpose, single use functions. It's going to happen in just about any application... so don't worry about it.
At least not until Hal finishes his Do() app. Then we're all home free! ;)
Get on that, Hal.
P.S. If you don't like long method names, why not shorten it somewhat and then put the description of what the method does in the HINT or DESCRIPTION attributes of the cffunction tag? Documentation is the key difference between amateur and professional, and this would seem a perfect place for it. Between that and the cfargument tags, I'm guessing you won't ever have trouble figuring out what that particular function does. And you get the added benefit of generating your good documentation automatically using CFCDoc, which picks up all those hints/descriptions. Once a method name reaches critical mass, this is truly a good fallback option, IMO.
I am sorry that I used the phrase Service Layer or even talked about data-access layers; not because that was not my intent, but rather because I think they are very loaded phrases and I think the conversation here has gotten somewhat off track from my original point (which perhaps I did not express very clearly).
When I write a procedural application with queries on a per-page level (usually in "Action" files), it is very easy and satisfying to write a beefy, complex query that:
1. Gets ONLY the data required by the give page (with no extra columns returned).
2. Has additional calculated columns that are ONLY used in this page.
3. Performs validation of data across all aspects of the possible JOIN clauses.
That's really nice to do because:
1. It requires a minimum number of calls to the database.
2. It does not return any extraneous data.
Now, clearly, this is thinking in terms of data (NOT THE DATABASE - let's please draw that distinction - I don't care about the database - it only influences my JOINs - I care only about the data that I get back). This is fine because it is efficient on a page-level basis.
BUT, my real, original question / thought was, does optimizing each query for its singular use hinder me too severely in ability to reuse logic. And, if so, is that even a problem?
You already know this -- there's no right answer to your questions.
I think the right answer is, start out with an efficient, well constructed query. Look for obvious places where you can make it dynamic (i.e. which columns to return), but if you find yourself making the query more inefficient simply for the sake of "reuse", consider that your last resort. If the query really is complex and built the way it is because it's the best way to get back the data that is needed in the vie, then I don't consider that a problem at all. It's where you PUT the query, and how you choose to group it with similar functionality somewhere in your app that will lead to maintainability and the potential for reuse down the line.
If you are building complex views that use data that is assembled from the database in a complex way, that may just be the nature of your app. Optimizing your queries for singular use will make your users happy when the app screams, and putting those queries in a logical place will make YOU or any future developer working on the code just as happy. If you have logic that's based on the results of a query, and that logic applies to more than one query, abstract it out if you can; otherwise, document it's existence and move on.
As you can tell by now, I'm against making inefficient queries, so anything you do to your nice efficient query had better be for a darn good reason. I guess only you can decide though, what the threshold is; where you cross over into "a good reason".
Size and anticipated load of the app come into play too, but I don't think "small" is a good enough reason to abandon good SQL just because it can't be reused.
Hmmm, I always dislike the "no right answer", but I've come to accept that it's just a way of life :(
Here's an interesting concept though, that this brings up - a person who is a SQL guru will probably think about their data access in a completely different way than someone who knows the basics. I am not saying that I am a SQL guru by any means (I just love JOINs), but it seems that the more someone knows about SQL, the more they can view every situation as an "opportunity" for optimization that may or may not completely conflict with "design for reuse".
I agree, someone more proficient in SQL will most likely start out with a more efficient query.
Someone that is not may start out by actually doing what you proposed -- splitting out the function into separate datasets, perhaps because it's all they know to do, and the side "benefit" would be the code reuse.
Then, they might argue that their way is "better" simply because of the code reuse they inadvertently realized.
If there's any place that optimization should be explored, it's in the access to an RDBMS. I've seen the arguments, ad nauseum, over such drivel as "which is faster, CFIF or CFCASE", or other such things with so little overall processing time to gain, while the database tier is left to substantial abuse, with HUGE repercussions. So, even though I myself am not a "guru" of SQL, I continually strive to make the database access as efficient as possible up-front because a mistake there is much more costly to me in development than whether something was reusable, or even "optimally maintainable". I'd much rather have to go back and do the small stuff, than futz with replacing several queries with one good one when the app load started to demand it.
If *every* query is in conflict with code reuse, then you have a truly unique situation. Any developer that is at the level to be talking about layers of abstraction, and the type of code reuse you're discussing, is already "thinking right" in my view, and is capable of seeing whether or not a query can be made more generic or whether it's a one-off request for a specifically formatted dataset. You make the best decision you can about how to structure your queries, hopefully with "leveraging the best features of the database tier" (or at least "not ignoring") at the top of the list, and those that can be reused will fall into place.
I suppose I've already made this too clear, and am repeating myself, but I think it would be a real shame for someone to "know" which SQL statement would be best, and choose not to use it for the mere sake of "reusability", and a true disservice to the users, the hardware and software that's been purchased to do the job, and the folks that put so much effort into creating whatever outstanding RDBMS you've chosen (and today, ALL of them are pretty outstanding). If you are more advanced with SQL, then your queries SHOULD reflect that, and I don't think they should be the first thing to be abandoned, in order to bend to the way someone thinks their app should "look" or be organized. But some people do, I'm sure.
If you love JOINs, what about all the OTHER cool things that an enterprise-tier database can do? CUBE, ROLLUP, stored procs, views, and the ever-important execution analysis... if you start down that slippery slope of bending your "best choice" SQL with absolute commitment to your app design goals, could you be eventually ignoring or circumventing those features too? Do we want anyone else that's learning this stuff to believe that they don't HAVE to become better at SQL, or leverage what their DB tier offers? I don't think so.
Call it "data centric" or whatever other derogatory terms will come along, but if you're pulling data from your database, do it the best way possible for how it's going to be used, and if it's reusable great. If not, no big deal. It's not just an "opportunity" for optimization -- the db tier and the language are *designed* to give you the optimization, all you have to do is leverage it. You can either embrace that or circumvent it. It's just one area that I feel NOT leveraging it is much more expensive if you find you have to later.
UGH... how am i ever going to learn to write without being so long-winded...
Long winded or not, I like what you have to say. Very though provoking. Thank you for your feedback.
I don't think that *every* SQL statement that I have is optimized so highly for the given page; however, almost ANY query that joins a table(s) has potential for page-specific differentiation. Even something simple like:
contact / contact_information
When you think about a "contact" in "object speak", you probably think in terms of contact and their contact information. However, this is a JOIN with overhead; so, pages that don't need contact information, just contact name, I would never bother joining to contact_information has it is not needed. The same could be said about most join relationships (sometimes the JOIN'd information is simply irrelevant even if it is part of a "whole" concept).
Lots to think about here! Perhaps my preoccupation with things like not pulling down extra columns is a silly form of optimization, especially with small data sets.
"When you think about a "contact" in "object speak", you probably think in terms of contact and their contact information. However, this is a JOIN with overhead; so, pages that don't need contact information, just contact name, I would never bother joining to contact_information has it is not needed."
This is the exact reason that you use an ORM. You don't have to worry about what type of query to run (as much). Because they often use lazy loading, they only run the queries when needed. Of course, this can get expensive at times, which is why it can be overridden, but it will also save A TON of development time.
It has been said time and time again that hardware is cheap. Developers are not.
As one of the devs that I work with says- "Premature optimization is the root of all evil." Don't worry about optimizing your query until you know that you need to.
A great book (with a free html version) is Getting Real by 37 Signals. http://gettingreal.37signals.com/toc.php
Thanks for the book link; I'll definitely be checking it out.
I was not away that ORMs do / could lazy-load JOIN-based information. That seems pretty cool; seems like a super complicated process (depending on the number of joins that might be there).
They DON'T do lazy loading using joins... they make multiple, delayed database calls. Great if you're using objects, but not really very efficient.
I don't feel this type of optimization is premature, because lack of this type of optimization is more expensive than any other.
IMO, Preventable bad database interactivity for only application structure's sake is the REAL root of all evil, and it's expensive to have to go back and fix. Optimization that is right there at your fingertips, that you force yourself to ignore for the sake of a framework or methodology... that's like second on the list.
Just my 2c of course... there WILL be those that Object (pun intended) and I don't discount their position, I simply don't agree with it.
"IMO, Preventable bad database interactivity for only application structure's sake is the REAL root of all evil, and it's expensive to have to go back and fix. Optimization that is right there at your fingertips, that you force yourself to ignore for the sake of a framework or methodology... that's like second on the list."
HA... those two list items are the same damned thing. Guess I'd better get my coffee on.
ORMs like Hibernate can be asked to handle object joins in either a lazy or non-lazy manner, even on a specific query. That's nice because it means you can say contact / contact_information is lazy but still override it with HQL (Hibernate Query Language) when you want a non-lazy version:
The nice thing about this is that you can design and program your application with an object-centric approach and then apply query optimization where you need it.
So it doesn't have to be all-application or all-SQL in terms of building performance in from the start, you can have your ORM cake and eat it too.