I Just Can't Understand Object Oriented Programming (OOP) And Join Tables
Posted August 17, 2007 at 8:52 AM by Ben Nadel
I am, more or less, a procedural coder. Sure, I use ColdFusion components (CFCs), but rarely in a way that the righteous OOP people will say even qualify as "Object Oriented Programming." I understand the idea behind domain models and things like model-view-controller, but I just can't quite seem to connect all the dots that would allow me to implement this type of programming in my day-to-day work.
One of the huge mental hurdles that I have is the JOIN table. I understand objects that correspond to a single table, or a single entity spread across multiple tables; that sort of information is easy for me to update through a data access object. But what about these JOIN tables? I can't seem to fit them in anywhere that makes me comfortable as a man. Take for example a common problem that I come across: the parent-child relationship. In a join table that defines this hierarchy, we might have a parent_id, a child_id, and then additional columns that further describe the relationship, such as the sort order of the children:
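[Tables not preserved: practice (id, name) and practice_practice_jn (parent_id, child_id, sort), the columns referenced in the queries in this post]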
The above tables are tables that I generally have to use when describing the relationship between two or more areas of practice in a given legal industry. What makes this even more interesting is that this join table doesn't join two different tables - it joins one table back to itself (cool stuff!).
If I wanted to get all the children records for a given practice area, I might use a SQL query like this:
SELECT
p.id,
p.name
FROM
practice p
INNER JOIN
practice_practice_jn ppjn
ON
p.id = ppjn.child_id
WHERE
ppjn.parent_id = <cfqueryparam .... />
ORDER BY
ppjn.sort ASC,
p.name ASC
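To make the shape of that query easier to play with outside ColdFusion, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the post; the sample rows (including the "Lobbying" practice) and the get_children helper are made up purely for illustration.

```python
import sqlite3

# Hypothetical in-memory schema mirroring the tables described in the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE practice (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE practice_practice_jn (
        parent_id INTEGER NOT NULL REFERENCES practice(id),
        child_id  INTEGER NOT NULL REFERENCES practice(id),
        sort      INTEGER NOT NULL
    );
    -- Sample data (invented for this sketch).
    INSERT INTO practice (id, name) VALUES
        (1, 'Government Law'), (2, 'Regulatory Law'),
        (3, 'Government Litigation'), (4, 'Lobbying');
    INSERT INTO practice_practice_jn (parent_id, child_id, sort) VALUES
        (1, 4, 1), (1, 3, 2), (2, 3, 1);
""")

def get_children(conn, parent_id):
    """Return (id, name) rows for the children of one practice, in sort order."""
    return conn.execute(
        """
        SELECT p.id, p.name
        FROM practice p
        INNER JOIN practice_practice_jn ppjn ON p.id = ppjn.child_id
        WHERE ppjn.parent_id = ?
        ORDER BY ppjn.sort ASC, p.name ASC
        """,
        (parent_id,),
    ).fetchall()

print(get_children(conn, 1))
```

Note that practice 3 appears under two different parents, which is exactly what the join table allows and a single parent_id column would not.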
Ok, so that's how it could get used, let's talk about where these values get updated. Currently, I would have something like a sortable table that submits a list of IDs in a given order. Taking that list, I would then do something like this:
<!--- Delete existing joins to this parent. --->
<cfquery ....>
DELETE FROM
practice_practice_jn
WHERE
parent_id = <cfqueryparam .... />
</cfquery>
<!--- Create new joins. --->
<cfquery ....>
<cfloop index="intSort" from="1" to="#ListLen( FORM.lst_id )#">
INSERT INTO practice_practice_jn
(
sort, parent_id, child_id
) VALUES (
<cfqueryparam value="#intSort#" />,
<cfqueryparam .... />,
<cfqueryparam value="#ListGetAt( FORM.lst_id, intSort )#" />
);
</cfloop>
</cfquery>
If I were to switch over to an object oriented approach, where the heck would this kind of update go? I don't imagine it would go in the Practice business object, because it's not really related to a single practice (unless you are thinking it is related to the parent practice). But even so, what then? Would you have something like:
<cfset Practice.SetChildSort( FORM.lst_id ) />
That doesn't feel right. And what if the JOIN table had more than just sort information? For example, joining attorneys to practice areas (attorney_practice_jn), we would have a practice_id, an attorney_id, a sort column, and maybe a flag for "department chair." Well, I guess in that case, our update SQL statement would have to change because we can't just delete joins for the sort update (otherwise we would lose the chair information).
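As a rough sketch of what that changed update could look like, the sort column can be rewritten with UPDATE statements instead of a delete-and-reinsert, so the chair flag survives. This is Python with sqlite3 rather than ColdFusion, and the is_chair column name, the sample rows, and the update_sort helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- is_chair is a hypothetical name for the "department chair" flag.
    CREATE TABLE attorney_practice_jn (
        practice_id INTEGER NOT NULL,
        attorney_id INTEGER NOT NULL,
        sort        INTEGER NOT NULL,
        is_chair    INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (practice_id, attorney_id)
    );
    INSERT INTO attorney_practice_jn VALUES
        (10, 1, 1, 1), (10, 2, 2, 0), (10, 3, 3, 0);
""")

def update_sort(conn, practice_id, ordered_attorney_ids):
    """Rewrite only the sort column; is_chair is never touched."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "UPDATE attorney_practice_jn SET sort = ? "
            "WHERE practice_id = ? AND attorney_id = ?",
            [(pos, practice_id, attorney_id)
             for pos, attorney_id in enumerate(ordered_attorney_ids, start=1)],
        )

# The submitted list order becomes the new sort order.
update_sort(conn, 10, [3, 1, 2])
rows = conn.execute(
    "SELECT attorney_id, sort, is_chair FROM attorney_practice_jn "
    "WHERE practice_id = 10 ORDER BY sort"
).fetchall()
print(rows)
```

Attorney 1 keeps the chair flag even though its position changed.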
Is this where something like a "Practice Service" object would come into play?
Clearly, I don't want to be instantiating a Practice object for EACH ID value in my returned list - this would seem very silly since reaching directly into the database is so much more efficient. Plus, object creation in ColdFusion is expensive.
Clearly I am very confused and frustrated with the creating of and the maintaining of join information using ColdFusion and object oriented programming. I really hope someone looks at this and says, "Oh, you are just missing one little bit of info here" and then I have this Eureka moment and all is good and happy.
Please help me help myself - I will make it worth your while.
I agree with not being able to wrap your head around JOINs. That's why I always use implicit JOINs like this:
FROM practice p, practice_practice_jn ppjn
WHERE
p.id = ppjn.child_id AND
ppjn.parent_id = <cfqueryparam .... />
Same results. And proper OOP's beyond me as well.
Ben, this is *exactly* why I wrote my four part objects and composition series. It was this question that people asked me repeatedly in relation to the code generator (many of whom were dealing with linking tables as separate "objects" - which they are not). While the series covers a number of general OO principles and tries to relate them to CF, its primary goal was explaining the relationships portion (hence the composition). Hopefully they will be helpful to you:
Part 1 - No Frameworks - http://www.remotesynthesis.com/blog/index.cfm/2007/6/1/Objects-and-Composition-in-CFCs
Part 2 - Integrating ColdSpring - http://www.remotesynthesis.com/blog/index.cfm/2007/6/5/Objects-and-Composition-in-CFCs--Part-2-ColdSpring
Part 3 - ColdSpring and Transfer - http://www.remotesynthesis.com/blog/index.cfm/2007/6/20/Objects-and-Composition-Part-3--Using-Transfer
Part 4 - Connecting to Flex - http://www.remotesynthesis.com/blog/index.cfm/2007/7/20/Objects-and-Composition-Connecting-to-Flex
Usually, I would create a practice object, and a practice gateway object. The gateway object is used as a practice service. This is where you would have methods like getPracticeByID(id), savePractice(obj), deletePractice(obj), and so forth. This is where you would put your method for updating the relationship between parent and child. The practice object itself would be fairly basic and really should know nothing about the database. Hope that helps.
I can vouch that Brian's series was EXCELLENT. I was pretty sorted with OOP concepts, but the ideas behind relations still had me really confused; after reading Brian's stuff it made so much more sense.
It'll make you feel a lot more comfortable.
I guess I would do something like this:
Create value object Practice. It has property childs which is an array.
Then create PracticeDAO, which can be asked for a single Practice by id and knows how to populate the childs array with Practice objects as well. Sort order is determined by position in the array and never communicated beyond the DAO.
If you need additional properties from join table or just want to return sort order as well you would create ChildPractice which extends Practice and adds all the properties you need.
When it's updated, the DAO is provided with a complete Practice object that has all the information needed. I would probably delete the join table rows for this practice and write them all again using array position as sort order.
To keep performance reasonable I would inject a Cache into PracticeDAO that keeps copies of all Practice objects in memory so they don't need to be created every time they are needed. That's very fast as long as there is enough memory for it - usually it's not a problem. And because all the modifications to the database go through this same component, you always know when to refresh the cache.
In my opinion this kind of approach scales to a couple of thousand objects quite nicely.
There are other approaches that could be more appropriate, specifically if you need scalability or have 5k+ items to deal with.
I hope this helps.
I'm not sure I fully understand your problem - is it the practice of joining (inner/outer/full) many tables, or within one table? Or is it just joining in general? Sorry to ask, but you'd be surprised at how many developers out there don't know the difference between an inner and outer join. It's a mental hurdle, but when you get over it, you'll wonder how you ever coded SQL in the past.
As far as OOD is concerned - I was lucky in that I took an OO class at NYU a number of years ago (pure theory, Booch methodology) and it was as confusing as hell - not sure how I passed, it was all just a big blur! While I don't generally develop in pure OO (that's starting to change now that I'm getting further into Flex) where the class helped me was "thinking" in OO.
The method of taking a large abstract concept, and applying structure to it, is what I learned. Now, you may already be good at that, but having the vocabulary to go with it helps.
Not sure I've helped with that input - just trying to relay my experiences, maybe it's something you can relate to.
I tried reading your series when it first came out, but I think it was just a little bit too much above my head. I will try to just sit down and read through them all and really absorb this weekend. Thanks! And, from what others are saying, they sound most excellent.
I appreciate the description, but I feel like my hurdle is not with the theory (well, yes somewhat), but rather with the implementation.
Thank you for the very thorough explanation. I was hoping that it didn't involve composing a list of objects into another, as I feel like so LITTLE information actually has to be updated. I guess that is part of why I don't understand all the "Best practices" in OOP. I look at the SQL statements that I do procedurally to update a JOIN table and it seems like such little overhead (a simple CFLOOP). Having composed children arrays seems like a vast amount of overhead to do such a little amount of work.... of course, this is coming from someone who has yet to "see the light."
I will check out the Booch method. Sounds interesting. I think I have a pretty good hold on how to leverage JOINs, but I am always learning new stuff. I think, and maybe I am wrong, I just don't understand how to build the "proper" methods and objects that handle this in an OOP setting.
Also, do NOT assume that your object model will line up with your data model. It often will not! I think you're looking at this in too much of a data-centric way. In this case I wonder if you might have better luck stepping away from the database and considering other options. What you are modeling here does not seem to be Practice. Do you see why? You aren't manipulating a single Practice, you're manipulating a Practice Hierarchy or Tree. And suddenly doesn't this make more sense, to ask the PracticeTree for the full nested and sorted set of Practices? And doesn't it make more sense to tell the PracticeTree to save the same set? You might still have a PracticeService, and sometimes it will interact with a single Practice, other times it will interact with the PracticeTree. You're hiding the difference from the rest of the system.
Now I'm not positive this is the perfect solution for you (since we've just scratched the surface of what you actually need). But the point is, absolutely do not feel compelled to create a one to one correlation between the data model and the object model.
You've only got one object. The practice.
I don't think you need your join table at all. Just add a parentid column to your practice table. Using that, if you're after all the children records for a given practice, you can do something like:
where practice.parentid = <cfqueryparam... />
If you need the details of the parent practice as well as the children, you can just join the table to itself like this...
select p1..., p2...
from practice p1, practice p2
where p2.parentid = p1.id
and p1.id = <cfqueryparam... />
It'd get increasingly complicated if you need to find children of children but then you could read the whole table into memory and build a tree, or use the recursive functionality in MySQL5 (or most versions of Oracle - no idea which platform you're using).
If the join table you're using is only dependent on the practice table, there's nothing wrong with having them as a single service. If there are multiple dependents or if your business rules don't fit quite right, you'd likely want to break it out into its own service.
I would organize it something like the below. This is just my way; you'd have to adjust for your personal preference. In addition, it also depends on your business model, so your mileage may vary. Hopefully it stays formatted (crosses fingers):
get/set for each property in the table
setDAO() <!-- for coldspring autowire -->
setGateway() <!-- for coldspring autowire -->
if practiceID eq 0
return new bean
return populated bean
return 0 if success practiceID if failed
I agree - updating a single practice feels very wrong. I was merely talking about that because I have no idea how I should be doing this update. The PracticeTree idea seems very interesting, but I have not heard of or seen this before. Do you have any resources about this type of idea?
Well that didn't turn out too pretty. Hopefully you get the idea.
That aside, you should add a preview feature to keep me from posting garbage like that. ;)
The single parent_id cannot work because a practice can be grouped in multiple ways - it's a many-to-many relationship.
You said that you didn't quite understand many of the "Best practices" of OOP, and why people use them, even considering the large overhead in extra work and memory they entail. I believe that many developers are moving to OO because it's the "in" thing to do, not because they truly understand WHY it's a good move.
The founding principles behind OO are that of making flexible, maintainable software. A headache most developers have dealt with is the pain of changing something, or adding a new feature. Fast forward two years, how much harder is it to add that new feature after endless compromises have been made within the application?
If the development team is following those "best practices", the features and additions made along the way don't cause as many compromises, and as such, make long term maintenance much easier. Design patterns and best practices of OO have come about as developers have painted themselves into corners and (through trial and error) found the best ways to avoid it the next time around.
I applaud you for your efforts, and hope that you soon "see the light".
Ben, no there's no real resources for something like a PracticeTree. There is nothing special about it, it is just another object that tries to model something in your system. Again, remember the point of OO is not to mimic your database, but to try to model the pieces of your application in a real-world way.
Forget the database for a minute. In fact, forget the database for a day, or a week, or even a month. Look at what your application needs to DO. Ignore how it will be stored and queried, all of that can come later. Think about the key elements in your system and how they need to interact with each other. This is what OO is supposed to be. Once you have the object model designed, THEN worry about the database. Sometimes the best way to store things in the database will match up with your object model. Sometimes it won't. That's fine. That's what data layers are for. Hopefully this makes sense.
I don't think I know enough about the OOP objects to stop thinking about the database yet. However, I also want to be careful, because shouldn't I be worried about creating a machine to move a mole hill (so to speak)? Like, I look at the SQL statement I do right now to update my joins and it is wicked simple. Should the OOP version of that also be wicked simple? Or is that just something I shouldn't even be concerned with?
There is no "OOP version" of your query. It should remain exactly the same. If you want to put it into a Gateway or something that is fine. OOP has nothing to do with the database. This might be the hardest thing to come to terms with when coming from a more data-centric mindset. You don't HAVE to create a PracticeTree CFC unless it makes sense within your application and your object model. That was just one idea I was throwing out. If the only time you ever mess with the hierarchy of practices, it might be just fine to have your controller call
and then have your service call
and the gateway just runs and returns the same query you're using now.
Don't be so worried about the "right" place to put the query (as long as it is in some sort of data layer component). If everything that needs the tree from the outside calls the PracticeService, then you can change the internals of how the PracticeService gets that data as often as you like. The changes will be totally hidden from the rest of the system.
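One possible way to picture the controller, service, and gateway layering described in this comment is sketched below in Python with sqlite3. The class names PracticeService and PracticeGateway echo the discussion, but every method name, the sample rows, and the comma-separated id list handling are assumptions for illustration, not anyone's actual implementation.

```python
import sqlite3

class PracticeGateway:
    """Data layer: the only place that knows the SQL."""

    def __init__(self, conn):
        self._conn = conn

    def get_children(self, parent_id):
        return self._conn.execute(
            "SELECT p.id, p.name FROM practice p "
            "INNER JOIN practice_practice_jn ppjn ON p.id = ppjn.child_id "
            "WHERE ppjn.parent_id = ? ORDER BY ppjn.sort, p.name",
            (parent_id,),
        ).fetchall()

    def save_child_order(self, parent_id, child_ids):
        # Same delete-then-insert approach as the procedural version.
        with self._conn:
            self._conn.execute(
                "DELETE FROM practice_practice_jn WHERE parent_id = ?",
                (parent_id,),
            )
            self._conn.executemany(
                "INSERT INTO practice_practice_jn (sort, parent_id, child_id) "
                "VALUES (?, ?, ?)",
                [(pos, parent_id, cid)
                 for pos, cid in enumerate(child_ids, start=1)],
            )

class PracticeService:
    """What the controller talks to; it hides how the data is fetched."""

    def __init__(self, gateway):
        self._gateway = gateway

    def get_children(self, parent_id):
        return self._gateway.get_children(parent_id)

    def save_child_order(self, parent_id, id_list):
        # id_list stands in for a submitted value such as FORM.lst_id.
        child_ids = [int(x) for x in id_list.split(",") if x.strip()]
        self._gateway.save_child_order(parent_id, child_ids)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE practice (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE practice_practice_jn (parent_id INTEGER, child_id INTEGER, sort INTEGER);
    INSERT INTO practice VALUES (1, 'Parent'), (2, 'Child B'), (3, 'Child C');
    INSERT INTO practice_practice_jn VALUES (1, 2, 1), (1, 3, 2);
""")

service = PracticeService(PracticeGateway(conn))
service.save_child_order(1, "3,2")   # the "controller" call
print(service.get_children(1))
```

The SQL is unchanged from the procedural version; only its location moved, so the gateway internals can later be swapped without touching callers.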
Yo Ben! When I was first learning OO ColdFusion I stumbled across this site: http://cfoop.org/ . I transposed the examples so that they would all work together using the Northwind database that comes with SQL Server 2000.
Hmmm. So much to let absorb :) Thanks for all the feedback.
Not that I'm an expert or anything (as many of the other posts seemed to have covered quite a bit of ground with this), but I think you may need to just evaluate how one practice relates to another practice.
I've done something similar to this with categories and subcategories (which should work as a very dumbed down version of your practice and "sub" practices).
I have a top level category - Main
This has an id of 1
From here everything else points to its parent_id
There is no subcategory table, just a category table with parents and children
The data would be set up like this (using your categories up top as a reference, somewhat)
id, parent_id, category
1, 0, Main
2, 1, Web log
3, 1, Coldfusion
4, 1, Projects
5, 2, august 2007
6, 2, july 2007
You don't have to show main (select category from categories where parent_id = 1), but then you can just reference all subcategories by using a little recursive function call.
In the above example, you shouldn't even need the child_id, as you can get the children just by referencing the id of the record, and selecting all items with that id as the parent_id.
This should then work for your "practices" example.
practice_id, practice_parent_id, practice
1, 0, all practices
2, 1, first practice
3, 1, second practice
4, 2, sub practice of first practice
5, 4, sub practice of sub practice of first practice
6, 3, sub practice of second practice
You could still add a sort field, but numbering would depend upon practice_parent_id of a practice.
I'm hoping this answers the question and I'm not completely on a wrong tangent (what with all of the other answers posted here).
Maybe I am not understanding your data structure, but can it handle a category that has multiple parents? With practice areas, this is something you would need to have. For instance, the practice area "Government Litigation" might be a sub practice of BOTH "Regulatory Law" and "Government Law".
It's nice to see someone put this problem out there. I think there are two flavors that you state: one where the relationship between classes is not attributed and one where it is. These are pretty different kinds of relationships. I have a great tome "Object Oriented Methods and Principles" (800pages) which it seems you can now get used on Amazon for two bucks http://www.amazon.com/Object-Oriented-Methods-Principles-Practice-3rd/dp/020161913X/ref=sr_1_1/103-2669525-8679027?ie=UTF8&s=books&qid=1187368407&sr=8-1
(it cost me $50)
I'll send you an excerpt that describes your problem (pg 280), but let me summarize a bit. The light that Graham exposed to me in that tome of a book: all relations should be unidirectional. Let's say that you had your Attorney /Practice example in which the relation has the attribute "department chair". Now you have a new class on your hands: AttorneyPractice with property "department chair". AttorneyPractices have Attorneys and Practices. Attorneys have AttorneyPractices and Practices have AttorneyPractices. AttorneyPractices can describe the relation between Attorneys and Practices with regard to "department chair". Interesting how this lines up with your typical Many-to-Many database model, huh? The idea is that you can store mappings to other objects in your object without breaking encapsulation, but if you start to store associations with other objects inside your object you're in trouble.
Your "sort" is an attribute of the relationship between two Practices and changes this from a mapping to an association. This becomes more apparent with a name like "importance". Without this attribute, your Practice could just map to its children by having a children property containing an array of Practices. If you are worried about instantiating them all, don't: make the instantiation lazy. Make getChildren return a custom iterator that only gets each instance when it is asked to. That way you could get a count of children without instantiating all the children with Practice.getChildren().count(). On the flipside, you also want Practices to know who their parent is. You could have setChildren() take the FORM.lst_id you want to pass, and save that into the children property. When you Practice.save(), the save() method knows to iterate over the contents of children and store them into the practice_practice_jn table.
The important thing here which indicates that you should consider encapsulating the relationship is that "sort" is not a property of either of the related Practices. It is a property of the relationship. By adding the "sort"ing, you consider creating a PracticeRelation class to avoid having the related Practices both know the same thing about the relationship. In this particular case, it doesn't have to cause massive complications as the value of this property of the relationship can be expressed implicitly in the order of the results returned. That is, you can have getChildren and setChildren ascribe meaning to the list order and store/retrieve the value of SORT to/from the database without this value ever being readable in the model.
So, this is a complicated way of saying I think it makes sense for Practice to know about and manage its mappings to other Practices. I think you are on the right track.
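The lazy getChildren idea from this comment might be sketched roughly as follows, in Python. The LazyChildren class, the load_practice function, and the instances_created counter are all hypothetical names invented here; a real loader would hit the database rather than a dictionary.

```python
class Practice:
    instances_created = 0  # counter only to show when objects get built

    def __init__(self, practice_id, name):
        Practice.instances_created += 1
        self.id = practice_id
        self.name = name

class LazyChildren:
    """Holds child ids in sort order; builds Practice objects only on iteration."""

    def __init__(self, child_ids, loader):
        self._child_ids = list(child_ids)
        self._loader = loader

    def count(self):
        # No objects are created just to count.
        return len(self._child_ids)

    def __iter__(self):
        for child_id in self._child_ids:
            yield self._loader(child_id)

# Stand-in for a database lookup (sample data is made up).
NAMES = {2: "first practice", 3: "second practice"}

def load_practice(practice_id):
    return Practice(practice_id, NAMES[practice_id])

children = LazyChildren([3, 2], load_practice)
count_before = (children.count(), Practice.instances_created)
names = [child.name for child in children]
print(count_before, names, Practice.instances_created)
```

The list position carries the sort order, so the sort value itself never has to be exposed on the Practice objects.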
OK, slight revision due to new-information-to-me :) I don't see why you couldn't have multiple parents with my method, you just need to revise the data structure.
You should still only need the parent_id though (I still don't think you would need a child_id, just a matter of joining a little differently)
1, all practices
2, first practice
3, second practice
4, third practice
5, fourth practice
6, fifth practice
7, sixth practice
8, seventh practice
All parents of an item would be
select practice AS parent_practice
from practice p
inner join practice_relations pr on p.practice_id = pr.parent_id
where pr.practice_id = #child_id_here#
all children of an item would be
select practice AS child_practice
from practice p
inner join practice_relations pr on p.practice_id = pr.practice_id
where pr.parent_id = #parent_id_here#
so using 5 as the example would return
Not a whole lot I can add, but just to reinforce what Brian was saying. Forget databases. Think about basic data structures - trees, lists, doubly linked lists, directed graphs etc etc etc. Then think about the OO representations of those.
The relational view of the world is neither procedural nor object-oriented and is not a good fit for any mainstream programming language except SQL. So you can build your app around SQL queries - which is a perfectly valid way to go, IMHO - but the moment you want CF to do some heavy lifting, whether procedural or OO, you need alternative data structures.
It's easy to forget it in the web programming world, but in most software engineering contexts relational data modelling is seen as a niche specialty, and the idea of designing an application around a relational data model would be seen as quite bizarre.
That loop for multiple insert statements is nasty. I would rather create a query on the fly using the querynew function, or send this to be done on the db side using a procedure, and then run only one insert statement. I believe it would make a big performance difference, especially if there are many records to insert. Just my 2 cents.
i would love to see the code for that.
That would depend on the DB type. For mysql could be something like:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
So, as you can see you don't need multiple insert statements to add multiple rows.
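For anyone who wants to try the multi-row VALUES form without a MySQL server, here is a small sketch in Python using sqlite3, which accepts the same syntax. The parent id and child id values are sample data, and the statement is still assembled in a loop on the application side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE practice_practice_jn "
    "(sort INTEGER, parent_id INTEGER, child_id INTEGER)"
)

parent_id = 1
child_ids = [7, 5, 9]  # sample ids, in the order submitted

# Build one "(?, ?, ?)" group per row, plus a flat parameter list.
placeholders = ", ".join(["(?, ?, ?)"] * len(child_ids))
params = []
for position, child_id in enumerate(child_ids, start=1):
    params.extend([position, parent_id, child_id])

with conn:  # single INSERT statement, single transaction
    conn.execute(
        "INSERT INTO practice_practice_jn (sort, parent_id, child_id) "
        f"VALUES {placeholders}",
        params,
    )

rows = conn.execute(
    "SELECT sort, parent_id, child_id FROM practice_practice_jn ORDER BY sort"
).fetchall()
print(rows)
```

Only the placeholder groups are interpolated into the SQL text; the values themselves stay bound as parameters.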
That has to be the sexiest thing I've seen. I wish I could do that in MSSQL.
Yeah, but even being able to run multiple "inserts" with a single INSERT statement, you would still need a loop somewhere to build the SQL statement. I am not sure that it would be any different?
I am pretty sure that if you take a look at the MSSQL documentation (Books Online) you will find something.
It is one thing to loop in your code to build the statement; this may take milliseconds. It is another to make many statements that each connect to the DB and execute. That, depending on the number of records, may take hours.
Agreed on the multiple connections, but I think (and I may be wrong here), I am making only one connection to the database. My SQL statement performs multiple inserts, but it is all passed to the SQL Server in one connection. If I had multiple CFQuery tags, yes, that would be lame, but I only use the one CFQuery tag = one connection = super fast?
Two quick points: new list looping in CF8 and single INSERT syntax for MS SQL using Select. Note that I think on the MSSQL server side, the insert will compile the same whether you do it this way or as multiple INSERTs. The same is true for MySQL with the multiple set syntax. I wouldn't expect a speed increase but it's fun just the same.
<cfset lastitem = listlast(FORM.lst_id) />
<cfset sort = 0 />
<cfquery ....>
INSERT INTO practice_practice_jn
(sort, parent_id, child_id)
<cfloop list = "#FORM.lst_id#" index="id" >
<cfset sort = incrementValue(sort) />
SELECT
<cfqueryparam value="#sort#" /> as sort,
<cfqueryparam .... /> as parent_id,
<cfqueryparam value="#id#" /> as child_id
<cfif id neq lastitem> UNION </cfif>
</cfloop>
</cfquery>
Correct, the performance there shouldn't be a problem, since it is one cfquery being executed. I wonder what the difference would be on MySQL. Will test later and let you know.
In my tests, when using this type of single statement (using select and union) on MSSQL, the performance is about 3x slower and there is a limitation of 1000 records or so. After that I got a DB error.
For anyone that wants to play:
I also tested using a stored procedure.
The results are (for 1000 records):
1 - Using multiple inserts in one cfquery: 641ms
2 - Using insert with select and union: 1469ms
3 - Using stored procedure: 312ms
I'm running CFMX7, MSSQL 2k, IIS on Win2k3.
The code is available here:
I wonder if it would make a difference using UNION ALL vs. UNION. When you use just UNION, the SQL server actually makes sure to remove duplicates (which has overhead). UNION ALL will tell the server not to bother. This will probably speed things up, but still probably not as fast as the MySQL single INSERT.
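The behavioral difference between the two operators is easy to confirm with a tiny experiment. This sketch uses Python and sqlite3 with literal sample values; it shows only the duplicate handling, not the speed difference, which would have to be measured on the actual server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT 1 AS sort, 5 AS child_id "
    "UNION SELECT 1, 5 "
    "UNION SELECT 2, 6"
).fetchall()

# UNION ALL keeps every row, so no duplicate check is needed.
union_all_rows = conn.execute(
    "SELECT 1 AS sort, 5 AS child_id "
    "UNION ALL SELECT 1, 5 "
    "UNION ALL SELECT 2, 6"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

In the INSERT ... SELECT ... UNION pattern, each row carries a distinct sort value, so there are no duplicates to remove and UNION ALL would return the same rows.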
Keep in mind that unless the performance of your existing query is actually causing problems, all of these attempts at optimization are premature. I'd argue that having a simple loop and a straightforward (and probably reusable) insert statement is a lot easier to understand and maintain than the alternatives being discussed.
I'm, with Brian Rinaldi's help, getting this composition thing down. I think you have to go through his first example a couple of times and closely examine the index.cfm (controller) and the service cfc to get the picture.
His personalized help is also fantastic, but he has no Ask Brian button.
Sorry, this is a bit tangential to the core discussion, which is valuable and valid, but I feel the need to point out to a few of the earlier posters that the implicit JOIN is no longer supported in MSSQL 2007. Basically, all be aware that, as MS aims to discontinue support for MSSQL 2000 in spring of '08, this will matter to your larger clients and they will change. At that point, this will fail:
SELECT c.id, c.name
FROM parent p, child c
WHERE c.parentID = p.id
AND p.id = <cfqueryparam ...
ORDER BY c.sortOrder, c.name
Following Ben's model, however, will continue to work:
SELECT c.id, c.name
FROM parent p INNER JOIN
child c ON p.id = c.parentID
WHERE p.id = <cfqueryparam ...
ORDER BY c.sortOrder, c.name
Good tip. I did not know that that was not going to be supported. Personally, I think people should only use implicit JOINS when explicit ones are NOT available (such as with ColdFusion query of queries). But, now, it's no longer a matter of opinion (since you say that it will no longer be supported).
Actually, now that I'm home from work and have a moment to review, two mistakes on my part.
1) the new version is MSSQL 2005, not MSSQL 2007, but MS is still planning to cease support for 2000 next spring
2) the implicit version of INNER JOIN will continue to work in the new version, but not OUTER JOIN (WHERE x *= y)
Still and all, I have been using the explicit JOIN for over 10 years and it is the more standard SQL. Generally I've only used implicit with old Oracle drivers that don't support the actual JOIN keyword (go, craptastic Oracle native drivers!).
Forta had the first warning of this that I saw, back in Jan 2006:
Thanks for the update. Still, I would hope people use explicit joins, as I think it more clearly states the intent of BOTH the SQL statement and the programmer's desired outcome.