CFLock And Negative Outcomes - Think It Through
Posted January 14, 2008 at 10:00 AM by Ben Nadel
I was just reading over on Scott Bennett's blog about CFLocking, and I wanted to touch on the issue a little bit myself. For a long time, I didn't use CFLock correctly. I had just learned about it, read the rules, and tried to apply them without ever thinking about what I was actually doing. Since I didn't know how important CFLock was, or how useful locking in an application can be, I immediately felt afraid and started using CFLock all over the place.
Scott Bennett lists these as the conditions of when you want to apply locking:
- Shared scope variables (variables in the application, server, and session* scopes)
- Verity collections
- Files that are manipulated via CFFile or the ColdFusion File functions
These are good examples, but really, they can be condensed down into one category:
- Shared resources
This is what I was told when I first read about locking. The problem I had was that no one qualified this. Scott Bennett does a good job of explaining CFLock and what the different types mean, and offers some advice on how to reduce the number of locks that you use; but I would like to take this one step further and really qualify WHEN a CFLock is necessary.
The way I see it, TWO conditions must be met in order to require the use of CFLock:
- A shared resource is being accessed or updated.
- There must be the possibility of a race condition resulting in a NEGATIVE outcome.
Point #2 is really what I think most discussions of CFLock lack, and because of this, many people (myself included) never really knew or fully understood when to apply locking within an application. Point #2 is almost more important than point #1. A shared resource is only a potential place for locking, but a harmful race condition is really what determines if the locking is required.
Take, for example, the application DSN. Most of us persist the DSN data in some way within the APPLICATION scope, whether this is a direct storage (ex. APPLICATION.DSN) or storage via some cached singleton object (ex. APPLICATION.ServiceFactory.GetDSN()). DSN is persisted in a shared scope that is available to every user, and, if all you used for your locking was point #1 (shared resource), then you might assume that locking should be done around access to this variable.
But take a step back and think about the bigger picture (point #2). Is there a race condition? Absolutely! A site with thousands of concurrent users will most definitely run concurrent queries that require a reference to the APPLICATION.DSN object. But, and this is a HUGE BUTT (he he he he), does this race condition result in a negative outcome? Absolutely not! ColdFusion knows how to handle concurrent look-ups on Structs and will deal with synchronizing requests for you. Therefore, two users simultaneously trying to access APPLICATION.DSN will not result in an error, but rather, two successful reads.
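As a rough illustration of the unlocked read described above, here is a minimal sketch. The `users` table and its columns are hypothetical, and APPLICATION.DSN is assumed to be a simple string that was set once when the application started:

```cfml
<!--- No CFLock here: the request only reads APPLICATION.DSN, and two --->
<!--- simultaneous reads of the same key have no negative outcome. --->
<!--- The "users" table is a hypothetical example. --->
<cfquery name="getUsers" datasource="#APPLICATION.DSN#">
    SELECT id, email
    FROM users
</cfquery>
```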
Of course, that's just accessing; what if one request tried to access the APPLICATION.DSN object while another request tried to modify it? Here's where you really need to think about what your actual use-cases are. Is there ever going to be a time where:
- You change your APPLICATION.DSN in the middle of a page request.
- Locking would even prevent errors.
I am gonna take a strong stance and say that it is insanely rare that you are building an application that switches DSN information in the middle of a page run. Therefore, that removes that use case (and therefore that negative outcome). Furthermore, even if you tried to copy the DSN data into the REQUEST scope to prevent shared access, would that even stop errors from occurring? At that point, you might end up with queries that are using the wrong DSN data because they AREN'T referencing a shared resource.
Personally, the idea of switching DSN is ludicrous to me. But application reinitialization is something that might occur from time to time. I'm talking about a situation where the entire APPLICATION scope is wiped out and repopulated. In this case, you might have a query that fails. But here's where you really need to become the weigher of pros and cons: is it unacceptable for something to fail during reinitialization? This is something for you to decide for yourself, but I am gonna say it is acceptable. In the very rare cases that I do have to reinitialize an application, I am fully comfortable with the thought that one or two people's pages might fail and they get our friendly, site-wide error handling page. To me, this extremely rare occurrence does NOT outweigh the cost of using CFLock around that particular resource.
This is just one example, but I want to drive home the point that locking should not be done blindly. Locking requires you to be a software engineer, not just a programmer; when going to apply CFLocks, you really have to think about what you are doing. Think about the consequences. Analyze whether or not race conditions are going to be harmful. Then, if you are convinced that enough harm is going to be done, apply the CFLock.
I read and bookmarked Scott's excellent article.
I have read many, many times, the livedoc on CFLock, and it has never made much sense.
With CF, and shared resources, I have a question given a particular scenario.
Let's suppose we have a <cffunction>, we'll call UpdateEmail.
-UpdateEmail reads a user's sessionID
-You lock the sessionID when you read it, and set it to a local variable.
-You set it to a local variable so you're not locking an entire query that reads a session variable.
Is there a chance a race condition could exist where the variable you've set is read as someone else's userID in the query?
Here's the pseudo-code:
var uID = "";
lock the session scope (read-only):
    uID = session.usersID
UPDATE email
SET email = 'email@example.com'
WHERE email.id = uID
Think about it from this point of view - what is the usersID stored in the session? My guess is that it is a unique ID that is set:
1. When the session is created.
2. A second time when the user logs in.
3. A possible third time when the user logs out.
Ok, now let's look at where the race condition occurs. It is the line in the query:
WHERE email.id = UID <!--- session.usersID. --->
Now, depending on what's in the session, that ID will only ever be:
1. Zero (assuming this represents a not-logged-in user).
2. The user's valid ID.
Ok, so let's say that it is zero. What happens in the query? The query will run and no records will be affected. No worries.
Let's say that it is a valid ID. In that case, the query runs as expected and updates the target record.
So really, there's no possible negative outcome to the race condition here.
To make the function even more "safe", what you should really be doing is passing in both the user ID and the email to the function:
<cfset UpdateEmail( SESSION.UsersID, "firstname.lastname@example.org" ) />
This way, the function doesn't have to worry about where anything is being read from. This is what I would consider a "best practice", even regardless of the locking issue.
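A minimal sketch of what such a function might look like, assuming a table named `email` with `id` and `email` columns (taken loosely from the pseudo-code above) and a DSN stored in APPLICATION.DSN. The argument names are hypothetical:

```cfml
<!--- Hypothetical sketch: table, column, and argument names are assumptions. --->
<cffunction name="UpdateEmail" access="public" returntype="void" output="false">
    <cfargument name="UserID" type="numeric" required="true">
    <cfargument name="Email" type="string" required="true">

    <!--- The function only touches its own ARGUMENTS, never the SESSION scope, --->
    <!--- so there is nothing shared to lock inside it. --->
    <cfquery datasource="#APPLICATION.DSN#">
        UPDATE email
        SET email = <cfqueryparam value="#ARGUMENTS.Email#" cfsqltype="cf_sql_varchar">
        WHERE id = <cfqueryparam value="#ARGUMENTS.UserID#" cfsqltype="cf_sql_integer">
    </cfquery>
</cffunction>
```

Because the caller reads SESSION.UsersID exactly once when it builds the argument list, the value cannot change underneath the query halfway through the call.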
So, to sum up, in your scenario, there does not seem to be a negative outcome to any possible race condition, and therefore, I would consider the locking to not be necessary.
The only issue that I can think of is if someone tries to update their email address then immediately log out. In that instance, it is possible that what was originally a "Valid" UpdateEmail() call will no longer work (since the user ID will be reset).
However, I don't consider this a negative outcome as the person intended to log out. And, this scenario would not even be an issue if the user ID was passed into the function (best practice).
What would happen if:
-Gary tries to update his email at 1:00pm
-Sarah tries to update her email at 1:00pm.
Is there a chance Sarah's ID could be read as Gary's ID if they both request the same function at the same time and a session variable is read?
Nope. Gary and Sarah will both have different SESSION objects.
I think there was a bug in early versions of ColdFusion that *may* have had some problems. But as long as you are using a relatively recent version of CF, there is no way that those two sessions would be confused.
What's an example where a session scope needs to be locked?
I can't think of a place where I really do locking off-hand in a session. I used to do it during session initialization, but OnSessionStart() pretty much removed the need for that. Sometimes, I do it during login, but again, I feel like that is a race condition that doesn't have a negative outcome, unless of course, they tried to log in with two different sets of credentials at the exact same time... but this seems like a fantasy scenario.
I might be a bit more relaxed about locking than most people, but I don't find all that much use for it when it comes to shared scopes like SESSION and APPLICATION.
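For the session initialization case mentioned above, a sketch of the OnSessionStart() approach might look like this. It assumes an Application.cfc with session management enabled, and it assumes that zero stands for "not logged in", as in the earlier discussion:

```cfml
<!--- In Application.cfc. ColdFusion calls onSessionStart() when a new --->
<!--- session is created, so the defaults are set in one place. --->
<cffunction name="onSessionStart" returntype="void" output="false">
    <!--- Assumption: 0 means "not logged in". --->
    <cfset SESSION.UsersID = 0>
</cffunction>
```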
This may sound a bit fascist ... but I'm of the opinion that you should ALWAYS lock Session-scope variables. Why? Many people think that if their site doesn't use frames or iFrames, then they don't need to lock the Session. This is completely untrue. I browse with lots of tabs open, many times I'll even have multiple tabs on the same site.
Another thing I'm seeing more of are people messing around with Ajax frameworks (including the new CF8 stuff) and not locking the Session scope. In old-school parlance, this is double-plus ungood.
I've even had some people argue that the automagic locking features in CF7 and CF8 mean they don't have to do manual locks. Don't believe it -- it's a damnable lie.
Easy rules of thumb:
1. Any time you have any shared scope or object on the left side of an equals sign (an assignment), it should be in a lock.
2. If the shared scope/object might ever go away or change (file, in-Session query, etc) then you even have to lock reads from that scope/object.
I also find that remembering to put in locks, and restructuring code to work with the locks, actually leads to cleaner, more logical code.
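To make the two rules of thumb above concrete, here is a minimal sketch of how they would typically be written. SESSION.CartTotal is a hypothetical variable, assumed to have been initialized already:

```cfml
<!--- Rule 1: a write to a shared scope goes inside an exclusive lock. --->
<!--- This read-modify-write is where two tabs could otherwise lose an update. --->
<cflock scope="session" type="exclusive" timeout="5">
    <cfset SESSION.CartTotal = SESSION.CartTotal + 19.99>
</cflock>

<!--- Rule 2: if the value might change or go away, the read gets a --->
<!--- read-only lock, which waits for any exclusive lock to finish. --->
<cflock scope="session" type="readonly" timeout="5">
    <cfset cartTotal = SESSION.CartTotal>
</cflock>
```

Whether every such write needs this treatment is exactly what the rest of this thread debates; the sketch only shows the mechanics.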
Great Post! I agree with you completely on this. I try to use CFlock as little as possible in my applications, and don't use it unless not using it will cause some kind of problem. In fact, that was the primary issue I was initially trying to address in my article, but while writing it I got so focused on trying to explain how CFLock works, I think I wavered a little from my main point.
As to your example, one thing I never really understood is why people even bother putting their data source names into the Application scope in the first place. I never ever put datasource names in the application scope because it just doesn't make any sense. Where is the benefit? especially when most of the time the code that sets the DSN is hard coded into your OnApplicationStart function in your application.cfc anyways. Why bother with locking the application scope and reading the DSN to the request scope at the beginning of each request?
I just set up my application.cfc so that in the OnRequestStart function, there is a cfset like this:
<cfset request.DSN = "MyDSN">
Then if I want to change my DSN I just change it there and after I hit save, all the subsequent requests will use the new DSN, and I don't even have to reinitialize my application.
to the quote in your previous comment:
"I might be a bit more relaxed about locking than most people, but I don't find all that much use for it when it comes to shared scopes like SESSION and APPLICATION."
There are definitely applications that I have worked on where locking the session/application scope is not really necessary. But there are also situations where it is. If you are writing an application that is going to have relatively few users, like a personal blog for example, I really wouldn't worry about it. But if I am working on a high profile ecommerce site, there will most likely be several people in my checkout process at any given time. If, while I am reinitializing my application, these users get my friendly error message right after they click the "place order" button because I didn't use CFLock to protect my application scope, I just lost a bunch of money.
Also, sorry to spam your blog with comments, I wanted to clarify this point you made:
"Scott Bennett lists these as the conditions of when you want to apply locking:
* Shared scope variables (variables in the application, server, and session* scopes
* Verity collections
* Files that are manipulated via CFFile or the ColdFusion File functions
These are good examples, but really, they can be condensed down into one category:
* Shared resources"
It looks like you misread what I wrote a little, in my opening paragraph I wrote:
"It is important to use CFLock to protect the integrity of data that is shared across requests in your application. "
Then I gave some examples of some typical shared resources by breaking out those bullet points. I was not listing them as a conditions where you want to use a lock. Later in my article I write the conditions where you would use a lock in this line:
"Basically, anytime you are in a situation where you are developing some code that could cause problems if two requests tried to do the same thing at the same time, you want to slap a CFLock around it."
I admit that sentence is not very clear, but what I meant by it is what you said much more clearly in this portion of your article:
"The way I see it, TWO conditions must be met in order to require the use of CFLock:
1. A shared resource is being accessed or updated.
2. There must be the possibility of a race condition resulting in a NEGATIVE outcome."
Either way between the two posts, I think we've gotten the point across.
Typically, I'll read Application and Session information into the Request scope at start of a request. I'll use a read only lock for this and then I can forget about locks on the rest of my page since I'm no longer accessing shared memory.
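A minimal sketch of that approach, assuming an Application.cfc and assuming APPLICATION.DSN holds a simple string (the variable names are illustrative):

```cfml
<!--- In Application.cfc: copy shared values into the REQUEST scope once, --->
<!--- under a read-only lock, at the start of every request. --->
<cffunction name="onRequestStart" returntype="boolean" output="false">
    <cfargument name="targetPage" type="string" required="true">

    <cflock scope="application" type="readonly" timeout="5">
        <!--- Assumption: APPLICATION.DSN is a simple string. --->
        <cfset REQUEST.DSN = APPLICATION.DSN>
    </cflock>

    <cfreturn true>
</cffunction>
```

This works cleanly for simple values such as strings. For structs and other complex values, the same assignment would only copy a reference.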
Can you maybe touch a bit more on the automagic locking feature in CFMX7 / CF8? I am not sure what you are referring to? Is this what I am talking about when I say concurrent requests to a Struct key won't produce an error?
I feel that you are erring too much on the locking side. I think it is comments like that that strike the fear into people's heart that I was talking about in my post. I am afraid that someone is going to read that and think "Oh my god! My application is going to fail unless I go back and put locks on everything"... and they are not going to even look up and realize that nothing in their application has even broken yet; they are going to feel compelled to fix problems they might not even have just because they don't understand locking.
I think there is a happy middle ground somewhere, and that is what I am trying to get at. I feel like the happy middle ground is a situation-wise, contextual scenario. Locking is an art and it takes that kind of thinking. Painting with such broad strokes will only confuse. You say any shared scope, but do you really mean that? How do you qualify "shared"? COOKIE can be accessed by multiple pages simultaneously, should this be locked? VARIABLES can be accessed simultaneously by multiple CFThread instances, should all modification to the VARIABLES scope be locked? What about the REQUEST scope? Is that considered shared? It can be accessed globally by any part of the application on a given page request... that seems shared to me.
I'm not saying that those statements are meant to be wrong or right. I am only trying to point out that locking requires thinking, and absolute rules are the antithesis of thinking.
I think Application re-initialization is a difficult topic. It's one of those things that you almost think you shouldn't have to plan for, but in reality, you do. It's like driving a car and being ready for it to suddenly explode! It's a use-case that really doesn't make sense from an application stand-point, but it does make sense from a reality stand-point.
And, to compound that, eCommerce is a beast unto itself because money is involved. Things, by necessity, need to err on the side of caution when that is the case. But again, this is a situation-specific thing; this is not a broad idea.
Also, I think you did a great job of explaining how CFLock works, so I hope none of this feels like an attack. I just wanted to take your explanation and expand on it a bit more.
Oops, you are right. I did misread your post. My apologies :) Monday morning :) But after you pointed that out, I can see that our ideas are nicely aligned.
Be careful about that. Remember that structures, queries, XML objects (I think), and other complex objects are passed by reference, not by value. Therefore, just by copying something from the APPLICATION scope to the REQUEST scope does not mean that it is no longer shared throughout the application; it just means you copied a reference to it into another scope.
"Can you maybe touch a bit more on the automagic locking feature in CFMX7 / CF8?"
Sorry, I was typing up two things at once and meant to say CF4 and CF6. These versions of CF had a "Locking" section in the CF Administrator where you could specify if you wanted to single-thread your sessions and have CF handle some (but certainly not all) of your locking for you. Many people checked the box and assumed that meant that they never had to use manual locks again.
"I feel that you are erring to much on the locking side. I think it is comments like that that strike the fear into people's heart that I was talking about in my post."
Actually, that's exactly what I was going for. ;-)
"I am afraid that someone is going to read that and think "Oh my god! My application is going to fail unless I go back and put locks on everything"..."
And ... it will, if it ever leaves your test server and gets used in the real world. Locking problems, especially after the move to Java from the old C++ codebase, only show up when your app starts to get heavily used. They are wicked hard to even diagnose, much less debug. You'll get strange and random errors and variable mutation and you won't know why. You'll pull out your hair and think you're going insane. Or, you could stop and think (as you advocate) and get the locks in the right place to begin with and not have to worry about it.
"You say any shared scope, but do you really mean that? How do you qualify "shared"? COOKIE can be accessed by multiple pages simultaneously, should this be locked?"
Yes, and yes. Any shared scope or object.
"VARIABLES can be accessed simultaneously by multiple CFThread instances, should all modification to the VARIABLES scope be locked?"
I would hope that if you're mucking around with threading that you have locking very secure under your belt. If not ... I hope you're not on my shared server.
"I am only trying to point out that locking requires thinking and absolute rules are the antithesis of thinking."
I agree, hence why I said "Rules of thumb". ;-)
Now when you say "yes and yes" to all shared scopes, what are you really trying to say :)
Just for the record, I wouldn't reference the VARIABLES scope from within the CFThread tag (in case we ever end up on the same shared server) - I was just trying to get readers to think about this stuff.
I think locking is almost like a Catch-22 situation. You can't just get a good handle on it right off the bat. I think you kind of need to go too far with it before you get a better understanding, and then pull back a bit. At least, this is how I learned it. I totally overcompensated by locking everything without ever thinking about it whatsoever. Then, I realized I had no idea what I was doing, stopped, really evaluated what was going on, and then pulled back a bit.
Now, maybe I have pulled back too much as a response to such huge overcompensation and I need to stop and re-evaluate that as well.
I think you have obviously worked on bigger, more complex applications than anything that I have worked on, so you naturally do, and should, have a different perspective. But, at the same time, realize that a lot of newer programmers are not going to be in situations like that.
I am not sure where I am going with this, exactly. I want to say that we are both right in certain situations, but, just as with CFLock, neither of us can be right for every situation.
Your locking strategy really should be dictated by the needs of the application. I would not suggest that you just arbitrarily lock everything "just to be on the safe side", like you seem to be suggesting; this is exactly the kind of thinking that causes problems for a lot of developers whose sites suddenly are getting a lot more traffic. A lot of it really comes down to what kind of traffic you are expecting on your site and how sensitive the data is that you need to protect.
I have worked on sites that get maybe 20 page views on a good day and display someone's photos or professional portfolio, or whatnot. And I have worked on larger e-commerce sites that handle over half a million dollars in transactions on a busy day. I wouldn't treat them the same. The more traffic your site handles, the more you are going to be aware of whether your locking strategy is working well or not. Over-locking a high traffic site will slow it down; under-locking will cause data to be corrupted. The exact same application in a low traffic environment may not show any symptoms one way or the other no matter how many cflocks you have.
High traffic or low traffic, the question you need to ask yourself when deciding to use a lock or not is:
If this shared data resource I am working with gets corrupted, what kind of impact will that have?
If it is going to be something that could cost you time or money or have some other negative impact on your business, then lock it.
My thoughts about putting the DSN (or any other info that basically never changes) into the Application scope vs putting it in the request scope is that it reduces the number of variables that need to be created and destroyed thus reducing the amount of garbage collection the JVM has to do.
Since the variables will likely never change locking should be irrelevant. Considering the number of Black Magic articles out there on tuning garbage collection I've always felt that reducing the work can't hurt.
I guess if you are not locking the application scope when you read the application.dsn variable and you are not re-scoping the variable to the request scope with every page request anyways, then maybe your garbage collector is doing a little less work by using application, but the impact on your garbage collector for such a small variable as a data source name is pretty insignificant, and won't have any impact on your site worth measuring.
Most people who use application.dsn end up re-scoping the variable to the request scope or local variable scope at the beginning of each request anyways, and that is not a pass by reference but a pass by value, so your garbage collector still has to clean up request.dsn at the end of the request, and the processing to lock the application scope, copy the value to the new scope, and unlock the application scope, is more work for the server than just setting the request variable to a string value at the beginning of the request.
Not to mention it saves me a little typing, which makes my fingers happy.
The memory used really depends on what's being stored. If you're using
<cfset request.dsn = "foo">
or if you're using
<cfset application.dsn = "foo">
They actually use nearly the same amount of memory in the JVM no matter the number of requests you see.
The key "DSN" is a literal string as far as the CF compiler is concerned, and the value "foo" is also, so both those strings will be intern()'d by the compiler and a single java.lang.String instance will be shared between all requests. There is of course still memory allocated by the coldfusion.runtime.RequestScope when it adds to the HashTable, but this is pretty minimal.
However, once we start using things like CFCs for the DSN we throw all that off since the component takes up tons more space, and creating a new one in the request every time would be quite a lot more expensive than placing it in the app scope... especially if the data is immutable.
I'm not sure what you mean by "using things like CFCs for the DSN "?
How could you use a CFC for a DSN, and when would you do that?
Woops, I meant DS. Like we use a Datasource bean in ColdSpring which has the DSN and default caching values.
In the sense that the memory used between the below snippets is radically different.
<cfset request.ds = createObject("component","Datasource").init("name",createTimeSpan(0,0,30,0))>

<cfif not structKeyExists(application,"ds")>
<cflock scope="application" timeout="10">
<cfif not structKeyExists(application,"ds")>
<cfset application.ds = createObject("component","Datasource").init("name",createTimeSpan(0,0,30,0))>
</cfif>
</cflock>
</cfif>
While doing the below is probably going to take up very little space and not cause a noticeable impact on performance, even under severe load situations.
<cfset request.dsn = "name">
<cfset request.dsnCache = createTimeSpan(0,0,30,0)>
<cfset application.dsn = "name">
<cfset application.dsnCache = createTimeSpan(0,0,30,0)>
(And if we really wanted to cheat we could use dsnCache = "0.0208333333333" and get the intern()'ed cache string and never have to call createTimeSpan(), but that's just silly! :P)
Not to get too off topic here, but I have found that you can't always substitute a hard coded value instead of CreateTimeSpan(). When experimenting with that and THIS.SessionTimeOut, I discovered that CreateTimeSpan() returned a double, which was not properly casted when using a hard coded value:
When you questioned the usefulness of putting the dsn in the application scope you asked:
"Why bother with locking the application scope and reading the DSN to the request scope at the beginning of each request?"
At the risk of sounding ignorant, why would this be even necessary? I agree with Ben in that changing the dsn in the middle of a page request is an extreme edge case (I've never written code that does this). So why not just access the dsn variable directly from the application scope? (cfquery datasource="#Application.dsn#"....)
@Scott and Elliot
I do agree that the amount of memory used by a single variable is pretty insignificant, but I think they add up if there is enough of them. I debated on the significance of what this would mean myself so I just tried it one day when we were slow.
The company I work for has an in-house built CMS and Store and the two combined have nearly 3 dozen 'settings' from the DSN to where to send error emails, Database type, slash type (linux vs windows), CF version, languages etc. Plus there are two functions in there that are instantiated into the application scope. These were all request variables when I started working here.
I don't re-scope these ones, that was the significance of the mentioning that it was for variables that will likely never change.
CF8, with all its fancy new monitoring capabilities, gave me a chance to actually take a look at the differences made if I switched them to application variables, and it did seem to make a difference. Memory usage was more 'stable', and benchmarking it on our development server, it did improve the performance of the same code consistently, albeit by a small amount: about 2-3 hits per second under high load.
I couldn't find any info out there myself, so I may be missing something here that would account for the difference. Any ideas?
We covered that scenario in the comment posted by Mark B on Jan 14, 2008 at 4:41 PM and my response in the comment posted by me on Jan 14, 2008 at 6:14 PM.
I agree that if you are not going to bother locking or re-scoping your application.DSN, it would be fine to do it that way, but the difference between that and having one request variable per request is negligible. And most of the time when I have gone in to edit some code someone else has written, I typically see people who use application.DSN doing the lock and re-scope thing at the beginning of each request, and I believe most developers think it is "Best Practice" to use a lock any and every time an application variable is used in their code.
The site I work on uses my method of just using the request scope. The site is load balanced across 4 servers, and in my load tests I have hammered my site with traffic simulating 75 requests per second and not had any problems with memory. Running out of memory is rarely a bottleneck in most applications I build.
First, let me say that I found both your and Ben's blog posts very well-written and informative.
I did notice Mark B's comment and your response on locking and rescoping application.dsn, but even there you just note the fact that most people do it and then later, in response to my comment, you say it's considered "best practice." I guess I'm not clear as to WHY it's best practice to lock and re-scope an application variable that never changes. Is it simply to just guard against things like unexpected server restarts during the request??
If you have an unexpected server restart, it ain't gonna matter where your data is stored :) That site is going down!
My response got too long and is a little off topic for this post so I posted it here:
Yeah -- I guess cflocking would be the LEAST of my worries, then huh?
Thanks -- (I posted a comment on your blog)
Well, I didn't know about the pass by reference on complex data types.
Do you know where this is documented so I could read up on it?
I don't know where the documentation is on this, but basically all "simple" values AND arrays are passed by value. This includes date/time stamps, strings, numbers, booleans. What this means is that when you copy one variable into another variable, it creates a completely separate copy.
Complex values like scopes, structs, Java objects, ColdFusion components are all passed by reference. This means that when you copy one variable into another, both variables are actually pointing to the SAME value and updates via one variable will be reflected in reads through the other variable.
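The difference between the two behaviors can be seen in a small sketch. The variable names are made up for illustration; Duplicate() is the built-in function that makes an independent deep copy:

```cfml
<cfset original = StructNew()>
<cfset original.name = "primary">

<!--- Plain assignment copies the reference: both names point at ONE struct. --->
<cfset sameStruct = original>

<!--- Duplicate() makes a separate copy with its own data. --->
<cfset separateCopy = Duplicate(original)>

<!--- Change the struct through the second reference. --->
<cfset sameStruct.name = "changed">

<!--- Outputs: changed / primary --->
<cfoutput>#original.name# / #separateCopy.name#</cfoutput>
```

The update made through `sameStruct` shows up in `original`, while `separateCopy` keeps the old value. This is why copying an APPLICATION-scoped struct into the REQUEST scope by plain assignment does not stop it from being shared.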
When researching locks, I've come across the technique of locking the shared variable while assigning it to a local variable and then simply using the local variable to limit the amount of code being locked. However, I have a question about how this affects complex variables.
If I use this method with a struct, I'd have to Duplicate() the struct so as not to simply create a reference to the shared variable. In doing this, I end up creating a local copy of the shared struct. If I've put the struct in the Application scope to prevent having to create multiple copies (e.g., originally putting in the Request scope), am I even gaining that benefit when I use the Duplicate() function to copy it to the local scope? It seems to be I'd still end up with multiple copies of the struct, so I don't see the benefit of using the Application scope if I use this method.
Also, how does this affect objects, such as CFCs? If I put a component in the Application scope (which it seems a lot of people do), I can't Duplicate() the object. So when does locking apply? I can do an exclusive lock when I create it. Do I need to do a read-only lock every time I access it to call a method? From most of the comments here, it sounds like no. But if I have an exclusive lock when creating the object and no lock when calling its methods, can that lead to corruption of my object? And do I need to look more specifically at what my object's method is doing? For example, if the method is altering the object's values, then should I lock it?
If you use the Duplicate() method to copy a struct that is in your application scope into the request scope or the local variable scope, then the server will indeed have to allocate memory to hold that structure for the life of each request that uses it. The benefit you would get is not to reduce the amount of memory the server uses, but to reduce the amount of processing that is required by each request. You want to save a structure into the application scope when:
1) the process to build the structure from scratch takes a significant amount of time
2) the data in the structure will not change between requests.
If you know that a race condition on this structure would NEVER cause a problem, then I would not bother locking it or using Duplicate(); I would just access the structure from the application scope and save the memory resources on the machine. If the structure gets updated somewhat frequently during the life of the application, and a race condition could possibly be harmful, then lock it and duplicate it with each request that needs it.
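As a rough sketch of the "lock it and duplicate it" case, assuming a hypothetical APPLICATION.Settings struct with a hypothetical LastRefreshed key (neither comes from the discussion above):

```cfml
<!--- Reader: take a private snapshot for this request. --->
<cflock scope="APPLICATION" type="readonly" timeout="5">
	<cfset REQUEST.Settings = Duplicate( APPLICATION.Settings ) />
</cflock>

<!--- From here on, REQUEST.Settings can be read without any lock. --->

<!--- Writer: updates must take the matching exclusive lock. --->
<cflock scope="APPLICATION" type="exclusive" timeout="5">
	<cfset APPLICATION.Settings.LastRefreshed = Now() />
</cflock>
```

The readonly lock only protects anything if every writer uses an exclusive lock on the same scope (or the same name); a lock on one side alone does nothing.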
When putting a CFC instance into the application scope, I would not bother with locking because it is not a source of shared data; it is an object, and race conditions are not going to occur. Within the CFC itself there may be some functions that access shared data, and you may need a cflock within the function, but the CFC reference in the application scope should not.
When it comes to locking access to CFC methods, I think it really depends on what is going on inside the body of the method. If the CFC is performing actions that are not safe for concurrent users, then you can always implement locking within the CFC method itself (something like:)
...... [ your code ]
This way, you can make calls to the CFC method without having to lock it since it takes care of concurrency from within the method itself.
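For anyone reading along, a CFC that handles its own concurrency might look roughly like this. It is only a sketch: the component, the IncrementHitCount method, and the lock name are all invented for the example.

```cfml
<cfcomponent output="false">

	<!--- Instance data shared by every request that uses this cached CFC. --->
	<cfset VARIABLES.HitCount = 0 />

	<cffunction name="IncrementHitCount" access="public" returntype="numeric" output="false">
		<cfset var NewCount = 0 />

		<!--- Only the read-modify-write step is locked. --->
		<cflock name="HitCounter_Increment" type="exclusive" timeout="5">
			<cfset VARIABLES.HitCount = VARIABLES.HitCount + 1 />
			<cfset NewCount = VARIABLES.HitCount />
		</cflock>

		<cfreturn NewCount />
	</cffunction>

</cfcomponent>
```

Calling code can then use APPLICATION.SomeCachedInstance.IncrementHitCount() without wrapping the call in a lock of its own. Note that a named lock is matched by name across the whole server, so two instances of this component would also wait on each other.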
Thank you both for your thoughts, particularly on locking the object methods instead of the objects themselves.
"2. There must be the possibility of a race condition resulting in a NEGATIVE outcome."
There are some very basic examples like DSN where, ya, it seems pretty safe you're not going to have a race condition. The issue I have is that you're only looking at the application at the time you're writing your code. What happens 6 months or 3 years later when someone else adds something? Sure, a variable tracking a datasource is really unlikely to be changing during the lifetime of that application variable. Datasource and a few others seem the exception to the rule, IMHO.
Again, the problem I have with #2 is you're saying that, for code not yet written or conceived, you are 100% sure that it will not cause a race condition that results in a NEGATIVE outcome.
Call it paranoid coding, but I'm a bit sick and tired of having to clean up messes from developers who 5 years ago made some pretty ambitious assumptions with their code. Think of it this way: how much does implementing this cost right now for your application? So you have another line or 3 of code and make the CPU work a tad more. CPUs and memory are dirt cheap compared to developers' time. And the sort of errors that could occur if you're wrong about #2 - or some newly added code at a later date makes you wrong about #2 - could easily chew up a few days, a week or even more of a developer's time. That time alone is worth what? $400? $800? $2400? And that doesn't take into account the value of the bug itself. Is it messing up orders for a product? Causing a healthcare application to record the wrong blood type? What's the value in that?
It's great to know but, and maybe I'm wrong, to me this sort of thing is the sort of issue that separates the code geeks from the software engineers. Maybe I'm being a pompous curmudgeon; just my two-bits worth at this point.
I think what we have here is the difference between "Computer Science - CS" and "Computer Engineering - CE". By that, I relate back to my school days where I had to listen to conflicting opinions between the theorists (CS) and the realists (CE). I remember listening to things like:
CS: You can't use that algorithm, it has an O( n! ) complexity! It will take far too long to execute!
CE: Yeah, true, but in our application, "n" will never be bigger than 2 - the time to execute will be inconsequential.
I give the above example to illustrate that what is theoretically correct is not always practical.
And, I am not trying to say that locking is impractical; as you say, it's a few extra lines of code. What I mean to say is that we can't talk theory all the time. We need to live in a reality of actual use-cases. As such, I would be very interested in seeing places where no locking was implemented and then it turned out years later that locking was very important. My guess is that in 99% of the cases, a horrible decision was made in the first place.
In that case, what is the culprit? Not locking? Or poor programming?
Which begs the question, should we implement rules across the board which are not necessary just to protect people from poor programmers?
I would rather look at actual, concrete examples of places this went wrong than talk theory. The theory will never get us anywhere. But, if you have an example of programming we can look at, I think that would be a lot of fun and more useful in terms of creating better programmers than implementing rules without contemplating their use.
I agree with Ben that some examples would be helpful in validating your point. There have been several occasions where a new client comes to me because their business began to boom and the increased traffic on their site caused everything to come to a screeching halt because the original programmer over-used cflock and essentially single-threaded 50-90% of the application. But I have not encountered too many problems due to "under-locking", especially after they fixed the memory corruption issues after CFMX
Ben, I was actually thinking the same. Only, to me, saying it's not needed is the academic side of things. As you pointed out, it's common for the application scope to have to be wiped out and reset. You say that the cost of the user's errors, pain, and reduced application reliability does not outweigh the cost of the locking. But what does the locking cost? How much of a cost is there to duplicating the application scope to the variables or request scope when a request is made? .5ms? 1ms? 5ms? So 3 lines of code and a fraction of a second of time, compared to a couple of users every month or so who lose work or time because of errors in the application? Is there a cost to copying the application scope to the variables scope that I'm missing? For example, what does it take to run a piece of example code with it? Without it? What does that time difference mean for an application with 1,000 concurrent users? 10,000 concurrent users?
If you want, go ahead and copy the cached variables to your REQUEST scope. I have done that before to be safe. The problem is taking the locking advice without thinking about it. If you look at the recommendations for when to lock, it says to use it anytime you get or set information from or to a shared memory scope.
Well, here's the kicker - REQUEST and VARIABLES are both shared scopes. REQUEST has always been a globally accessible scope. VARIABLES, now that CFThread was introduced is also a shared scope (many threads can access the VARIABLES scope of the current request in an asynchronous manner).
So, if you took the CFLock "Best practices" at face value, you'd have to lock ALL of your variable usage!!
Of course, we don't do that because it would be silly. We calculate in our heads the cost-to-risk ratio of not locking a VARIABLES-scope variable, because we estimate the chance that someone will, at some point in the future, throw a CFThread into this request that references that variable asynchronously in a way that corrupts the read/write paths.... it's just not gonna happen.
I guess the moral of what I am saying is to evaluate your situation. Don't just blindly implement things that have been preached as best practices.
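To show what the CFThread point looks like in practice, here is a minimal sketch (CFThread requires ColdFusion 8 or later; the Counter variable and the thread name are made up for the example):

```cfml
<!--- A page-level variable in the VARIABLES scope. --->
<cfset VARIABLES.Counter = 0 />

<cfthread name="worker" action="run">
	<!--- The thread body reaches into the page's VARIABLES scope. --->
	<cfset VARIABLES.Counter = VARIABLES.Counter + 1 />
</cfthread>

<!--- Until the join, the page and the thread run concurrently. --->
<cfthread action="join" name="worker" />

<cfoutput>#VARIABLES.Counter#</cfoutput>
```

If the page itself also wrote to VARIABLES.Counter before the join, that would be exactly the kind of unlocked shared access the "lock everything" advice would flag, and exactly the kind most of us judge not worth locking.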
Sooooo lost!!! I have been working with CF for years and years and nobody has been able to explain to me, so that I understand, how CFLOCK works. It doesn't seem like I am alone either. If so many people are confused by the tag, maybe Adobe should rethink the tag.
It also seems like vars should lock automatically. When would anybody ever want there to be session swapping or whatever CFLOCK protects against?
Still CFLOCK lost...
I'm switching from traditional application.cfm to application.cfc keeping the rest of the application modules in ".cfm"s. I have OnApplicationStart, OnRequestStart, OnRequest and OnApplicationEnd events on Application.cfc.
I have set the application variables under OnApplicationStart event and then referencing the variables under OnRequestStart and OnRequest events.
For example, I have in OnApplicationStart event as:
Then under OnRequestStart event I have:
<cfargument name="TargetPage" type="string" required="yes" />
<cfif StructKeyExists( URL, "reset" )>
<cflock type="exclusive" scope="Application" throwontimeout="true" timeout="45">
<cfset OnApplicationStart() />
</cflock>
</cfif>
<cfreturn true /> <!--- Return out. --->
Then under OnRequest I have referenced the same variables as I did under OnApplicationStart event. i.e.
<cflock type="readonly" scope="Application" throwontimeout="true" timeout="45">
Then I have output all the application vars and other variables in a test module called test.cfm. However, test.cfm is displaying a timeout error on cflock on OnRequestStart event. If I take the cflock out from one or all events, I'm getting Application.EOL (in this example) is not defined in Application.
What am I doing wrong? Any suggestions would be appreciated. Thanks.
You said you have this in OnApplicationStart:
My best guess (without seeing your actual code) would be that this is the problem, since it is not setting Application.EOL to anything. It should probably be something like:
Also (although it looks like your error is not really directly related to your cflock tag), I would highly recommend you read through this whole article and its comments, and make sure that you even need cflock around this application variable. If a race condition happens to occur on that variable, would it result in a negative outcome? If not, then you should remove the cflock code altogether.
Thanks for quick response.
My bad. I forgot to mention that I already have that value declared under OnApplicationStart event as:
I agree with you and read the thread. Based on that I took the CFLOCK out. But as I said, if I take it out, next error that I get is "EOL is undefined in Application".
All application variables are defined under OnApplicationStart event and under OnRequest event I'm referencing them as I mentioned in my original post.
Assuming that after you changed it, you hit a page on the app with the "url.reset" variable, or restarted CF so that the onapplicationstart function got run again, then I don't see anything wrong based on what you have written. It's kind of difficult to debug this kind of stuff without seeing the actual application.cfc file. Since this is not really related to this blog entry, we could take it offline and you could just email me your application.cfc if you want, and I can take a look for you real quick. scott (at) coldfusionguy (dot) com.
I realise this is an old discussion but I just found it and was wondering if someone (guru Ben) could advise on this:
What is better:
I always use named locks, just habit and not knowing any reason not to. Sample code below.
1. an "exclusive" lock around an if/else where some code does not require the exclusive lock. I have more complex code in similar locks in my application.
2. an "exclusive" and a "readonly" lock around 2 separate if statements.
3. Who cares, it is not significant? go watch "Over 40 vol.10"
<cflock name="session_active" type="exclusive" throwontimeout="no" timeout="5">
<cfif isdefined("session.active")>
<cfset REQUEST.active = session.active>
<cfelse>
<cfset REQUEST.active = structNew()>
<cfset SESSION.active = structNew()>
</cfif>
</cflock>
<cflock name="session_active" type="READONLY" throwontimeout="no" timeout="5">
<cfif isdefined("session.active")>
<cfset REQUEST.active = session.active>
</cfif>
</cflock>
<cflock name="session_active" type="EXCLUSIVE" throwontimeout="no" timeout="5">
<cfif NOT isdefined("session.active")>
<cfset REQUEST.active = structNew()>
<cfset SESSION.active = structNew()>
</cfif>
</cflock>
I would certainly not recommend using named locks to lock session variables. A named lock will lock that section of code for all users, not just the single user's session. That kind of locking strategy, particularly the exclusive lock, will only serve to slow down your application during heavy traffic times.
And as we discussed earlier in this thread, always keep the golden locking rule in mind.... unless not locking could result in a race condition that could potentially produce a negative outcome, then don't lock at all.
Oh dear, I think I have some rework to do. For some reason I thought named locks would be better, as they only locked the specific variable being changed rather than the entire scope.
A few years ago I locked nothing, then I read something about the importance of locks (obviously missed the point) and started locking every SESSION variable read or write.
Thanks for the advice.
This is what I think and I hope it's right:
Use a dynamic name for an exclusive lock when you are protecting a resource (say you are dishing out a fixed number of prizes) and the prizes depend on a particular group the user is in. The dynamic name of the lock would include the groupID (example lock name: "bigPrizeGiveAway#local.userGroup#"). A user would only wait on locks for those in the same group.
Use a dynamic name for an exclusive lock when multiple (accidental simultaneous) requests/posts for one user can cause a problem (very common). The dynamic name of the lock could include the userCustomerID (example lock name: "updateFinalOrders#userCustomerID#"). A user would only wait for their own locks.
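A sketch of that per-user lock, assuming userCustomerID has already been set earlier in the request (the body of the lock is left as a placeholder):

```cfml
<!--- userCustomerID is assumed to exist; it is part of the lock name. --->
<cflock name="updateFinalOrders#userCustomerID#" type="exclusive" timeout="10">
	<!---
		Only one request per customer gets in here at a time.
		Requests for other customers use a different lock name
		and are not held up.
	--->
</cflock>
```

A double-clicked submit button from the same customer would queue behind the first request instead of running alongside it.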
Use the session scope readonly lock for reading a sequence of session variables that could be changed or wiped-out by say another thread that deletes them (perhaps even a logout thread). Use the session scope exclusive lock for modifying session variables. Session locks only directly affect the user who has that session. Indirectly, if there are a great many locks getting hit by everybody, processing overhead could add up.
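The session-scope version might look like the following. The SESSION.UserID and SESSION.UserName keys are hypothetical stand-ins for whatever the application keeps in the session.

```cfml
<!--- Read a group of session values as one consistent set. --->
<cflock scope="SESSION" type="readonly" timeout="5">
	<cfset REQUEST.UserID = SESSION.UserID />
	<cfset REQUEST.UserName = SESSION.UserName />
</cflock>

<!--- Elsewhere (say, a logout request): change them together. --->
<cflock scope="SESSION" type="exclusive" timeout="5">
	<cfset SESSION.UserID = 0 />
	<cfset SESSION.UserName = "" />
</cflock>
```

Because the lock is scoped to SESSION rather than named, only requests from the same user's session wait on each other.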
Use a static name for a lock if you want only one user through at a time.
One should lock the smallest section of code necessary.
According to a blog I read (sorry I don't have a link), the onRequestStart() method in Application.cfc is not thread safe for even the Request and Variables scopes, so beware. (After that, for the rest of the request, those scopes are safe.)
I'm ignoring any unique twists spawned threads could create regarding scope and locks, because I'm uninformed.
Please improve, as needed, anything I've written.