When I was working on a new ColdFusion and SQL pagination algorithm the other day, I dropped a lot of the validation logic that went into the pagination values. For instance, normally, if someone entered -30 for an offset, I would take the time to make sure that became 1. Or, if someone put in 9999999999 (past the size of the data table), I would make sure that the offset became the start of the last possible page, and not 9999999999. I would even do things like adjusting to the start of a page if someone put in an offset that was not the beginning of a page (i.e. they put in offset 8 and I adjust to offset 6 [for a 5-item page]).
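As a rough sketch of the kind of cleanup described above (written in Python rather than ColdFusion, and not the original code), the three adjustments might look something like this. The function name `clean_offset`, its parameters, and the fallback to 1 for non-numeric input are all assumptions made for illustration; offsets are treated as 1-based, as in the example.

```python
def clean_offset(raw_offset, total_items, page_size=5):
    """Snap a 1-based offset to the start of the nearest valid page.

    Hypothetical helper: names and defaults are illustrative only.
    """
    try:
        offset = int(raw_offset)
    except (TypeError, ValueError):
        # Assumption: anything that is not a number falls back to page one.
        return 1

    if total_items <= 0:
        return 1

    # Negative or zero offsets become the first record.
    offset = max(offset, 1)

    # Offsets past the end of the data become the start of the last page.
    last_page_start = ((total_items - 1) // page_size) * page_size + 1
    offset = min(offset, last_page_start)

    # Offsets in the middle of a page snap back to that page's first record.
    return ((offset - 1) // page_size) * page_size + 1
```

With 23 records and 5 items per page, `clean_offset(-30, 23)` gives 1, `clean_offset(9999999999, 23)` gives 21, and `clean_offset(8, 23)` gives 6. It is not much code, but every branch is effort spent on input that a normal visitor would never produce.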
Basically, what I was doing was taking on this attitude:
"If you try to break my site, I am going to bend over backwards to make sure that the site not only works but returns the closest possible data that matches your request".
But, as I was working on the pagination stuff, I realized, this is such a silly thing to do. If someone is going to be malicious, why should I care about putting in the effort? Now, I am not talking about error checking - that should always be done - the site should never error out. I am just talking about cleaning malicious data.
The only thing that I could think of was the problem of false positives. For instance, what if someone pastes in a URL, but not completely, and some value gets copied only partially? This is not malicious, this is just user error. In a case like that, I would want to try to return the closest possible "guesstimated" target result.
Anyway, not sure where I was going with this. Just thinking out loud. Cleaning data requires more effort than it might be worth in SOME cases. Just trying to find the balance I am comfortable with.
Here's a classic example that just popped into my head - TO and FROM dates. If someone puts in a FROM date that is after the TO date, then this will never return records. Should I put up an error message? Or should I just return the data (which will be empty)? Is it worth cleaning that?
Ok, maybe not the best example, since that's not really malicious.
Does it make more sense to throw up an error? Or to just swap the two dates?
I normally tell them something like "the end date you selected occurs before the begin date. please select a valid one."
If you guess, and guess wrong, that can be a huge problem. For this case in particular, I think you'd almost always be guessing wrong, because people don't generally do something like that on purpose.
I'm not a big fan of trying to correct data like that - it can only lead to trouble if you guess wrong, and the worst part is that the user may never notice.
If I wanted to be extra friendly, I might provide a "The end date came before the begin date. Did you mean...? " style of thing where they are alerted to the problem, and have a "guessed" chance to fix it.
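A minimal sketch of that "Did you mean...?" idea, in Python for illustration: the check never swaps the dates silently, it only reports the problem and offers the swapped range as a suggestion the user can accept. The function name `check_date_range` and the shape of the returned dictionary are hypothetical.

```python
from datetime import date


def check_date_range(from_date, to_date):
    """Validate a FROM/TO pair without silently correcting it.

    Hypothetical helper: returns a dict describing the outcome.
    """
    if from_date <= to_date:
        return {"ok": True, "message": "", "suggestion": None}

    # Do not guess on the user's behalf; offer the swapped range instead.
    return {
        "ok": False,
        "message": (
            "The end date came before the begin date. Did you mean "
            f"{to_date.isoformat()} to {from_date.isoformat()}?"
        ),
        "suggestion": (to_date, from_date),
    }


result = check_date_range(date(2024, 3, 10), date(2024, 3, 1))
```

Here `result["ok"]` is false and `result["suggestion"]` holds the two dates in swapped order, so the page can show the message with a one-click fix instead of either an empty result set or a guess the user never sees.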
I could care less either on the paging issue. Sometimes I like to change the page in the URL rather than clicking "next" so I might accidentally go to a higher than highest page. I wouldn't be upset if you gave me the highest available, but I might like to know about it.
I have a project manager who *always* does that !
Show him an entry field that takes a number and he'll enter a million or two in it if he can, preferably a negative one.
I've learnt to anticipate it by now, but I'm *so* tempted to raise an alert of "F*^* off Jose" for the really stupid values :) But that would be very very stupid...
"I could care less either" should have been "I could care less either /way/"
Sorry - it sounded bad the first way.
I would definitely throw an error message, and not try to guess the user's intent.
If you guess wrong, the user may get information they think means one thing, but in fact means something else.
Let's take a mapping application. Let's say you meet a woman that lives in a brand new development at 123 Man Street. You check your mapping application for directions because she is very hot, and you do not want to be late. The map database has not yet been updated with the new development and finds no results for Man St. so the app guesses you really meant "Main St." and serves up directions to 123 Main St.
Next thing you know you are spending another Friday night bowling on your Wii and your hot new friend is now someone else's hot new friend.
If they're clearly doing something malicious, then send them a virus... or write their IP address to a temporary blacklist... or redirect them to a gov site or something.
Also it's a good idea to make sure you take care of decimal points by rounding them up so someone doesn't enter page=0.99999 etc...
Ha ha, I see someone just tried to break my pagination with 9999999999999. It does break, but this is a data type issue (the value entered was TOO large for the cf_sql_integer CFQueryParam). This raises another very interesting, but unrelated, issue: that better data type validation is required... or is it? Is showing a "friendly" error page an OK thing to do for something like this, which is very clearly malicious?
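One way to keep a value like that from ever reaching the query is to check it against the range of a signed 32-bit integer first, on the assumption that this is the range the cf_sql_integer type accepts. The sketch below is in Python for illustration; `parse_page_param` is a hypothetical helper, and rejecting decimals and negatives outright (rather than rounding or clamping them) is a choice made for the example.

```python
import re

# Largest value a signed 32-bit integer can hold (assumed limit for the query parameter).
INT32_MAX = 2_147_483_647


def parse_page_param(raw):
    """Return a positive int that fits in 32 bits, or None if the input is unusable."""
    text = str(raw).strip()

    # Accept plain ASCII digits only: no sign, no decimal point, no exponent.
    if not re.fullmatch(r"[0-9]{1,10}", text):
        return None

    value = int(text)
    if value < 1 or value > INT32_MAX:
        return None
    return value
```

With this in place, `parse_page_param("2")` returns 2, while `"9999999999999"`, `"-30"`, and `"0.99999"` all return `None`, leaving the caller free to decide between a friendly error page and something terser.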
In the case of pagination, you should be chucking a 404 if something's out of range.
This might be a valid page: index.cfm?page=2
As might this: index.cfm?page=300
But index.cfm?page=9999 might not return any records - so you should 404 it.
Otherwise, if you return a valid page, you'll be telling search engines, browsers etc that the page *really does exist*, and you might end up being indexed in a page you don't want...
I guess what I'm trying to say is you should be treating url parameters as if they're "real" pages (page-2.html, page-300.html), hence you'd be forced to 404 page-9999.html as it wouldn't exist...
Yeah, I think I agree with everyone here about not guesstimating for the user. I also really like Geoff's concept of throwing a 404. I mean, you can still display a nice friendly error to the user saying that the page they requested does not exist; just add an HTTP response header for the 404 so the browser/search bot knows what is going on.
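To make the 404 idea concrete, here is a small sketch (Python, for illustration only) of deciding the status code from the page number and the record count. The name `page_status`, the page size of 10, and the choice to treat page 1 of an empty list as valid are all assumptions.

```python
import math


def page_status(page, total_items, page_size=10):
    """Return (http_status, item_range) for a 1-based page number.

    item_range is a zero-based, half-open (start, end) pair, or None on a 404.
    """
    # Assumption: an empty list still has one (empty) page.
    total_pages = max(1, math.ceil(total_items / page_size))

    if page < 1 or page > total_pages:
        # Out of range: treat it like a file that does not exist.
        return 404, None

    start = (page - 1) * page_size
    end = min(start + page_size, total_items)
    return 200, (start, end)
```

With 3,000 records, `page_status(2, 3000)` and `page_status(300, 3000)` both come back with 200, while `page_status(9999, 3000)` comes back with `(404, None)`. The friendly "that page does not exist" message can still be rendered in the body; only the status header changes.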