When you set up a database (at least in MS SQL Server) you can flag a field as allowing NULL values and which default values to take. If you look through people's DB structures, you'll see that a lot of people allow NULL values in their database. This is a very bad idea. I would recommend never allowing NULL values unless the field can logically have a NULL value (and even this I find this only really happens in DATE/TIME fields).
NULL values cause several problems. For starters, NULL values are not the same as data values. A NULL value is basically an undefined values. On the ColdFusion end, this is not terrible as NULL values come across as empty strings (for the most part). But in SQL, NULL and empty string are very different and act very differently. Take the following data table for example:
id name
---------------
1 Ben
2 Jim
3 Simon
4 <NULL>
5 <NULL>
6 Ye
7
8
9 Dave
10
This table has some empty strings (id: 7, 8, 10) and some NULL values (id: 4, 5). To see how these behave differently, look at the following query where we are trying to find the number of fields that do not have values:
Launch code in new window » Download code as text file »
This returns the following record:
LEN Count: 3
NULL Count: 2
LIKE Count: 3
Combo Count: 5
We were looking for 5 as records 4, 5, 7, 8, and 10 do not have values in them. However, you can see that only one attempt returned 5. This is because while a NULL value does NOT have a length, it is not a data type that makes sense with length. How can nothing have or not have a length? It's like asking "What does that math equation smell like?" You can't make comparisons like that.
So, allowing NULL values makes you work extra hard to get the kind of data you are looking for. From a related angle, allowing NULL values reduces your convictions about the data in your database. You can never quite be sure if a value exists or not. Does that make you feel safe and comfortable when programming?
Furthermore, while running LEN() on a NULL value doesn't act as you might think it to, it also does NOT throw an error. This will make debugging your code even harder if you do not understand the difference between NULL values and data values.
Bottom line: DO NOT ALLOW NULL VALUES unless absolutely necessary. You will only be making things harder for yourself.
Download Code Snippet ZIP File
Comments (29) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Ben - If you feel nulls should be rarely used, what about these types of columns?
per_middle_initial -
per_fax_number
per_name_suffix
I could list several other examples where I've allowed nulls. I've designed hundreds of tables and believe allowing null values is perfectly valid where appropriate.
Based on a business rule there is a legitimate reason that the column will not have a value for some entities.
Posted by Bruce on Feb 20, 2007 at 9:03 PM
Bruce,
While, I agree that for business rules, there are legitimate reasons to have NULL values, I am not convinced by your examples. You have a field like "middle_initial" that can have NULL values. So, basically you are saying that the user is not allowed to leave that blank. Or rather, a user can leave that blank, but you are saying in your business rules that blank answers are not stored... but are instead converted to no answer.
Look at the code I posted above... what would happen if I was a user who came to you and said "I want to see all middle initials that have no length. You turn around and "No problem, I will find you all the middle initialials that are NULL. Then the user says "Hey loser, I didn't say 'NULL', I said the ones that have no length. So you say, "Yes, but non-length fields are being stored as NULL". So the user says to you "But I didn't store a NULL value, I stored an empty value, who the hell are you to tell me that I can't store a empty string?!?!?"
Of course that is a retarded example. And it is meant to be retarded to point out the craziness over debating the "philosophy" behind using NULL values vs. using empty strings.
My POINT is that using NULLs makes things less convenient! Just look at the query above. NULLs complicate length checks for length and for LIKE'ness. Is that worth it? Is it worth having to check for NULL and for length just so you can say that "philosophically" you used the correct data storage??? That is up to you... I personally say that it is not worth it.. but to each his own.
Posted by Ben Nadel on Feb 21, 2007 at 7:41 AM
Ben:
You are missing a critical step in your discussion of null values: validating user input.
You can either set up the database so that there is a constraint (SQL SERVER) to not allow an empty string to be stored in the per_middle_initial field or you could set up a validation check in your application that would ensure only null was stored in per_middle_initial even if the user entered an empty string.
Your example's problem is NOT because of null, but because of poor design that allowed empty strings to be placed into the field.
If you use proper validation checks to ensure only null is stored in the field, even if the user enters an empty string, your query problem is solved.
Bruce
Posted by Bruce on Feb 21, 2007 at 8:22 AM
@Bruce,
I disagree... well rather, I don't disagree that validating user input is essential, that is true. By setting up a constraint that stores NULL values or altering your INSERT statements still does not change the fact that by allowing NULL values means that the data in the column may have two different behaviors.
Think about it from this point of view: you architect the most awesome database tables and empty fields are stored as NULL values. All is good. But then a client calls up frantic and furious, "Bruce! I we've fired all of our data entry people! I need to know all user accounts in the database that do not yet have a SSN stored... and I need it YESTERDAY!"
"I'm sorry, Bruce is out of the office today at a ColdFusion conference, I will put Ben on the job!" ... Ben, do you know how to do this?
"Sure thing boss, I will just grab all records whose SSN field has a ZERO length."
"Perfect, another client is satisfied, we have saved the day".
Little do I know that I am about to pass to the client a VERY incomplete record set as I will not be returning any records whose SSN number is NULL, just whose length is ZERO... which, depending on your constraints is not going to return any records at all.
The solution? Fire Ben???
Where is the problem here? Maybe the problem is that Ben (me) does not understand the database architecture? Yes, that could be a problem.
But, I think the real problem here is that at a CONCEPTUAL level, NULL values don't have lengths. It doesn't have a value, how can it have a length? But, of course, that does not jive with the IMPLEMENTATION level that says NULL values are NOT values and therefore, length does not apply.
Now, PLEASE DO NOT GET ME WRONG... I am not saying that we should throw out our understanding of the implementation in order to cater to those who do not understand it.... what I am arguing is that the BENEFITS of catering to the CONCEPTUAL level of understanding out weigh, philosophical principles of NULL vs. Value.
And of course, at the end of the day, to each his own.
Posted by Ben Nadel on Feb 21, 2007 at 8:57 AM
Ben,
I disagree about the usage of NULL values in a database. NULL values are an important factor in DB design. Actually you should not allow to store empty strings in a varchar (char, text, etc) field.
As you said, a NULL value means "does not exist". So, if I do not have a middle name, a middle initial for me does not exist. A NULL value would be the proper represantation here, whereas en empty string would be value, and it would be a wrong one.
Sure, CF turns NULL values into empty strings, but, that does not mean we should use these values within the database. The <cfqueryparam> tag offers all we need to handle this:
INSERT INTO myTable (myField)
VALUES (<cfqueryparam cfsqltype="cf_sql_varchar" value="#myCFVar#" NULL="#Not len(myCFVar)">)
Databases (and all their topics, from relational design to NULL values to constraints, etc.) are a very complex field. A good database administrator has most likely spend years gathering his knowledge and enhancing his abilities. And if Mr. X does not know how to handle NULL values, that means we should not let him handle a production database without supervision. It does not mean we should adjust our database to his knowledge. ;-)
Chris
Posted by Christoph Schmitz on Mar 13, 2007 at 5:37 AM
@Chris,
I completely understand what you are saying (and what many others have said). I just do not see the benefit of the NULL value. What higher purpose does it server? What are the advantages of having a NULL value in a varchar field rather than an empty string.
I would say that in 99% of cases, there is no advantage. Sure, you could tell me that we can tell the difference between an empty value and "non-submitted" value.... BUT, I would say in our web applications, how many times is that required OR even relevant?
I can see the advantage (which I have stated many times before) of NOT having NULL values. Can you please tell me the advantage in a real-world way of having NULL values.
Posted by Ben Nadel on Mar 13, 2007 at 6:14 AM
Storing a NULL value should save space over storing a string of zero length. I am not completely sure as to how much space will be saved, but in my database program I am trying to use null values instead of strings of length 0 to save space, and cause null pointer exceptions if data is read from the database and not checked properly. Data should be read, and then verified before used, which the previous author of the software I am working on never did.
But really, I do it to save space.
Posted by Mitch Lockhart on Apr 18, 2007 at 3:35 PM
I know this debate is long over, but i thought i'd just throw in my two cents.
Chris - just so we're clear - NULL does not mean "doesn't exist". It means UNKNOWN.
If you have a middle name that is left blank by a user that means he knows what his middle name is... it's nothing - i.e. blank.
Inserting NULL is saying it's UNKNOWN.
NULL = UNKNOWN
BLANK = KNOWN
I'm currently in a debate with a Business Analyst about just this issue. To me there is no good reason to store SQL NULL.
There is a good reason to store null, as in the literal meaning "nothing", blank, or a boolean\bit value.
But SQL NULL's are not necessary.
It's just bad design.
Over and out... since this was a google search i will more than likely not see any update after this one so no point replying to my comment directly to me :)
Posted by David on Jul 18, 2007 at 5:21 AM
I think null is very valuable in a lot of cases where you want to know if something is unknwon (i.e. data was not collected) or if it is really empty. For instance, suppose we have a contact form with firstname, lastname and emailaddress. We use it for a year, then the boss asks us to add middlename. Now, from this point on, if our users don't enter anything for middle name, they will have an empty string recorded. But for all the existing data, it would not be correct to assume that those people don't have middle names. We just don't have that data - hence null.
But more importantly, you only seem to be talking about strings. What about numbers? In sql server there is no numeric equivalent of an empty string. So if I don't allow nulls, the best I can do is zero. But what if I am collecting stock prices and reporting them on the hour. If I have no price for the hour, is it reasonable to return zero? I don't think so. Or if you are collecting anaytic data, test results, or any other number of things, zero is NOT the same as unknown.
I think in respone to this statement:
"Maybe the problem is that Ben (me) does not understand the database architecture?"
The only reasonable response is - don't ask people who don't understand SQL to run your reports!
Posted by ian on Jul 25, 2007 at 4:14 PM
@Ian,
Agreed - there are times when NULL values should be used. Your cost as zero example is a good one. So are things like date-sold which should be NULL if is not an actual date.
And yes, if is is important that you know if a middle name is collected yet or now. My argument is that most of the time, people only look at NULL from a theoretical standpoint and don't take a step back and think "If that string is empty instead of NULL, will it have any impact on my ability to carry on my business logic".... I am saying that if the answer is NO, then a NULL adds more overhead than it is worth from a practical standpoint.
Posted by Ben Nadel on Jul 25, 2007 at 4:56 PM
Null means nothing. It's empty. Previous comments have tried to pin it down to a reason why it's empty, this is absurd. There are no conclusions you can draw from a null except that no information was provided, you can't say it wasn't provided because it wasn't known, or it wasn't provided because it doesn't exist. It just wasn't provided.
Now that being said... what extra value can an empty string provide above that of a null? It gives you two empties instead of one. The meaning of each is still arbitrary and can vary from system to system. As one poster said, you can add a field and give it a default of empty string, then use the null vs. empty string to determine if the record was there before the field was added or not... but is this a standard? No. If you bring in some trainee fresh out of college is he gonna look at it and say "oh yeah, null means that field didn't exist when the record was added!" No. This person assigned those uses for that particular system and they apply only to that system.
For this reason, I think the original article is dead-on in that there should only be one 'nothing' and not two to be arbitrarily assigned different meaning in different systems. However, I think he has the wrong one. Null wins, empty string loses. Why? Because null works on all data types, not just strings. Null is necessary. You cannot get rid of it, unless you make all fields of all types required. Empty string is unnecessary. It was added for flexibility only.
Posted by Alex on Aug 2, 2007 at 6:59 PM
Well, I mostly agree with your last assesment - that if it is a question of no nulls or no empties, nulls are way better than empties. But I still stand by the difference between known versus empty. It is true that when we see a null we don't know whether ther is or is not a real value lurking out there. But we know that at the time the record was created, the PROGRAM didn't "know" - that is to say, we know that a user did not submit an empty string - if they had, there would be one in the system. So either they weren't asked, or there was a malfunction. This assumes that we pass empty strings for all form fields that are left blank, which I personally think is only useful in some situations, but in a case where we are trying to determine what data was submitted, we will want to store all values, even empty ones. So if we define our table to allow nulls and to have no defaults, then we know that if a field is null, it was not submitted. In the case of the trainee, she may not know the history of the data, but if she should know right away that because there are nulls in a field, she'd better disclaim her results.
Posted by ian on Aug 2, 2007 at 7:13 PM
@Alex,
So, are you saying that all of my CFQueryParam tags should be of this nature:
<cfqueryparam
value="#FORM.first_name#"
cfsqltype="CF_SQL_VARCHAR"
null="#NOT Len( FORM.first_name )#"
/>
From what it sounds like, you are suggesting that pretty much all of my CFQueryParam tags should make use of the "null" attribute not matter what?
Posted by Ben Nadel on Aug 3, 2007 at 7:15 AM
To Ian, yes I agree that's pretty much the only circumstance that calls for using an empty string. However, those sorts of questions should generally be answered by the time an application goes production. ("Generally" being the key word there.)
To Ben, I'm not very familiar with Cold Fusion, but it looks about right to me, in fact you might even take it a step further and add a trim on the null condition to ensure you don't get emtpy space saved even if it is greater than zero length. Whether or not that's a good idea also depends on the circumstances, that being said I suppose there's an exception for every rule. In general though, nulls cannot be avoided unless you make every non-string field required.
Posted by Alex on Aug 3, 2007 at 11:07 AM
I am running this query in CF
<CFQUERY Name="GetCompanyName" datasource="DS123">
SELECT CompanyName, date_approved
FROM CompanyProfile
WHERE CompanyID = #Val(CompanyID)# and
date_approved is not null and
CompanyName is not null
</CFQUERY>
But this query is still returning CompanyName(s) that have [blank] value. I do not want companies with blank nor Null values.
You can test drive my Beta site -
visit this link http://www.naperlist.com/directory/search_main.cfm and press search w/o selecting anything. You will see some companyName that are blank. I do not want to display these companies that are blank, nor Null - is there away I can fix this with CF. Please advise.
Thank you so much for any input you may have.
-AJ
Posted by AJ on Aug 17, 2007 at 2:50 AM
Hi AJ,
that problem arises from putting blank values where a NULL should have been... you must check against both options:
<CFQUERY Name="GetCompanyName" datasource="DS123">
SELECT CompanyName, date_approved
FROM CompanyProfile
WHERE CompanyID = #VAL(CompanyID)# and
date_approved is not null and
CompanyName is not null
AND CompanyName != ''
</CFQUERY>
A few tips on the code, if you don't mind...
I would recommend putting the data source name in a variable and setting that in a central location. This way, if you have to transfer your app or simply use another data source, you do not need to change 100s of pages:
<cfquery.... datsource="#myDSN#">
VAL() is not necessary, if Company contains a numeric value:
WHERE CompanyID = #CompanyID#
It is considered best practice to use <cfqueryparam>. Not only does it make your code more secure (prevents SQL injection attacks because it evaluates the parameter type), it also speeds up query execution. (Although I must admit that with a small query like this, performance should not show much difference):
WHERE CompanyID = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#CompanyID#" />
Last, but not least, I alway recommend to write the SQL keywords all UPPER case and only put 1 statement on 1 line, to improve readability. That includes ANDs and ORs for me ;-)
SELECT ..
FROM ...
WHERE ...
AND ...
ORDER BY...
HTH,
Chris
Posted by Chris on Aug 17, 2007 at 3:25 AM
@AJ,
You can check the length of the field:
CompanyName is not null
AND
CompanyName LIKE '_%'
The LIKE '_%' ensures that there is at least one character. You may not even need the NULL statement in conjunction with this.
Posted by Ben Nadel on Aug 17, 2007 at 7:22 AM
Ooops, I didn't see Chris's post... he has some good stuff there. He also compares to the empty string != '', which is good - something that I forget to do occasionally.
Posted by Ben Nadel on Aug 17, 2007 at 7:24 AM
I hate to be so snarky but if you are one to confuse the empty string with NULL, even the slightest little bit, then please, for the love of God, don't go anywhere near databases, especially as a developer.
2ยข
Posted by Jack on Aug 20, 2007 at 2:44 PM
The logic of not using NULL because you need extra work in your query is the worse logic I ever heard. And when thinking that your extra work is only one extra line that is ridiculous. Sorry dude but there is a reason I think for NULL to be there and if it is there its purpose is to be used.
Using your same logic I can give you hundred of examples that will make starters to think that it is bad to use int, char, nvarchar, uniqueidentifiers and so on.
Posted by John on Aug 21, 2007 at 4:46 PM
@John,
NULL values do have a time and a place. AND, with every situation, there is a trade-off. My argument is that for MY purposes, most of the time, the existence of a NULL value only adds overhead RATHER than adding useful functionality. By all means, if the NULL value adds functionality, then use it.
My statement that NULL values are bad is not meant to be a blanket statement. On the contrary, I happen to love NULL values for things like date/time fields because to me, that is a really meaningful usage. However, with things like Last Name or First Name, they simply don't add any useful functionality - they only add the additional overhead / logic to your code.
So, I ask you this - if they serve no purpose to ME and MY situation, would it not be foolish to keep them in just because NULL was invented for a reason???
Posted by Ben Nadel on Aug 21, 2007 at 5:03 PM
Hello Ben, Chris & Everyone -
I appreciate your valuable input. My website is up & running now. After reading your comments, I decided not to use Null altogether, which really helped me with my coding. In my situation, when a user creates an account for Classifieds, they have an option to add their company info as well (see this link http://www.naperlist.com/accountcreate.cfm ) if they choose not to, then the company name must use the value NA. However, if they deleted NA & kept the field blank, then behind the seen I check for Blank, if true, I insert NA in the company field. As something as simple as this, it did save me time & the headaches as far as Null value is concerned.
FYI - I am not a CF developer, I've only used it for about 2yrs, I am actually a Sr. Oracle DBA, I have been in the field for about 12yrs now, but I really felt in love with CF the day I put my hands on it.
By the way, please visit my website when you have chance, I would welcome any suggestion you may have, I'd appreciate that..
I will certainly come back to this cool Forum anytime I have CF questions.
Thank you all!!
-AJ
Posted by AJ on Aug 22, 2007 at 12:52 AM
AJ,
The problem is your site is broken for people who work for a company named NA, or perhaps Northern Air or Norton-Arnold. If this field can be left unpopulated, then you should use NULL for the back-end representation.
Jack
Posted by Jack on Aug 22, 2007 at 1:08 AM
While I disagree with the use of NA, I would say use NULL only if you can have a valid company name that is blank. Unless of course, you just want to use NULL.
Posted by Ben Nadel on Aug 22, 2007 at 7:20 AM
Not quite Jack. When you create an account thru my site, the company info is Optional, therefore, when someone do a search for a particular item, all companies with NA are excluded from the search, no companies with NA will get displayed in the search results.
The way I setup my system is that a member can only create an account for Classifieds OR he/she can create a Classified Ad as well as a Biz profile. The goal is to provide Classifieds for free, and sometime in the future, i will need to charge a reasonable fee for the directory profiles - Someone has to pay for the expenses afterall :-)
Give it a try to see what I mean:
http://www.naperlist.com/directory/search_main.cfm
Hope this clarifies your question. Any other suggestions, please post them online.
Thank you.
-AJ
Posted by AJ on Aug 22, 2007 at 12:55 PM
One more thing Jack.
How many companies within my community that are called NA? no idea. If that is the case, they will need to spell out the whole name. Or they can call it NA Inc.
Actually this happened to me - I wanted to create a Yahoo account with the word NETWORK, guess what, I got an error messge that I can't use the work NETWORK. I am sure there are lots of companies that use the word Network (i.e. NetworkSolutions) - But did Yahoo give a damn about that, NO! They had to do what needs to be done to keep their system clean (no special word conflict) or for whatever other reasons..
Therefore, I chose a different name and life goes on.. :-)
-AJ
Posted by AJ on Aug 22, 2007 at 1:07 PM
Or, you could just use NULL, for the exact purpose it was intended.
Posted by Jack on Aug 22, 2007 at 1:19 PM
One of the main reasons against NULL values and a problem we are dealing with at my work right now is the fact that NULL values are not included in INDEXES. So if you have a small percentage of your table where a column has a NULL value and you have an index on that column, the optimizer will not use that index - it will force a full table scan, degrading performance considerably. I know this definitely applies to Oracle DBs - not so sure about the others though.
Posted by Shaun on Aug 25, 2007 at 2:48 PM
Hi, everyone.
Although it's an old topic. I have some thoughts which I'd like to share with you.
First of all, a database is only "a database". Data modeling and database design should not be so much involed with presentation technologies. I know some web plaftform supports direct database manipulation. But in my opinion, that's really for fast prototyping purpose. A full feathered production, or system, always is a distributed system. For example, SOA call for design the inferface according to message requirement, that is the data contract. A database always need to serve many customer, ofcourse, it doesn't directly exposed it's schema, but use a middleware instead.
I give a scenario I'm facing to currently.
A database has a custom table. Firstname and Lastname are two columns. It separatly "John" and "Smith" for custom "John Smith". You need to show "Smith, John" for service A. But I'm a chinese. My name is "Baoshan Sheng", "Sheng" is my Lastname. For service B, if my name is displayed as "Sheng, Baoshan", I feel the company doesn't respect me very well. ( The same as "Hu Jintao", not "Hu, Jintao" ).
How to handle this scenario?
You can add some algorithm to handle asian name convention, and ofcourse, a column "IsAsian".
But there're many different regions that our knowledge doesn't cover yet. So I decide to add a column name "RealGeneralName" to directly stored the "dictionary order" name spelling.
For service A, we can use SELECT ISNULL(RealGeneralName, Lastname + ', ' + Firstname) AS GeneralName if we store a NULL in RealGeneralName for most people whose name is regular. That means we hide the real schema to the service b's middleware. It's intuitive and it's performance is better than expose both the three column to the service middleware. (This may be controversial.)
For service B, e.g. an customer entry service(sales service?), we expose the three columns directly to our sales. Salers can handle different conditions.
If we don't allow NULLs, and use empty string instead. the SQLs will be more ugly - it'll no longer a scalar expression. Am I right?
That's a schema I designed at the current stage. I'm not sure I'll not change my design in the future. But I do like to share these thinkings with you. Comments are welcomed.
Baoshan Sheng. Pardom my english.
Posted by Baoshan Sheng on Jan 13, 2008 at 11:05 PM