Why NULL Values Should Not Be Used in a Database Unless Required

Posted June 14, 2006 at 2:00 PM

Tags: SQL

When you set up a database (at least in MS SQL Server) you can flag a field as allowing NULL values and which default values to take. If you look through people's DB structures, you'll see that a lot of people allow NULL values in their database. This is a very bad idea. I would recommend never allowing NULL values unless the field can logically have a NULL value (and even this I find this only really happens in DATE/TIME fields).

NULL values cause several problems. For starters, NULL values are not the same as data values. A NULL value is basically an undefined values. On the ColdFusion end, this is not terrible as NULL values come across as empty strings (for the most part). But in SQL, NULL and empty string are very different and act very differently. Take the following data table for example:

id    name
---------------
1     Ben
2     Jim
3     Simon
4     <NULL>
5     <NULL>
6     Ye
7
8
9     Dave
10

This table has some empty strings (id: 7, 8, 10) and some NULL values (id: 4, 5). To see how these behave differently, look at the following query where we are trying to find the number of fields that do not have values:

 Launch code in new window » Download code as text file »

  • SELECT
  • (
  • SELECT
  • COUNT( * )
  • FROM
  • test t
  • WHERE
  • LEN( t.name ) = 0
  • ) AS len_count,
  • (
  • SELECT
  • COUNT( * )
  • FROM
  • test t
  • WHERE
  • t.name IS NULL
  • ) AS null_count,
  • (
  • SELECT
  • COUNT( * )
  • FROM
  • test t
  • WHERE
  • t.name NOT LIKE '_%'
  • ) AS like_count,
  • (
  • SELECT
  • COUNT( * )
  • FROM
  • test t
  • WHERE
  • t.name IS NULL
  • OR
  • t.name NOT LIKE '_%'
  • ) AS combo_count

This returns the following record:

LEN Count: 3
NULL Count: 2
LIKE Count: 3
Combo Count: 5

We were looking for 5 as records 4, 5, 7, 8, and 10 do not have values in them. However, you can see that only one attempt returned 5. This is because while a NULL value does NOT have a length, it is not a data type that makes sense with length. How can nothing have or not have a length? It's like asking "What does that math equation smell like?" You can't make comparisons like that.

So, allowing NULL values makes you work extra hard to get the kind of data you are looking for. From a related angle, allowing NULL values reduces your convictions about the data in your database. You can never quite be sure if a value exists or not. Does that make you feel safe and comfortable when programming?

Furthermore, while running LEN() on a NULL value doesn't act as you might think it to, it also does NOT throw an error. This will make debugging your code even harder if you do not understand the difference between NULL values and data values.

Bottom line: DO NOT ALLOW NULL VALUES unless absolutely necessary. You will only be making things harder for yourself.

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Learning ColdFusion 9 - ColdFusion 9 tutorials, samples, examples, demos

Reader Comments

Feb 20, 2007 at 9:03 PM // reply »
22 Comments

Ben - If you feel nulls should be rarely used, what about these types of columns?

per_middle_initial -
per_fax_number
per_name_suffix

I could list several other examples where I've allowed nulls. I've designed hundreds of tables and believe allowing null values is perfectly valid where appropriate.

Based on a business rule there is a legitimate reason that the column will not have a value for some entities.


Feb 21, 2007 at 7:41 AM // reply »
6,516 Comments

Bruce,

While, I agree that for business rules, there are legitimate reasons to have NULL values, I am not convinced by your examples. You have a field like "middle_initial" that can have NULL values. So, basically you are saying that the user is not allowed to leave that blank. Or rather, a user can leave that blank, but you are saying in your business rules that blank answers are not stored... but are instead converted to no answer.

Look at the code I posted above... what would happen if I was a user who came to you and said "I want to see all middle initials that have no length. You turn around and "No problem, I will find you all the middle initialials that are NULL. Then the user says "Hey loser, I didn't say 'NULL', I said the ones that have no length. So you say, "Yes, but non-length fields are being stored as NULL". So the user says to you "But I didn't store a NULL value, I stored an empty value, who the hell are you to tell me that I can't store a empty string?!?!?"

Of course that is a retarded example. And it is meant to be retarded to point out the craziness over debating the "philosophy" behind using NULL values vs. using empty strings.

My POINT is that using NULLs makes things less convenient! Just look at the query above. NULLs complicate length checks for length and for LIKE'ness. Is that worth it? Is it worth having to check for NULL and for length just so you can say that "philosophically" you used the correct data storage??? That is up to you... I personally say that it is not worth it.. but to each his own.


Feb 21, 2007 at 8:22 AM // reply »
22 Comments

Ben:

You are missing a critical step in your discussion of null values: validating user input.

You can either set up the database so that there is a constraint (SQL SERVER) to not allow an empty string to be stored in the per_middle_initial field or you could set up a validation check in your application that would ensure only null was stored in per_middle_initial even if the user entered an empty string.

Your example's problem is NOT because of null, but because of poor design that allowed empty strings to be placed into the field.

If you use proper validation checks to ensure only null is stored in the field, even if the user enters an empty string, your query problem is solved.

Bruce


Feb 21, 2007 at 8:57 AM // reply »
6,516 Comments

@Bruce,

I disagree... well rather, I don't disagree that validating user input is essential, that is true. By setting up a constraint that stores NULL values or altering your INSERT statements still does not change the fact that by allowing NULL values means that the data in the column may have two different behaviors.

Think about it from this point of view: you architect the most awesome database tables and empty fields are stored as NULL values. All is good. But then a client calls up frantic and furious, "Bruce! I we've fired all of our data entry people! I need to know all user accounts in the database that do not yet have a SSN stored... and I need it YESTERDAY!"

"I'm sorry, Bruce is out of the office today at a ColdFusion conference, I will put Ben on the job!" ... Ben, do you know how to do this?

"Sure thing boss, I will just grab all records whose SSN field has a ZERO length."

"Perfect, another client is satisfied, we have saved the day".

Little do I know that I am about to pass to the client a VERY incomplete record set as I will not be returning any records whose SSN number is NULL, just whose length is ZERO... which, depending on your constraints is not going to return any records at all.

The solution? Fire Ben???

Where is the problem here? Maybe the problem is that Ben (me) does not understand the database architecture? Yes, that could be a problem.

But, I think the real problem here is that at a CONCEPTUAL level, NULL values don't have lengths. It doesn't have a value, how can it have a length? But, of course, that does not jive with the IMPLEMENTATION level that says NULL values are NOT values and therefore, length does not apply.

Now, PLEASE DO NOT GET ME WRONG... I am not saying that we should throw out our understanding of the implementation in order to cater to those who do not understand it.... what I am arguing is that the BENEFITS of catering to the CONCEPTUAL level of understanding out weigh, philosophical principles of NULL vs. Value.

And of course, at the end of the day, to each his own.


Mar 13, 2007 at 5:37 AM // reply »
24 Comments

Ben,

I disagree about the usage of NULL values in a database. NULL values are an important factor in DB design. Actually you should not allow to store empty strings in a varchar (char, text, etc) field.

As you said, a NULL value means "does not exist". So, if I do not have a middle name, a middle initial for me does not exist. A NULL value would be the proper represantation here, whereas en empty string would be value, and it would be a wrong one.

Sure, CF turns NULL values into empty strings, but, that does not mean we should use these values within the database. The <cfqueryparam> tag offers all we need to handle this:
INSERT INTO myTable (myField)
VALUES (<cfqueryparam cfsqltype="cf_sql_varchar" value="#myCFVar#" NULL="#Not len(myCFVar)">)

Databases (and all their topics, from relational design to NULL values to constraints, etc.) are a very complex field. A good database administrator has most likely spend years gathering his knowledge and enhancing his abilities. And if Mr. X does not know how to handle NULL values, that means we should not let him handle a production database without supervision. It does not mean we should adjust our database to his knowledge. ;-)

Chris


Mar 13, 2007 at 6:14 AM // reply »
6,516 Comments

@Chris,

I completely understand what you are saying (and what many others have said). I just do not see the benefit of the NULL value. What higher purpose does it server? What are the advantages of having a NULL value in a varchar field rather than an empty string.

I would say that in 99% of cases, there is no advantage. Sure, you could tell me that we can tell the difference between an empty value and "non-submitted" value.... BUT, I would say in our web applications, how many times is that required OR even relevant?

I can see the advantage (which I have stated many times before) of NOT having NULL values. Can you please tell me the advantage in a real-world way of having NULL values.


Apr 18, 2007 at 3:35 PM // reply »
1 Comments

Storing a NULL value should save space over storing a string of zero length. I am not completely sure as to how much space will be saved, but in my database program I am trying to use null values instead of strings of length 0 to save space, and cause null pointer exceptions if data is read from the database and not checked properly. Data should be read, and then verified before used, which the previous author of the software I am working on never did.

But really, I do it to save space.


Jul 18, 2007 at 5:21 AM // reply »
1 Comments

I know this debate is long over, but i thought i'd just throw in my two cents.

Chris - just so we're clear - NULL does not mean "doesn't exist". It means UNKNOWN.

If you have a middle name that is left blank by a user that means he knows what his middle name is... it's nothing - i.e. blank.
Inserting NULL is saying it's UNKNOWN.

NULL = UNKNOWN
BLANK = KNOWN

I'm currently in a debate with a Business Analyst about just this issue. To me there is no good reason to store SQL NULL.

There is a good reason to store null, as in the literal meaning "nothing", blank, or a boolean\bit value.
But SQL NULL's are not necessary.
It's just bad design.

Over and out... since this was a google search i will more than likely not see any update after this one so no point replying to my comment directly to me :)


ian
Jul 25, 2007 at 4:14 PM // reply »
2 Comments

I think null is very valuable in a lot of cases where you want to know if something is unknwon (i.e. data was not collected) or if it is really empty. For instance, suppose we have a contact form with firstname, lastname and emailaddress. We use it for a year, then the boss asks us to add middlename. Now, from this point on, if our users don't enter anything for middle name, they will have an empty string recorded. But for all the existing data, it would not be correct to assume that those people don't have middle names. We just don't have that data - hence null.

But more importantly, you only seem to be talking about strings. What about numbers? In sql server there is no numeric equivalent of an empty string. So if I don't allow nulls, the best I can do is zero. But what if I am collecting stock prices and reporting them on the hour. If I have no price for the hour, is it reasonable to return zero? I don't think so. Or if you are collecting anaytic data, test results, or any other number of things, zero is NOT the same as unknown.

I think in respone to this statement:
"Maybe the problem is that Ben (me) does not understand the database architecture?"
The only reasonable response is - don't ask people who don't understand SQL to run your reports!


Jul 25, 2007 at 4:56 PM // reply »
6,516 Comments

@Ian,

Agreed - there are times when NULL values should be used. Your cost as zero example is a good one. So are things like date-sold which should be NULL if is not an actual date.

And yes, if is is important that you know if a middle name is collected yet or now. My argument is that most of the time, people only look at NULL from a theoretical standpoint and don't take a step back and think "If that string is empty instead of NULL, will it have any impact on my ability to carry on my business logic".... I am saying that if the answer is NO, then a NULL adds more overhead than it is worth from a practical standpoint.


Aug 2, 2007 at 6:59 PM // reply »
3 Comments

Null means nothing. It's empty. Previous comments have tried to pin it down to a reason why it's empty, this is absurd. There are no conclusions you can draw from a null except that no information was provided, you can't say it wasn't provided because it wasn't known, or it wasn't provided because it doesn't exist. It just wasn't provided.

Now that being said... what extra value can an empty string provide above that of a null? It gives you two empties instead of one. The meaning of each is still arbitrary and can vary from system to system. As one poster said, you can add a field and give it a default of empty string, then use the null vs. empty string to determine if the record was there before the field was added or not... but is this a standard? No. If you bring in some trainee fresh out of college is he gonna look at it and say "oh yeah, null means that field didn't exist when the record was added!" No. This person assigned those uses for that particular system and they apply only to that system.

For this reason, I think the original article is dead-on in that there should only be one 'nothing' and not two to be arbitrarily assigned different meaning in different systems. However, I think he has the wrong one. Null wins, empty string loses. Why? Because null works on all data types, not just strings. Null is necessary. You cannot get rid of it, unless you make all fields of all types required. Empty string is unnecessary. It was added for flexibility only.


ian
Aug 2, 2007 at 7:13 PM // reply »
2 Comments

Well, I mostly agree with your last assesment - that if it is a question of no nulls or no empties, nulls are way better than empties. But I still stand by the difference between known versus empty. It is true that when we see a null we don't know whether ther is or is not a real value lurking out there. But we know that at the time the record was created, the PROGRAM didn't "know" - that is to say, we know that a user did not submit an empty string - if they had, there would be one in the system. So either they weren't asked, or there was a malfunction. This assumes that we pass empty strings for all form fields that are left blank, which I personally think is only useful in some situations, but in a case where we are trying to determine what data was submitted, we will want to store all values, even empty ones. So if we define our table to allow nulls and to have no defaults, then we know that if a field is null, it was not submitted. In the case of the trainee, she may not know the history of the data, but if she should know right away that because there are nulls in a field, she'd better disclaim her results.


Aug 3, 2007 at 7:15 AM // reply »
6,516 Comments

@Alex,

So, are you saying that all of my CFQueryParam tags should be of this nature:

<cfqueryparam
value="#FORM.first_name#"
cfsqltype="CF_SQL_VARCHAR"
null="#NOT Len( FORM.first_name )#"
/>

From what it sounds like, you are suggesting that pretty much all of my CFQueryParam tags should make use of the "null" attribute not matter what?


Aug 3, 2007 at 11:07 AM // reply »
3 Comments

To Ian, yes I agree that's pretty much the only circumstance that calls for using an empty string. However, those sorts of questions should generally be answered by the time an application goes production. ("Generally" being the key word there.)

To Ben, I'm not very familiar with Cold Fusion, but it looks about right to me, in fact you might even take it a step further and add a trim on the null condition to ensure you don't get emtpy space saved even if it is greater than zero length. Whether or not that's a good idea also depends on the circumstances, that being said I suppose there's an exception for every rule. In general though, nulls cannot be avoided unless you make every non-string field required.


AJ
Aug 17, 2007 at 2:50 AM // reply »
4 Comments

I am running this query in CF

<CFQUERY Name="GetCompanyName" datasource="DS123">
SELECT CompanyName, date_approved
FROM CompanyProfile
WHERE CompanyID = #Val(CompanyID)# and
date_approved is not null and
CompanyName is not null
</CFQUERY>

But this query is still returning CompanyName(s) that have [blank] value. I do not want companies with blank nor Null values.

You can test drive my Beta site -
visit this link http://www.naperlist.com/directory/search_main.cfm and press search w/o selecting anything. You will see some companyName that are blank. I do not want to display these companies that are blank, nor Null - is there away I can fix this with CF. Please advise.

Thank you so much for any input you may have.

-AJ


Aug 17, 2007 at 3:25 AM // reply »
24 Comments

Hi AJ,

that problem arises from putting blank values where a NULL should have been... you must check against both options:

<CFQUERY Name="GetCompanyName" datasource="DS123">
SELECT CompanyName, date_approved
FROM CompanyProfile
WHERE CompanyID = #VAL(CompanyID)# and
date_approved is not null and
CompanyName is not null
AND CompanyName != ''
</CFQUERY>

A few tips on the code, if you don't mind...

I would recommend putting the data source name in a variable and setting that in a central location. This way, if you have to transfer your app or simply use another data source, you do not need to change 100s of pages:
<cfquery.... datsource="#myDSN#">

VAL() is not necessary, if Company contains a numeric value:
WHERE CompanyID = #CompanyID#

It is considered best practice to use <cfqueryparam>. Not only does it make your code more secure (prevents SQL injection attacks because it evaluates the parameter type), it also speeds up query execution. (Although I must admit that with a small query like this, performance should not show much difference):
WHERE CompanyID = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#CompanyID#" />

Last, but not least, I alway recommend to write the SQL keywords all UPPER case and only put 1 statement on 1 line, to improve readability. That includes ANDs and ORs for me ;-)

SELECT ..
FROM ...
WHERE ...
AND ...
ORDER BY...

HTH,

Chris


Aug 17, 2007 at 7:22 AM // reply »
6,516 Comments

@AJ,

You can check the length of the field:

CompanyName is not null
AND
CompanyName LIKE '_%'

The LIKE '_%' ensures that there is at least one character. You may not even need the NULL statement in conjunction with this.


Aug 17, 2007 at 7:24 AM // reply »
6,516 Comments

Ooops, I didn't see Chris's post... he has some good stuff there. He also compares to the empty string != '', which is good - something that I forget to do occasionally.


Aug 20, 2007 at 2:44 PM // reply »
3 Comments

I hate to be so snarky but if you are one to confuse the empty string with NULL, even the slightest little bit, then please, for the love of God, don't go anywhere near databases, especially as a developer.

2¢


Aug 21, 2007 at 4:46 PM // reply »
1 Comments

The logic of not using NULL because you need extra work in your query is the worse logic I ever heard. And when thinking that your extra work is only one extra line that is ridiculous. Sorry dude but there is a reason I think for NULL to be there and if it is there its purpose is to be used.
Using your same logic I can give you hundred of examples that will make starters to think that it is bad to use int, char, nvarchar, uniqueidentifiers and so on.


Aug 21, 2007 at 5:03 PM // reply »
6,516 Comments

@John,

NULL values do have a time and a place. AND, with every situation, there is a trade-off. My argument is that for MY purposes, most of the time, the existence of a NULL value only adds overhead RATHER than adding useful functionality. By all means, if the NULL value adds functionality, then use it.

My statement that NULL values are bad is not meant to be a blanket statement. On the contrary, I happen to love NULL values for things like date/time fields because to me, that is a really meaningful usage. However, with things like Last Name or First Name, they simply don't add any useful functionality - they only add the additional overhead / logic to your code.

So, I ask you this - if they serve no purpose to ME and MY situation, would it not be foolish to keep them in just because NULL was invented for a reason???


AJ
Aug 22, 2007 at 12:52 AM // reply »
4 Comments

Hello Ben, Chris & Everyone -

I appreciate your valuable input. My website is up & running now. After reading your comments, I decided not to use Null altogether, which really helped me with my coding. In my situation, when a user creates an account for Classifieds, they have an option to add their company info as well (see this link http://www.naperlist.com/accountcreate.cfm ) if they choose not to, then the company name must use the value NA. However, if they deleted NA & kept the field blank, then behind the seen I check for Blank, if true, I insert NA in the company field. As something as simple as this, it did save me time & the headaches as far as Null value is concerned.

FYI - I am not a CF developer, I've only used it for about 2yrs, I am actually a Sr. Oracle DBA, I have been in the field for about 12yrs now, but I really felt in love with CF the day I put my hands on it.

By the way, please visit my website when you have chance, I would welcome any suggestion you may have, I'd appreciate that..

I will certainly come back to this cool Forum anytime I have CF questions.

Thank you all!!

-AJ


Aug 22, 2007 at 1:08 AM // reply »
3 Comments

AJ,

The problem is your site is broken for people who work for a company named NA, or perhaps Northern Air or Norton-Arnold. If this field can be left unpopulated, then you should use NULL for the back-end representation.

Jack


Aug 22, 2007 at 7:20 AM // reply »
6,516 Comments

While I disagree with the use of NA, I would say use NULL only if you can have a valid company name that is blank. Unless of course, you just want to use NULL.


AJ
Aug 22, 2007 at 12:55 PM // reply »
4 Comments

Not quite Jack. When you create an account thru my site, the company info is Optional, therefore, when someone do a search for a particular item, all companies with NA are excluded from the search, no companies with NA will get displayed in the search results.

The way I setup my system is that a member can only create an account for Classifieds OR he/she can create a Classified Ad as well as a Biz profile. The goal is to provide Classifieds for free, and sometime in the future, i will need to charge a reasonable fee for the directory profiles - Someone has to pay for the expenses afterall :-)

Give it a try to see what I mean:
http://www.naperlist.com/directory/search_main.cfm

Hope this clarifies your question. Any other suggestions, please post them online.

Thank you.

-AJ


AJ
Aug 22, 2007 at 1:07 PM // reply »
4 Comments

One more thing Jack.

How many companies within my community that are called NA? no idea. If that is the case, they will need to spell out the whole name. Or they can call it NA Inc.

Actually this happened to me - I wanted to create a Yahoo account with the word NETWORK, guess what, I got an error messge that I can't use the work NETWORK. I am sure there are lots of companies that use the word Network (i.e. NetworkSolutions) - But did Yahoo give a damn about that, NO! They had to do what needs to be done to keep their system clean (no special word conflict) or for whatever other reasons..

Therefore, I chose a different name and life goes on.. :-)

-AJ


Aug 22, 2007 at 1:19 PM // reply »
3 Comments

Or, you could just use NULL, for the exact purpose it was intended.


Aug 25, 2007 at 2:48 PM // reply »
1 Comments

One of the main reasons against NULL values and a problem we are dealing with at my work right now is the fact that NULL values are not included in INDEXES. So if you have a small percentage of your table where a column has a NULL value and you have an index on that column, the optimizer will not use that index - it will force a full table scan, degrading performance considerably. I know this definitely applies to Oracle DBs - not so sure about the others though.


Jan 13, 2008 at 11:05 PM // reply »
1 Comments

Hi, everyone.
Although it's an old topic. I have some thoughts which I'd like to share with you.
First of all, a database is only "a database". Data modeling and database design should not be so much involed with presentation technologies. I know some web plaftform supports direct database manipulation. But in my opinion, that's really for fast prototyping purpose. A full feathered production, or system, always is a distributed system. For example, SOA call for design the inferface according to message requirement, that is the data contract. A database always need to serve many customer, ofcourse, it doesn't directly exposed it's schema, but use a middleware instead.
I give a scenario I'm facing to currently.
A database has a custom table. Firstname and Lastname are two columns. It separatly "John" and "Smith" for custom "John Smith". You need to show "Smith, John" for service A. But I'm a chinese. My name is "Baoshan Sheng", "Sheng" is my Lastname. For service B, if my name is displayed as "Sheng, Baoshan", I feel the company doesn't respect me very well. ( The same as "Hu Jintao", not "Hu, Jintao" ).
How to handle this scenario?
You can add some algorithm to handle asian name convention, and ofcourse, a column "IsAsian".
But there're many different regions that our knowledge doesn't cover yet. So I decide to add a column name "RealGeneralName" to directly stored the "dictionary order" name spelling.
For service A, we can use SELECT ISNULL(RealGeneralName, Lastname + ', ' + Firstname) AS GeneralName if we store a NULL in RealGeneralName for most people whose name is regular. That means we hide the real schema to the service b's middleware. It's intuitive and it's performance is better than expose both the three column to the service middleware. (This may be controversial.)
For service B, e.g. an customer entry service(sales service?), we expose the three columns directly to our sales. Salers can handle different conditions.
If we don't allow NULLs, and use empty string instead. the SQLs will be more ugly - it'll no longer a scalar expression. Am I right?
That's a schema I designed at the current stage. I'm not sure I'll not change my design in the future. But I do like to share these thinkings with you. Comments are welcomed.
Baoshan Sheng. Pardom my english.


Jul 1, 2009 at 2:12 AM // reply »
3 Comments

If you are storing contact details in a table and you have a field Fax for example, if your contact does not have a fax machine shouldn't you put NULL in this field?


Jul 1, 2009 at 2:21 AM // reply »
34 Comments

@Paul

Hey Paul

When the user doesn't supply information it would be a lot smarter to have a generic fax number ex.) 000-000-0000 or however you format your number.

The reason being

Assume that you check to ensure that the number is valid. If you return a NULL value it's obviously not going to be valid. Oh course there are always ways around it but its best to stay consistent.


Jul 1, 2009 at 3:25 AM // reply »
24 Comments

@Jody

I respectfully disagree. While it may be more convenient to have '000-000-0000' instead of a NULL, it's neither good nor correct.

NULL marks the absence of a value, so, if a user doen't provide a value, NULL is what to store in the database. '000-000-0000' on the other hand is a value, it generates data where the input didn't provide it.

I know that the implementation of NULLs in RDBMS is, well, less that optimal (especially if you look at aggreta funcitons), but, they exist for a reason. All other models would either have introduced additional complexity (like multiple types of NULL) or would have resulted in too little complexity (like omitting the NULLs altogher).

Plus, one more practical reason: how are you going to handle fax numbers that do not follow your formatting? ;-)

Cheers

Chris


Jul 9, 2009 at 10:55 PM // reply »
3 Comments

Has no here heard of foreign key or unique constraints?

You get away with FKs and NOT NULL by forcing the foreign table to have a special reserved value which means "not there" but that's a horrible kludge that will keep on giving and giving.

However, without NULL you can't have have a "unique if it is there" constraint unless you want to write your own CHECK constraint.

Yeah I know I'm a couple years late to this discussion but these are important data integrity tools and they often won't work without NULL.


Jul 10, 2009 at 2:33 AM // reply »
3 Comments

@Chris

I agree with you 100%. However, I was referring to a value that would be "REQUIRED". If you return a NULL value it didn't meet the REQUIRED attributes, right?

I'm not just referring to the FAX number alone. I'm referring to POSTED values, and/or any other types of values that you want to validate.

To answer your question about proper validation Coldfusion has a wonderful function called validate which is added to the input tag and you can set it to validate phone numbers. Simple enough, or you could also validate the number using IsValid in a CFIF statement.


Jul 10, 2009 at 8:17 AM // reply »
6,516 Comments

@Eric,

I am not sure that I understand what you are saying. Any date column that could potentially be NULL would probably not be a good candidate for a uniqueness constraint.

Can you give an example of what you mean?


Jul 11, 2009 at 3:56 PM // reply »
3 Comments

@Ben,

I'm not talking about dates in particular, I'm talking about NULL
in general.

For example: say you have a blog/article/page/... that can have a comment
thread but doesn't necessarily need one. In the database you
will have "blog.comment_thread" FK'ed to "comment_thread.id" and
"blog.comment_thread" will allow NULL values. You could reverse the FK
but then you'd have problems if you want to attach comment threads to
other things. Either you use NULLs or you use a special reserved value
(i.e. a kludge) or you neglect referential integrity.

How about a tree structure that needs to be stored in an SQL
database? You're going to need NULLs for the parent and/or child pointers
or you're going to have to use a special reserved value (i.e. a kludge)
or you neglect referential integrity.

How about a simple user or user-group based ACL. You could use one ACL
table for per-user access and another table for per-group access but that
can be cumbersome to work with; you could also represent ACL entries as
(id,type) pairs but you'd lose integrity checks unless you had an external
sanity checker or CHECK constraint that knew how to interpret the pair.

Here's an example for a UNIQUE constraint. Say your customer records allow
each customer to have an SSN (or SIN or email address or driver's license
number) but don't require it. Furthermore, you want the SSN to be unique
to avoid duplicates and data corruption. If you're scared of NULL then you
have to ensure the "unique if there" constraint outside the database or
by writing a CHECK constraint (and if someone is afraid of NULL then what
are the chances that they're going to write a proper CHECK constraint?).

I think someone else already mentioned the significant difference between
zero and NULL in numeric columns.

The point is simple: use NULL where it makes sense. Going through a pile
of kludges, contortions, and justifications just to avoid using NULL is
foolish. Dogma helps no one.

Most of what I'm saying is about data integrity, referential integrity
in particular. Data integrity is very important: fixing broken code is
easy, fixing broken data is a nightmare.


Jul 18, 2009 at 4:43 PM // reply »
6,516 Comments

@Eric,

I'm not trying to say that NULL values should never be used. There are certainly times when they should and have to be used; a binary tree might be a great example of place where it should be used (although I'd probably argue that a parent node ID of zero would / should indicate no parent just as well as NULL would).

While at first glance, I can see how something like a column that must be unique requires NULL (assuming that NULL values can be repeated?); but, what happens when you want to start using soft-deletes where a table might have two records with the same SSN but have different "is_deleted" flags for soft delete? At that point, you can't have a unique column value.

Or, you could spread the uniqueness to SNN+is_deleted. But that doesn't make sense because you could have multiple delete rows with the same SSN+is_deleted value if they are all deleted.

But, to your point, I think we believe the same thing: use NULL where it makes sense. I agree completely. I think we just have different opinions about where NULLs make sense. I don't think they make sense in a lot of situations (not ALL, just a lot).


Jul 18, 2009 at 6:57 PM // reply »
1 Comments

I simply cannot agree in any way that there is fundamentally something wrong with allowing NULL values in a database.

"blank answers are not stored... but are instead converted to no answer."

"No answer" is a very valid (and in some situations, important logical) distinction between a blank value.

As far as conversion, that's up to you and your logic. If there is a value in a field, and a user blanks out the field and saves the record - it's up to you whether to convert that to null or put in an empty string (which would allow you to tell that it used to hold a value and no longer does, vs null meaning there has never been a value placed in that field).


Jul 18, 2009 at 7:01 PM // reply »
6,516 Comments

@Paul,

Exactly - value vs. NULL is different at the philosophical level. However, it is up to you as the developer and application architect to determine if that philosophical difference actually translates to any meaningful difference in the application.


Jul 18, 2009 at 10:27 PM // reply »
3 Comments

@Ben:

You said "There are certainly times when they should and have to be used; a binary tree might be a great example of place where it should be used (although I'd probably argue that a parent node ID of zero would / should indicate no parent just as well as NULL would)."

If you do it this way then you have to add a row with ID=0 or not use an FK and ignore referential integrity issues. Then you have this magical zero that does what NULL is supposed to do but it isn't NULL. Sounds like pointless complexity to me, pointless complexity that will only confuse anyone that is already comfortable and experienced with SQL.

If you were doing this sort of thing in C or C++ you would use "NULL" (or a raw 0 depending on local conventions), in perl you'd use "undef", in JavaScript you'd use "null", in python you'd use "None", etc. so why invent something special for SQL?

"NULL = NULL" is false so you can have as many NULLs as you want without violating a unique constraint. If you need a "soft delete" then you can NULL the offending column, dump the deleted-but-not-really data in another table, use a CHECK constraint instead of a UNIQUE constraint, or invent another pathological example :)

The next obvious question is "How many angels can dance on the head of a pin?" And after that we can move on to "which text editor is the best?".


Jul 19, 2009 at 1:17 AM // reply »
45 Comments

I personally believe that NULLs do have a purpose (which is not negated in this article), but should be used sparingly.

I say this for two reasons:
1. NULLs actually do take up space. In MS SQL, I believe they are stored in sets of 4 per byte per record; so 1 to 4 NULLs in a record equals 1 byte, 5 to 8 NULLs in a record equals 2 bytes, and so on.

2. As Ben points out previously, having NULLs everywhere leads to data with no integrity.

When it comes to data that is not supplied on a regular bases (eg: fax number, passport number), I personally choose to use the normalization method of placing these fields in there own table (eg: userPhone, userIdentification).

This approach makes your architecture easier to add to in the future. For example, say that you currently only hold one phone number for your users. What happens when you need to start storing home, work, fax and mobile numbers. Well, you could either tack on 3 extra fields to your user table or in your userPhone table with a composite key of userID and phoneType, you could add 3 more phone types.

With the soft-delete example, I am assuming that you are soft-deleting a record and not a column (like updating a CMS and keeping previous changes); can I recommend creating another table (perhaps in an archive database) that then has a date deleted field which is apart of the primary key/uniqueness constraint on the archive table. This keeps your original table clean and the transfer to the archive table can be automated with a bulk query-based insert via a delete trigger (the delete trigger can be in SQL or in your application).


Jul 19, 2009 at 11:50 AM // reply »
6,516 Comments

@Eric,

I see where our miscommunication is coming from. In my world, zero is never a foreign constraint. Therefore, anything that has a foreign key of zero is not pointing to anything. Therefore, referential integrity is fine, just slightly different than yours.

Also, I don't think that soft-deletes are bad examples? Are they? I use them all the time in applications.

@Eric, @Andrew,

If storing "soft delete" rows in an archive table is a best practice, I am all for it. I have not done that personally, but I am here to learn most definitely. Is that what you guys normally do?


Jul 19, 2009 at 6:29 PM // reply »
45 Comments

@Ben,

I use an archive database and delete triggers for soft deletes and have found that it has 3 prominent advantages, it is:

1. Still possible to keep referential integrity.
2. Easier for someone else to see what you are attempting to do.
3. Retrieving information can be achieved from 2 simple queries.


Jul 19, 2009 at 11:40 PM // reply »
6,516 Comments

@Andrew,

I can see, especially, that having a delete table would decrease the size of the primary tables and make the SELECT queries less complicated. I don't know very much of anything about triggers (I usually put all that in the ColdFusion code); perhaps it's time to bone up on my database architecture.


Jul 20, 2009 at 5:10 AM // reply »
45 Comments

@Ben,

Database triggers are like stored procedures that get called with some extra information on what has changed when an event occurs.

The only benefit that I can see with doing it in the database instead of in the code, is enforcing the data to be archived with every delete.

However, if you actually want to hard-delete; you have to either delete from both tables in separate transactions (implicit or explicit transactions) or temporarily disable the trigger before deleting.

I guess it comes down to where you want your business logic.


Aug 14, 2009 at 10:14 AM // reply »
5 Comments

Does anyone know how to set a default value using Alter Table sql.

ALTER TABLE addresses
add city varchar(64) default noCity

is not working:
Throws: ][ODBC Microsoft Access Driver] Syntax error in ALTER TABLE statement.

CF 8, Access DB
Thanks


Aug 14, 2009 at 11:09 AM // reply »
3 Comments

Your question is rather off-topic... however, you can't set the default value in the same ALTER statement as you create the column, as it's actually a constraint and not just a property of the column. Here is what you'd want to do after creating the column:

ALTER TABLE addresses ADD CONSTRAINT
DF_addresses_city DEFAULT 'noCity' FOR city

"DF_addresses_city" being the name for the new constraint, the rest should be self-explanatory.


Aug 14, 2009 at 1:03 PM // reply »
5 Comments

Thanks Alex, i think this may still be on-topic, as lot of developers will want to avoid Null fields after reading this blog. Knowing how to specify default values in SQL is surely the key?

Your explanation on how to create columns with default values is great thanks, and will be very useful.
David


Aug 25, 2009 at 5:44 PM // reply »
2 Comments

Dude, NULL is VALID. Not understanding how to code with NULL is not valid.

Consider a survey.

If the user fails to answer a question, do you store some arbitrary value to mean "unknown" or do you store NULL?

What if your survey_answers_received table has a foreign key field that points to the survey_answers_possible table to describe the answers per question? No arbitrary value is valid in this case, unless you put in an arbitrary "no answer provided" field in the parent table, but that's lame.

The point is, NULL means "Data was not provided". Any other arbitrary value set to mean that must be defined somewhere in the code base, which is itself ridiculous. My favorite is when people come up with multiple levels of "Data was not provided" ([-1, -2, -3], for example).

Intelligent coding means dealing intelligently with all possible inputs and outputs. NULL is a VALID input and a VALID output. These excuses are ridiculous: "I don't feel like dealing with it." "It's too inconvenient." "NULL doesn't feel like a safe thing to put in a record." "Cold Fusion doesn't treat NULL correctly."

In fact, that last excuse is an argument against Cold Fusion.


Aug 25, 2009 at 6:10 PM // reply »
2 Comments

Sorry to rant,

Way at the top you provided a lame example with middle name. You argue that you should store the empty string that a user input so you can't count the number of users who input an empty string. Should you also store a string of 150 spaces? Is trimming all spaces to store an empty string not the same as counting an empty string null?

Would you like to replicate a database full of nulls or a database full of empty strings? Which do you think will replicate faster?

The best argument against NULL is that there's no reason to store what you don't know. But, that argument does not suggest putting in arbitrary "unknown" values. That argument is for intelligently designing your schema to minimize the unknowns in the data.

(In fact, in my survey example, I wouldn't store NULL for unanswered, I would store nothing for unanswered.)


Aug 27, 2009 at 2:31 PM // reply »
1 Comments

I think there are plenty of reasons not to use NULL in a database.

For example, I don't use NULLS for the different name fields because if I want to create a query that concatenates first + middle + last, (like for a name drop down box) those names that have NULL as a middle name won't show up in the list.

Also using nulls often requires more overhead or extra SQL manipulation to work with. I either have to deal with them on the SQL side or do something like check for isNull and empty string in the code when I want to display information.

Also on forms where you want to insert data into the database, if I wanted to make my middle name null I would have to do something like this in .NET

Dim MiddleName As String = IIf(txtMiddle.Text.Trim = String.Empty, Nothing, txtMiddle.Text.Trim)

Insert(MiddleName)

When if I just allowed empty strings I could just do this:

Insert(txtMiddleName.text.trim)

The extra overhead on the GUI side is not acceptable to me.....people want fast applications so I cut out as much as possible.

Dates, Boolean, those types of fields make sense to me to use NULL however.

I think it all matters in what your application is supposed to do and what is most important to your users.


Sep 2, 2009 at 9:53 AM // reply »
6,516 Comments

@Brandon,

I have nothing against NULL values when the serve a purpose other than simply philosophical. That's all that I am really saying. If there is a business reason for your app to store NULL values, then go for it, there's nothing wrong with that.


Jim
Oct 27, 2009 at 1:50 PM // reply »
1 Comments

http://tkyte.blogspot.com/2009/10/back-from-oracle-openworld.html


Post Comment  |  Ask Ben

Recent Blog Comments
Nov 21, 2009 at 6:47 PM
Hal Helms - Real World Object Oriented Development, Sarasota - Day Five
@charlie griefer, Thank you.. ... read »
Nov 21, 2009 at 5:15 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jose Galdamez, Oh heh yeah I didn't paste the whole code. I should have defined the vars -- my bad. It's fixed thou. Thanks. ... read »
Nov 21, 2009 at 4:49 PM
Styling The ColdFusion 8 WriteToBrowser CFImage Output
Great work yet again Ben! Whilst I didn't use this whole code, I copied some of your regex code for a similar problem with the lack of an alt attribute and unescaped ampersands in CFIMAGE for Railo 3 ... read »
Nov 21, 2009 at 1:13 PM
My First ColdFusion Builder Extension - Encrypting And Decrypting CFM / CFC Files
@Ben, Because I am pedantic, I just want to make sure that everyone knows there is absolutely no encryption going on. There is only encoding and obfuscation. The cfencode tool only obfuscates your C ... read »
Nov 21, 2009 at 12:28 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jody I can't seem to get your code sample to work. If you are still having problems, try this code out and see if it gets you what you wanted. <!--- Comma delimited list with various duplicates ... read »
Nov 21, 2009 at 11:03 AM
Groovy Operator Overloading Does Not Work In The ColdFusion Context
Hi Ben, Thanks for this informative post. Now I am reading ur old posts too ... read »
Nov 21, 2009 at 10:56 AM
HostMySite.com Has The Best ColdFusion Hosting
@Mehul, Yes very nice people, however several downtimes per day which was not acceptable. Hence we had to move out. I am glad you are having good luck with them so far. ... read »