Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

This page is user contributed documentation. See the bottom of the page for information about the author.  

A Brief Guide To NULLS

A Brief Guide to Nulls

author: Richard Huxton

revision: 1.0

date: 2003-03-14

Overview

This is a short guide to the use of nulls in SQL databases. It is written with Postgresql in mind but should be applicable to any SQL-based DBMS.

Thanks to the members of the psql-sql mailing list for their assistance in preparing this guide.

You can get further information in: Any good relational database book (try something written by Date or Pascal) Bruce's book My Postgresql Notes

PART I - INTRODUCTION

What is a null?

A null is not an empty string. A null is not a value like others. A null is the absence of a value

The SQL standard defines it as a "special value" but its behaviour is so different to all the other values of that type that I prefer to view it as an absence of a value. If you don't see the difference, just ignore it.

What do nulls mean?

Well, they should mean one of two things: 1. There is no applicable value 2. There is a value but it is unknown

Example 1: Imagine you have a customer table with name and sex fields.

If you get a new customer "ACME Widgets Ltd", the sex field is meaningless since your customer is a company (case 1).

A new customer "Jackie Smith" might be male or female but you might not know (case 2).

Actually, since you are trying to store a company in the ACME example that might indicate that you need to rethink your design.

Example 2: You have an address table with (street,city,county,postalcode) fields.

You might insert an address ("10 Downing Street","London",Null,"WC1 1AA") since you don't have a valid county.

You might also insert an address ("1 Any Street","Maidstone","Kent",Null) where there must be a valid postalcode, but you don't know what it is.

It might be useful to be able to distinguish between these two cases - not applicable and unknown, but there is only one option "Null" available to us, so we can't.

How do nulls work?

There is one very important rule when dealing with nulls. A null is unknown and thus not equal to, less than or greater than any value it is compared to.

Example: with the customer table above you could run the following queries:

 

SELECT FROM customer WHERE sex='M';

SELECT
FROM customer WHERE sex<>'M';

Now you might think this returns all customers, but it will miss those where sex is null. You've asked for all rows where the value of sex is M and all those with values not equal to M but not rows with no value at all

It might help to think of a database as a set of statements you know to be true. A null indicates that you cannot say anything at all about that field. You can't say what it is, you can't say what it isn't, you can only say there is some information missing.

So, to see all the customers with unknown or inapplicable sex you would need:

 

SELECT * FROM customer WHERE sex IS NULL;

Note that the following will not work, you need to use "IS NULL"

 

SELECT * FROM customer WHERE sex=NULL;

There are actually three possible results for a test in SQL - True (the test passed), False (the test failed) and Null (unknown or can't say). The table below indicates the result of using AND/OR operations on a,b for values of True,False and Null.

 

a | b | a AND b | a OR b

------+-------+---------+--------

TRUE | TRUE | TRUE | TRUE

TRUE | FALSE | FALSE | TRUE

TRUE | NULL | NULL | TRUE

FALSE | FALSE | FALSE | FALSE

FALSE | NULL | FALSE | NULL

NULL | NULL | NULL | NULL

In the example of a=True,b=Null, (a AND b) is Null (which gets treated as false for the purposes of WHERE clauses). However (a OR b) is True since if a is True, we don't care what b is.

If you try to perform an operation on nulls, again the result is always null. So the results of all of the following are null:

 

SELECT abc || null;

SELECT 1 + null;

SELECT sqrt(null::numeric);

The first case can be especially confusing. Concatenating Null to a text value will return null, not the original value. This can catch you out if you are joining first_name to last_name and one of them contains nulls.

How are nulls implemented?

You can think of each null-able field/column having a separate "is_null" flag attached to it. So, if you have a column "a" of type integer, in addition to space required to store the number, there is another bit which says whether the item is null and the value should be ignored. Of course, there are optimisations that get made, but that is the general idea.

PART II - IMPLICATIONS

Uniqueness and nulls

If you define a unique index on a column it prevents you inserting two values that are the same. It does not prevent you inserting as many nulls as you like. How could it? You don't have a value so it can't be the same as any other.

Example: We create a table "ta" with a unique constraint on column "b"

 

CREATE TABLE ta (

a int4,

b varchar(3),

PRIMARY KEY (a)

);

CREATE UNIQUE INDEX ta_b_idx ON ta (b);

INSERT INTO ta VALUES (1,
aaa); -- succeeds

INSERT INTO ta VALUES (2,bbb); -- succeeds

INSERT INTO ta VALUES (3,null); -- succeeds

INSERT INTO ta VALUES (4,bbb); -- fails

INSERT INTO ta VALUES (5,null); -- succeeds!

Given the definition of what a null is, you only have two choices: allow multiple nulls or allow no nulls. If you want no nulls, define the column as NOT NULL when creating the table.

Keys and nulls

No column that is part of a primary key can be null. When you define a PRIMARY KEY, none of the columns mentioned can take a null value. Postgresql makes sure of this by defining the columns as NOT NULL for you.

Example: With table "ta" we just created, \d ta will show column a as being not null. Otherwise, we could insert several rows with "a" set to null and have no way to tell them apart. If the primary key was defined as being over (a,b) then neither could be null.

Aggregates and nulls

You need to be careful using count() if a column can contain nulls. The count() function is defined as counting values and so skips nulls. The same applies to other aggregates like sum() or max() but these behave more intuitively.

Example: we have a table ta (a int4, b int4) with the following data.

 

a | b

---+----

1 | 10

2 | 20

3 |

SELECT count(*) as num_rows, count(a) as num_a, count(b) as num_b FROM ta;

num_rows | num_a | num_b

----------+-------+-------

3 | 3 | 2



SELECT sum(b) FROM ta;

sum

-----

30

If you were trying to calculate the average of column b then sum(b)/count(b) gives a different result from sum(b)/count(*).

Subqueries and nulls

You need to think carefully about how the above rules impact sub-queries, especially something like NOT IN.

Example: Assume we have a companies table and a diary table. Diary entries are usually related to a particular company but not always.

 

SELECT co_id,co_name FROM companies;

co_id | co_name

-------+--------------------------

1 | Worldwide Partnership UK

2 | British Associates PLC

3 | Global Enterprises INC

SELECT dy_id,dy_company FROM diary;

dy_id | dy_company

-------+------------

101 | 1

102 | 2

103 |



SELECT co_name FROM companies WHERE co_id IN (SELECT dy_company FROM diary);

co_name

--------------------------

Worldwide Partnership UK

British Associates PLC



SELECT co_name FROM companies WHERE co_id NOT IN (SELECT dy_company FROM diary);

co_name

---------

(0 rows)

What happened in this last case? Where did co_id=4 go to - it's certainly not in the diary. Actually, that's not true. Since there is a null in dy_company, we can't say for sure whether 3 is there or not - that null throws everything into a state of uncertainty.

Looking at what's happening in the case of co_id=3:

WHERE co_id NOT IN (SELECT dy_company FROM diary)

WHERE 3 NOT IN (1,2,Null)

WHERE NOT (3=1 OR 3=2 OR 3=Null)

WHERE NOT (False OR False OR Null)

WHERE NOT (Null)

WHERE Null

You can see that the OR-ing a Null with false values gives Null and Not(Null)=Null. We can't prove that co_id=3 isn't in the diary so we can't return that row.

PART III - Advice

A thought

Remember that advice is worth exactly what you paid for it.

Where to use nulls

Think carefully about whether a particular field/column should be able to contain nulls. If not define it as NOT NULL when creating the table.

In cases where a column holds an enumerated type (like the sex column previously) you can define your own values for not applicable and unknown. So rather than values of (M,F) you could allow values of (M,F,N,U). This makes it clear whether the value is NOT APPLICABLE or UNKNOWN.

For timestamps there is the special value "infinity" which is later than any other value (and "-infinity" which is earlier). Note that this only applies to timestamps not date types (as of v7.3.2).

If you have a numeric or free-text field then you can't define your own replacements for null (except in the case when you can restrict permissable values). It's also not necessarily a good idea. Using negative or very large numbers to represent unknown is one of the reasons null was invented.

If you have a text field, you need to decide what you store when the user leaves it blank while using your application. Do you treat that as null or an empty string? The answer, I'm afraid will depend on context but if a field can be null you'll need a way to set/display it differently from an empty string.

Nulls and normalisation

Using nulls to mean not applicable can indicate you haven't normalised correctly.

Example 1: In the company/diary example earlier, we got tripped up with a null in the dy_company column. We could avoid this by splitting into three tables: companies, diary and co_diary with the last being the relevant id's from the other two tables. This makes it simple to identify what companies have/don't have diary entries.

Example 2: Assume we have a bird_description table to record sightings by bird-spotters. You could structure it as (sighting_id, weight, wingspan, belly_colour, wing_colour, wingtip_colour, beak_colour, feading_behaviour, mating_behaviour, ...) but for any individual sighting most of these values will be null. An alternative would be to define the table as (sighting_id,aspect,value) so you could store ("S0137","wingspan","15cm") - this means you are only storing the information supplied and also makes it easier to add new aspects of a sighting.

Updated: 2006-11-19 20:59
Author: Robert Treat
Long time PostgreSQL Contributor, Co-Author of Begining PHP and PostgreSQL 8
Operations: Edit Create subpage (Requires community login)

Privacy Policy | Project hosted by hub.org | Designed by tinysofa
Copyright © 1996 – 2007 PostgreSQL Global Development Group