Null (SQL): Difference between revisions

Content deleted Content added
m Reverted 2 edits by 172.59.185.141 (talk) to last revision by Kvng
I simplified most of the extra words, fixed some uppercase mistakes, and fixed punctuation.
Line 2:
[[File:Db null.png|300px|thumb|right|The Greek lowercase [[Omega|omega (ω)]] character is used to represent Null in [[database theory]].]]
 
In [[SQL]], '''null''' or '''NULL''' is a special marker used to indicate that a data value does not exist in the [[database]]. Introduced by the creator of the [[Relational model|relational]] database model, [[E.&nbsp;F. Codd]], SQL null serves to fulfilfulfill the requirement that all ''true relational database management systems ([[Relational database#RDBMS|RDBMS]])'' support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek [[omega]] (ω) symbol to represent null in [[database theory]]. In SQL, <code>NULL</code> is a [[List of SQL reserved words|reserved word]] used to identify this marker.
 
A null should not be confused with a value of [[0]]. A null indicates a lack of a value, which is not the same thing as a zero value. For example, consider the question "How many books does Adam own?" The answer may be "zero" (we ''know'' that he owns ''none'') or "null" (we ''do not know'' how many he owns). In a database table, the [[Column (database)|column]] reporting this answer would start out with no value (marked by null), and it would not be updated with the value zero until it is ascertained that Adam owns no books.
 
In SQL, null is a marker, not a value. This usage is quite different from most programming languages, where a [[Null pointer|null value]] of a reference means it is not pointing to any [[Object (computer science)|object]].
 
== History ==
E. F. Codd mentioned nulls as a method of representing missing data in the [[relational model]] in a 1975 paper in the ''FDT Bulletin of [[Association for Computing Machinery|ACM]]-[[SIGMOD]]''. Codd's paper that is most commonly cited in relation with the semantics of Null (as adopted in SQL) is his 1979 paper in the ''[[ACM Transactions on Database Systems]]'', in which he also introduced his [[Relational Model/Tasmania]], although much of the other proposals from the latter paper have remained obscure. Section 2.3 of his 1979 paper details the semantics of Null propagation in arithmetic operations as well as comparisons employing a [[Ternary logic|ternary (three-valued)]] logic when comparing to nulls; it also details the treatment of Nulls on other set operations (the latter issue still controversial today). In [[database theory]] circles, the original proposal of Codd (1975, 1979) is now referred to as "Codd tables".<ref name="Meyden"/> Codd later reinforced his requirement that all RDBMSs support Null to indicate missing data in a 1985 two-part article published in ''[[Computerworld]]'' magazine.<ref>{{cite journal|last=Codd|first=E.F.|title=Is Your Database Really Relational?|journal=[[Computerworld]]|date=October 14, 1985}}</ref><ref>{{cite journal|last=Codd|first=E.F.|title=Does Your DBMS Run By The Rules?|journal=[[Computerworld]]|date=October 21, 1985}}</ref>
 
The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in [[IBM System R]]. Although [[Don Chamberlin]] recognized nulls (alongside duplicate rows) as one of the most controversial features of SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form of system support for missing information, saving the programmer from many duplicative application-level checks (see [[semipredicate problem]]) while at the same time providing the database designer with the option not to use Nulls if they so desire; for example, in order to avoid well -known anomalies (discussed in the [[#missing-value semantics|semantics section]] of this article). Chamberlin also argued that besides providing some missing-value functionality, practical experience with Nulls also led to other language features whichthat rely on Nulls, like certain grouping constructs and outer joins. Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existing [[database schema|schema]] when it needs to evolve beyond its original intent, coding not for missing but rather for inapplicable information; for example, a database that quickly needs to support electric cars while having a miles-per-gallon column.<ref name="Chamberlin1998">{{cite book|author1=Don Chamberlin|title=A Complete Guide to DB2 Universal Database|url=https://books.google.com/books?id=hb4zskzHrWYC&pg=PA28|year=1998|publisher=Morgan Kaufmann|isbn=978-1-55860-482-7|pages=28–32}}</ref>
 
Codd indicated in his 1990 book ''The Relational Model for Database Management, Version 2'' that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively.<ref name="isbn0201141922"/> Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' ___domain. It remains an active field of research though, with numerous papers still being published.
 
=== Challenges ===
Line 29:
}}.</ref> In the following example, multiplying 10 by Null results in Null:
 
<syntaxhighlight lang="sqlSQL">
10 * NULL -- Result is NULL
</syntaxhighlight>
Line 35:
This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception{{snd}} division by zero".<ref name="SQL2003-Part2-Sec626"/> Though this behavior is not defined by the ISO SQL standard many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following:
 
<syntaxhighlight lang="sqlSQL">
NULL / 0
</syntaxhighlight>
Line 53:
'Fish ' || NULL || 'Chips' -- Result is NULL
</syntaxhighlight>
This is not true for all database implementations. In an Oracle RDBMS, for example, NULL and the empty string are considered the same thing and therefore 'Fish ' || NULL || 'Chips' results in 'Fish Chips'.<ref>{{Cite web |date=2022-05-23 |title=Handle empty strings when migrating from Oracle to PostgreSQL {{!}} AWS Database Blog |url=https://aws.amazon.com/blogs/database/handle-empty-strings-when-migrating-from-oracle-to-postgresql/ |access-date=2023-12-30 |website=aws.amazon.com |language=en-US}}</ref>
 
== Comparisons with NULL and the three-valued logic (3VL) ==
Line 68:
}}</ref> The logical result of the expression below, which compares the value 10 to Null, is Unknown:
 
<syntaxhighlight lang="sqlSQL">
SELECT 10 = NULL -- Results in Unknown
</syntaxhighlight>
Line 74:
However, certain operations on Null can return values if the absent value is not relevant to the outcome of the operation. Consider the following example:
 
<syntaxhighlight lang="sqlSQL">
SELECT NULL OR TRUE -- Results in True
</syntaxhighlight>
Line 80:
In this case, the fact that the value on the left of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the left.
 
SQL implements three logical results, so SQL implementations must provide for a specialized [[Ternary logic|three-valued logic (3VL)]]. The rules governing SQL three-valued logic are shown in the tables below ('''p''' and '''q''' represent logical states)"<ref name="fourrules">{{cite journal | last = Coles, Michael | title = Four Rules for Nulls | journal = SQL Server Central | publisher = Red Gate Software | date = June 27, 2005 | url = http://www.sqlservercentral.com/columnists/mcoles/fourrulesfornulls.asp }}</ref> The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Łukasiewicz three-valued logic (which differ in their definition of implication, however, SQL defines no such operation).<ref name="Klein">{{Cite book | last1 = Hans-Joachim | first1 = K. | chapter = Null Values in Relational Databases and Sure Information Answers | doi = 10.1007/3-540-36596-6_7 | title = Semantics in Databases. Second International Workshop Dagstuhl Castle, Germany, January 7–12, 2001. Revised Papers | series = Lecture Notes in Computer Science | volume = 2582 | pages = 119–138| year = 2003 | isbn = 978-3-540-00957-3 | chapter-url = http://www.is.informatik.uni-kiel.de/~hjk/sqlni.ps}}</ref>
 
{| class="wikitable"
Line 117:
SQL three-valued logic is encountered in [[Data Manipulation Language]] (DML) in comparison predicates of DML statements and queries. The <code>[[Where (SQL)|WHERE]]</code> clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by <code>[[Insert (SQL)|INSERT]]</code>, <code>[[Update (SQL)|UPDATE]]</code>, or <code>[[Delete (SQL)|DELETE]]</code> DML statements, and are discarded by <code>[[Select (SQL)|SELECT]]</code> queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls.<ref name="fourrules"/> The following simple example demonstrates this fallacy:
 
<syntaxhighlight lang="sqlSQL">
SELECT *
FROM t
Line 123:
</syntaxhighlight>
 
The example query above logically always returns zero rows because the comparison of the ''i'' column with Null always returns Unknown, even for those rows where ''i'' is Null. The Unknown result causes the <code>SELECT</code> statement to summarily discard each and every row. (However, in practice, some SQL tools will retrieve rows using a comparison with Null.)
 
=== Null-specific and 3VL-specific comparison predicates ===
Line 153:
The addition of IS UNKNOWN to the other operators of SQL's three-valued logic makes the SQL three-valued logic [[functionally complete]],<ref>C. J. Date, ''Relational database writings, 1991-1994'', Addison-Wesley, 1995, p. 371</ref> meaning its logical operators can express (in combination) any conceivable three-valued logical function.
 
On systems whichthat don't support the F571 feature, it is possible to emulate IS UNKNOWN ''p'' by going over every argument that could make the expression ''p'' Unknown and test those arguments with IS NULL or other NULL-specific functions, although this may be more cumbersome.
 
=== Law of the excluded fourth (in WHERE clauses) ===
Line 168:
</syntaxhighlight>
 
if the column x contains any Nulls; in that case, the second query would return some rows the first one does not return, namely all those in which x is Null. In classical two-valued logic, the law of the excluded middle would allow the simplification of the WHERE clause predicate, in fact its elimination. Attempting to apply the law of the excluded middle to SQL's 3VL is effectively a [[False dilemma|false dichotomy]]. The second query is actually equivalent with:
 
<syntaxhighlight lang="sql">