Null (SQL): Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 15:49, 7 November 2024 edit 221locked (talk \| contribs) 5 edits m →When two nulls are equal: grouping, sorting, and some set operations: Minor corrections in English ← Previous edit		Latest revision as of 11:03, 4 May 2025 edit undo Moist hammer (talk \| contribs) Extended confirmed users 8,038 edits Punctuation, template & link corrections Tag: Visual edit
(2 intermediate revisions by 2 users not shown)
Line 2: [[File:Db null.png\|300px\|thumb\|right\|The Greek lowercase [[Omega\|omega (ω)]] character is used to represent Null in [[database theory]].]] In [[SQL]], '''null''' or '''NULL''' is a special marker used to indicate that a data value does not exist in the [[database]]. Introduced by the creator of the [[Relational model\|relational]] database model, [[E. F. Codd]], SQL null serves to fulfill the requirement that all ''true relational database management systems ([[Relational database#RDBMS\|RDBMS]])'' support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek [[omega]] (ω) symbol to represent null in [[database theory]]. In SQL, <code>NULL</code> is a [[List of SQL reserved words\|reserved word]] used to identify this marker. A null should not be confused with a value of [[0]]. A null indicates a lack of a value, which is not the same as a zero value. For example, consider the question "How many books does Adam own?" The answer may be "zero" (we ''know'' that he owns ''none'') or "null" (we ''do not know'' how many he owns). In a database table, the [[Column (database)\|column]] reporting this answer would start with no value (marked by null), and it would not be updated with the value zero until it is ascertained that Adam owns no books. Line 9: == History == E. F. Codd mentioned nulls as a method of representing missing data in the [[relational model]] in a 1975 paper in the ''FDT Bulletin of [[Association for Computing Machinery\|ACM]]-[[SIGMOD]]''. Codd's paper that is most commonly cited with the semantics of Null (as adopted in SQL) is his 1979 paper in the ''[[ACM Transactions on Database Systems]]'', in which he also introduced his [[Relational Model/Tasmania]], although much of the other proposals from the latter paper have remained obscure. Section 2.3 of his 1979 paper details the semantics of Null propagation in arithmetic operations as well as comparisons employing a [[Ternary logic\|ternary (three-valued)]] logic when comparing to nulls; it also details the treatment of Nulls on other set operations (the latter issue still controversial today). In [[database theory]] circles, the original proposal of Codd (1975, 1979) is now referred to as "Codd tables".<ref name="Meyden"/> Codd later reinforced his requirement that all RDBMSs support Null to indicate missing data in a 1985 two-part article published in ''[[Computerworld]]'' magazine.<ref>{{cite journal \|last=Codd \|first=E.F. \|author-link=Edgar F. Codd \|date=October 14, 1985 \|title=Is Your Database Really Relational? \|journal=[[Computerworld]]~~\|date=October 14, 1985~~}}</ref><ref>{{cite journal\|last=Codd\|first=E.F.\|title=Does Your DBMS Run By The Rules?\|journal=[[Computerworld]]\|date=October 21, 1985}}</ref> The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in [[IBM System R]]. Although [[Don Chamberlin]] recognized nulls (alongside duplicate rows) as one of the most controversial features of SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form of system support for missing information, saving the programmer from many duplicative application-level checks (see [[semipredicate problem]]) while at the same time providing the database designer with the option not to use Nulls if they so desire; for example, to avoid well-known anomalies (discussed in the [[#missing-value semantics\|semantics section]] of this article). Chamberlin also argued that besides providing some missing-value functionality, practical experience with Nulls also led to other language features that rely on Nulls, like certain grouping constructs and outer joins. Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existing [[database schema\|schema]] when it needs to evolve beyond its original intent, coding not for missing but rather for inapplicable information; for example, a database that quickly needs to support electric cars while having a miles-per-gallon column.<ref name="Chamberlin1998">{{cite book \|~~author1~~last1=~~Don~~Chamberlin ~~Chamberlin~~\|~~title~~first=ADon ~~Complete Guide to DB2 Universal Database~~\|url=https://books.google.com/books?id=hb4zskzHrWYC&pg=PA28 \|~~year~~title=~~1998~~A Complete Guide to DB2 Universal Database \|publisher=Morgan Kaufmann \|year=1998 \|isbn=978-1-55860-482-7 \|pages=28–32}}</ref> Codd indicated in his 1990 book ''The Relational Model for Database Management, Version 2'' that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively.<ref name="isbn0201141922"/> Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' ___domain. It remains an active field of research though, with numerous papers still being published. === Challenges === Line 27: \| pages =Section 6.2.6: ''numeric value expressions'' \| no-pp =true }}.</ref> In the following example, multiplying 10 by Null results in Null: <syntaxhighlight lang="SQL"> Line 33: </syntaxhighlight> This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception{{snd}} division by zero".<ref name="SQL2003-Part2-Sec626"/> Though this behavior is not defined by the ISO SQL standard many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following: <syntaxhighlight lang="SQL"> Line 41: === String concatenation === String [[concatenation]] operations, which are common in SQL, also result in Null when one of the operands is Null.<ref name="SQL2003-Part2-Sec628"> {{cite book \|last=ISO/IEC \|title=ISO/IEC 9075-2:2003, "SQL/Foundation" \|publisher=ISO/IEC \|year=2003 \|pages=Section 6.2.8: ''string value expression'' \|no-pp=true}}</ref> The following example demonstrates the Null result returned by using Null with the SQL <code>\|\|</code> string concatenation operator. ~~{{cite book~~ ~~\| last =ISO/IEC~~ ~~\| title =ISO/IEC 9075-2:2003, "SQL/Foundation"~~ ~~\| publisher =ISO/IEC~~ ~~\| year =2003~~ ~~\| pages =Section 6.2.8: ''string value expression''~~ ~~\| no-pp =true~~ ~~}}</ref> The following example demonstrates the Null result returned by using Null with the SQL <code>\|\|</code> string concatenation operator.~~ <syntaxhighlight lang="sql"> Line 58 ⟶ 51: {{Further\|Three-valued logic}} Since Null is not a member of any [[data ___domain]], it is not considered a "value", but rather a marker (or placeholder) indicating the [[undefined value]]. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.<ref name="SQL2003-Part1-Sec442"> {{cite book \|last=ISO/IEC \|url=http://www.iso.org \|title=ISO/IEC 9075-1:2003, "SQL/Framework" \|publisher=ISO/IEC \|year=2003 \|pages=Section 4.4.2: ''The null value'' \|no-pp=true}}</ref> The logical result of the expression below, which compares the value 10 to Null, is Unknown: ~~{{cite book~~ ~~\| last =ISO/IEC~~ ~~\| title =ISO/IEC 9075-1:2003, "SQL/Framework"~~ ~~\| publisher =ISO/IEC~~ ~~\| year =2003~~ ~~\| pages =Section 4.4.2: ''The null value''~~ ~~\| url =http://www.iso.org~~ ~~\| no-pp =true~~ ~~}}</ref> The logical result of the expression below, which compares the value 10 to Null, is Unknown:~~ <syntaxhighlight lang="SQL"> Line 80 ⟶ 65: In this case, the fact that the value on the left of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the left. SQL implements three logical results, so SQL implementations must provide for a specialized [[Ternary logic\|three-valued logic (3VL)]]. The rules governing SQL three-valued logic are shown in the tables below ('''p''' and '''q''' represent logical states)"<ref name="fourrules">{{cite journal \| last = Coles, Michael \| title = Four Rules for Nulls \| journal = SQL Server Central \| publisher = Red Gate Software \| date = June 27, 2005 \| url = http://www.sqlservercentral.com/columnists/mcoles/fourrulesfornulls.asp }}</ref> The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Łukasiewicz three-valued logic (which differ in their definition of implication,; however, SQL defines no such operation).<ref name="Klein">{{Cite book \| ~~last1~~ last= Hans-Joachim \| ~~first1~~ first= K. \| chapter = Null Values in Relational Databases and Sure Information Answers \| doi = 10.1007/3-540-36596-6_7 \| title = Semantics in Databases. Second International Workshop Dagstuhl Castle, Germany, January 7–12, 2001. Revised Papers \| series = Lecture Notes in Computer Science \| volume = 2582 \| pages = 119–138\| year = 2003 \| isbn = 978-3-540-00957-3 \| chapter-url = http://www.is.informatik.uni-kiel.de/~hjk/sqlni.ps}}</ref> {\| class="wikitable" Line 115 ⟶ 100: === Effect of Unknown in WHERE clauses === SQL three-valued logic is encountered in [[Data Manipulation Language]] (DML) in comparison predicates of DML statements and queries. The <code>~~[[Where (SQL)\|~~WHERE]]</code> clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by <code>[[Insert (SQL)\|INSERT]]</code>, <code>[[Update (SQL)\|UPDATE]]</code>, or <code>[[Delete (SQL)\|DELETE]]</code> DML statements, and are discarded by <code>[[Select (SQL)\|SELECT]]</code> queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls.<ref name="fourrules"/> The following simple example demonstrates this fallacy: <syntaxhighlight lang="SQL"> Line 123 ⟶ 108: </syntaxhighlight> The example query above logically always returns zero rows because the comparison of the ''i'' column with Null always returns Unknown, even for those rows where ''i'' is Null. The Unknown result causes the <code>SELECT</code> statement to summarily discard every row. (However, in practice, some SQL tools will retrieve rows using a comparison with Null.) === Null-specific and 3VL-specific comparison predicates === Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The <code>IS NULL</code> and <code>IS NOT NULL</code> predicates (which use a [[Reverse Polish notation\|postfix]] syntax) test whether data is, or is not, Null.<ref name="SQL2003-Part2-Sec87"> {{cite book \|last=ISO/IEC \|title=ISO/IEC 9075-2:2003, "SQL/Foundation" \|publisher=ISO/IEC \|year=2003 \|pages=Section 8.7: ''null predicate'' \|no-pp=true}}</ref> ~~{{cite book~~ ~~\| last =ISO/IEC~~ ~~\| title =ISO/IEC 9075-2:2003, "SQL/Foundation"~~ ~~\| publisher =ISO/IEC~~ ~~\| year =2003~~ ~~\| pages =Section 8.7: ''null predicate''~~ ~~\| no-pp =true~~ ~~}}</ref>~~ The SQL standard contains the optional feature F571 "Truth value tests" that introduces three additional logical unary operators (six in fact, if we count their negation, which is part of their syntax), also using postfix notation. They have the following truth tables:<ref>[[C.J. Date]] (2004), ''An introduction to database systems'', 8th ed., Pearson Education, p. 594</ref> {\| class="wikitable" Line 149 ⟶ 127: \|} The F571 feature is orthogonal to the presence of the [[#BOOLEAN data type\|Boolean datatype]] in SQL (discussed later in this article) and, despite syntactic similarities, F571 does not introduce Boolean or three-valued [[Literal (computer programming)\|literals]] in the language. The F571 feature was actually present in [[SQL92]],<ref name="MeltonSimon1993">{{cite book~~\|author1=Jim~~ ~~Melton~~\|~~author2~~last=~~Jim~~ Melton ~~Alan R. Simon~~\|~~title~~first=~~Understanding~~Jim ~~The New SQL: A Complete Guide~~\|url=https://books.google.com/books?id=ZOOMSTZ4T_QC&pg=PA145 \|~~year~~title=~~1993~~Understanding The New SQL: A Complete Guide \|last2=Simon \|first2=Alan R. \|publisher=Morgan Kaufmann \|year=1993 \|isbn=978-1-55860-245-8 \|pages=145–147}}</ref> well before the Boolean datatype was introduced to the standard in 1999. The F571 feature is implemented by few systems, however; PostgreSQL is one of those implementing it. The addition of IS UNKNOWN to the other operators of SQL's three-valued logic makes the SQL three-valued logic [[functionally complete]],<ref>C. J. Date, ''Relational database writings, 1991-1994'', Addison-Wesley, 1995, p. 371</ref> meaning its logical operators can express (in combination) any conceivable three-valued logical function. On systems that ~~don't~~do not support the F571 feature, it is possible to emulate IS UNKNOWN ''p'' by going over every argument that could make the expression ''p'' Unknown and test those arguments with IS NULL or other NULL-specific functions, although this may be more cumbersome. === Law of the excluded fourth (in WHERE clauses) === Line 243 ⟶ 221: ====IF statements in procedural extensions==== [[SQL/PSM]] (SQL Persistent Stored Modules) defines [[Procedural programming\|procedural]] extensions for SQL, such as the <code>[[Conditional (programming)\|IF]]</code> statement. However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements and queries. The following code fragment, in ISO SQL standard format, demonstrates the use of Null 3VL in an <code>IF</code> statement. <syntaxhighlight lang="plpgsql"> Line 254 ⟶ 232: </syntaxhighlight> The <code>IF</code> statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to False or Unknown, the <code>IF</code> statement passes control to the <code>[[Conditional (programming)\|ELSEIF]]</code> clause, and finally to the <code>[[Conditional (programming)\|ELSE]]</code> clause. The result of the code above will always be the message <code>'Result is Unknown'</code> since the comparisons with Null always evaluate to Unknown. == {{anchor\|missing-value semantics}} Analysis of SQL Null missing-value semantics == The groundbreaking work of [[Tomasz Imieliński\|T. Imieliński]] and [[Witold Lipski\|W. Lipski Jr.]] (1984)<ref name="JACM 1984">{{cite journal~~\|first1 =T.~~ \| ~~last1~~last=Imieliński \|first=T. \|author-link= Tomasz Imieliński \|last2=Lipski \|first2 = W. ~~\| last2~~Jr. \|author-link2=Witold Lipski ~~Jr.~~ \|date=1984 \|title=Incomplete information in relational databases \| journal=[[Journal of the ACM]] \|volume=31 \|issue=4~~\|date=~~ ~~1984~~\|pages=761–791 \| doi=10.1145/1634.1886\| \|s2cid=288040 \| doi-access=free }}</ref> provided a framework in which to evaluate the intended semantics of various proposals to implement missing-value semantics, that is referred to as [[Imieliński-Lipski Algebras]]. This section roughly follows chapter 19 of the "Alice" textbook.<ref>{{Cite book \|last=Abiteboul \|first=Serge \|author-link=Serge Abiteboul \|url=https://archive.org/details/foundationsofdat0000abit \|title=Foundations of Databases \|last2=Hull \|first2=Richard B. \|author2-link=Richard B. Hull \|last3=Vianu \|first3=Victor \|author3-link=Victor Vianu \|publisher=Addison-Wesley \|year=1995 \|isbn=978-0-201-53771-0 \|url-access=registration}}</ref> A similar presentation appears in the review of Ron van der Meyden, §10.4.<ref name="Meyden"/> ~~\|last1=Abiteboul~~ ~~\|first1=Serge~~ ~~\|author-link=Serge Abiteboul~~ ~~\|last2=Hull~~ ~~\|first2=Richard B.~~ ~~\|author2-link=Richard B. Hull~~ ~~\|last3=Vianu~~ ~~\|first3=Victor~~ ~~\|author3-link=Victor Vianu~~ ~~\|title=Foundations of Databases~~ ~~\|publisher=Addison-Wesley~~ ~~\|year=1995~~ ~~\|isbn=978-0-201-53771-0~~ ~~\|url=https://archive.org/details/foundationsofdat0000abit~~ ~~\|url-access=registration~~ ~~}}</ref> A similar presentation appears in the review of Ron van der Meyden, §10.4.<ref name="Meyden"/>~~ === In selections and projections: weak representation === Line 324 ⟶ 286: </syntaxhighlight> should include the possibility that a relation like EmpH22 may exist. However, Codd tables cannot represent the disjunction "result with possibly 0 or 1 rows". A device, mostly of theoretical interest, called [[conditional table]] (or c-table) can, however, represent such an answer: {\| class="wikitable" Line 335 ⟶ 297: \|} where the condition column is interpreted as the row ~~doesn't~~does not exist if the condition is false. It turns out that because the formulas in the condition column of a c-table can be arbitrary [[propositional logic]] formulas, an algorithm for the problem whether a c-table represents some concrete relation has a [[co-NP-complete]] complexity, thus is of little practical worth. A weaker notion of representation is therefore desirable. Imielinski and Lipski introduced the notion of ''weak representation'', which essentially allows (lifted) queries over a construct to return a representation only for ''sure'' information, i.e. if it's is valid for all "[[possible world]]" instantiations (models) of the construct. Concretely, a construct is a weak representation system if : <math> \bigcap\mathop{\mathrm{Models}}(\bar{q}(T)) = \bigcap \{ q(R)\,\| R \in \mathop{\mathrm{Models}}(T) \}</math> The right-hand side of the above equation is the ''sure'' information, i.e. information which can be certainly extracted from the database regardless of what values are used to replace Nulls in the database. In the example we considered above, it's is easy to see that the intersection of all possible models (i.e. the sure information) of the query selecting {{code\|2=sql\|1=WHERE Age = 22}} is actually empty because, for instance, the (unlifted) query returns no rows for the relation EmpH37. More generally, it was shown by Imielinski and Lipski that Codd tables are a weak representation system if the query language is restricted to projections, selections (and renaming of columns). However, as soon as we add either joins or unions to the query language, even this weak property is lost, as evidenced in the next section. === If joins or unions are considered: not even weak representation === Line 379 ⟶ 341: \|} Thus when unions are added to the query language, Codd tables are not even a weak representation system of missing information, meaning that queries over them ~~don't~~do not even report all ''sure'' information. It's is important to note here that semantics of UNION on Nulls, which are discussed in a later section, did not even come into play in this query. The "forgetful" nature of the two sub-queries was all that it took to guarantee that some sure information went unreported when the above query was run on the Codd table Emp. For [[natural join]]s, the example needed to show that sure information may be unreported by some query is slightly more complicated. Consider the table Line 431 ⟶ 393: ==Check constraints and foreign keys== The primary place in which SQL three-valued logic intersects with SQL [[Data Definition Language]] (DDL) is in the form of [[check constraint]]s. A check constraint placed on a column operates under a slightly different set of rules than those for the DML <code>WHERE</code> clause. While a DML <code>WHERE</code> clause must evaluate to True for a row, a check constraint must not evaluate to False. (From a logic perspective, the [[designated value]]s are True and Unknown.) This means that a check constraint will succeed if the result of the check is either True or Unknown. The following example table with a check constraint will prohibit any integer values from being inserted into column ''i'', but will allow Null to be inserted since the result of the check will always evaluate to Unknown for Nulls.<ref name="nullversusnull">{{cite journal \| last = Coles, Michael \| title = Null Versus Null? \| journal = SQL Server Central \| publisher = Red Gate Software \|date=February 26, 2007 \| url = http://www.sqlservercentral.com/columnists/mcoles/2829.asp}}</ref> <syntaxhighlight lang="sql"> Line 461 ⟶ 423: ==Outer joins== [[File:Sql query1.png\|270px\|thumb\|right\|Example [[SQL]] [[Join (SQL)\|outer join]] query with Null placeholders in the result set. The Null markers are represented by the word <code>NULL</code> in place of data in the results. Results are from [[Microsoft SQL Server]], as shown in SQL Server Management Studio.]]<!-- FAIR USE of Sql query1.png: see image description page at http://en.wikipedia.org/wiki/Image:Sql query1.png for rationale --> SQL [[Join (SQL)\|outer joins]], including left outer joins, right outer joins, and full outer joins, automatically produce Nulls as placeholders for missing values in related tables. For left outer joins, for instance, Nulls are produced in place of rows missing from the table appearing on the right-hand side of the <code>LEFT OUTER JOIN</code> operator. The following simple example uses two tables to demonstrate Null placeholder production in a left outer join. The first table ('''Employee''') contains employee ID numbers and names, while the second table ('''PhoneNumber''') contains related employee ID numbers and [[Telephone number\|phone numbers]], as shown below. Line 554 ⟶ 516: ==Aggregate functions== SQL defines [[aggregate function]]s to simplify server-side aggregate calculations on data. Except for the <code>COUNT()</code> function, all aggregate functions perform a Null-elimination step, so that Nulls are not included in the final result of the calculation.<ref name="SQL2003-Part2-Sec4154"> {{cite book \|last=ISO/IEC \|title=ISO/IEC 9075-2:2003, "SQL/Foundation" \|publisher=ISO/IEC \|year=2003 \|pages=Section 4.15.4: ''Aggregate functions'' \|no-pp=true}}</ref> ~~{{cite book~~ ~~\| last =ISO/IEC~~ ~~\| title =ISO/IEC 9075-2:2003, "SQL/Foundation"~~ ~~\| publisher =ISO/IEC~~ ~~\| year =2003~~ ~~\| pages =Section 4.15.4: ''Aggregate functions''~~ ~~\| no-pp =true~~ ~~}}</ref>~~ Note that the elimination of Null is not equivalent to replacing Null with zero. For example, in the following table, <code>AVG(i)</code> (the average of the values of <code>i</code>) will give a different result from that of <code>AVG(j)</code>: Line 583 ⟶ 538: \|} Here <code>AVG(i)</code> is 200 (the average of 150, 200, and 250), while <code>AVG(j)</code> is 150 (the average of 150, 200, 250, and 0). A well-known side effect of this is that in SQL, <code>AVG(z)</code> is not equivalent with <code>SUM(z)/COUNT()</code> but with <code>SUM(z)/COUNT(z)</code>.<ref name="Chamberlin1998"/> The output of an aggregate function can also be Null. Here is an example: Line 606 ⟶ 561: ==When two nulls are equal: grouping, sorting, and some set operations== Because [[SQL:2003]] defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct".<ref name="SQL2003-Part2-Sec3168">{{cite book \|last=ISO/IEC \|title=ISO/IEC 9075-2:2003, "SQL/Foundation" \|publisher=ISO/IEC \|year=2003 \|pages=Section 3.1.6.8: ''Definitions: distinct'' \|no-pp=true}}</ref> This definition of ''not distinct'' allows SQL to group and sort Nulls when the <code>GROUP BY</code> clause (or another SQL language feature that performs grouping) is used. ~~\| last =ISO/IEC~~ ~~\| title =ISO/IEC 9075-2:2003, "SQL/Foundation"~~ ~~\| publisher =ISO/IEC~~ ~~\| year =2003~~ ~~\| pages =Section 3.1.6.8: ''Definitions: distinct''~~ ~~\| no-pp =true~~ ~~}}</ref> This definition of ''not distinct'' allows SQL to group and sort Nulls when the <code>GROUP BY</code> clause (or another SQL language feature that performs grouping) is used.~~ Other SQL operations, clauses, and keywords using the "not distinct" definition in their treatment of Nulls include: Line 623 ⟶ 571: The principle that Nulls are not equal to each other (but rather that the result is Unknown) is effectively violated in the SQL specification for the <code>UNION</code> operator, which does identify nulls with each other.<ref name="Meyden"/> Consequently, some set operations in SQL, such as union and difference, may produce results not representing sure information, unlike operations involving explicit comparisons with NULL (e.g. those in a <code>WHERE</code> clause discussed above). In Codd's 1979 proposal (which was adopted by SQL92) this semantic inconsistency is rationalized by arguing that removal of duplicates in set operations happens "at a lower level of detail than equality testing in the evaluation of retrieval operations."<ref name="Klein"/> The SQL standard does not explicitly define a default sorting order for Nulls. Instead, on conforming systems, Nulls can be sorted before or after all data values by using the <code>NULLS FIRST</code> or <code>NULLS LAST</code> clauses of the <code>ORDER BY</code> list, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for Null sorting in the DBMS.<ref name="nullversusnull"/> ==Effect on index operation== Some SQL products do not index keys containing NULLs. For instance, [[PostgreSQL]] versions prior to 8.3 did not, with the documentation for a [[B-tree]] index stating that<ref>{{cite web\| url=http://www.postgresql.org/docs/8.0/interactive/indexes-types.html\| title=PostgreSQL 8.0.14 Documentation: Index Types \|publisher=PostgreSQL \|access-date= 6 November 2008}}</ref> {{quote\| B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < ≤ {{=}} ≥ > Line 633 ⟶ 581: }} In cases where the index enforces uniqueness, NULLs are excluded from the index and uniqueness is not enforced between NULLs. Again, quoting from the [[PostgreSQL]] documentation:<ref>{{cite web\| url=http://www.postgresql.org/docs/8.0/interactive/indexes-unique.html\| title=PostgreSQL 8.0.14 Documentation: Unique Indexes \|publisher=PostgreSQL \|access-date=November 6, 2008}}</ref> {{quote\| When an index is declared unique, multiple table rows with equal indexed values will not be allowed. Nulls are not considered equal. A multicolumn unique index will only reject cases where all of the indexed columns are equal in two rows. Line 649 ⟶ 597: ==Null-handling functions== SQL defines two functions to explicitly handle Nulls: <code>NULLIF</code> and <code>COALESCE</code>. Both functions are abbreviations for [[Case (SQL)\|searched <code>CASE</code> expressions]].<ref name="SQL2003-Part2-Sec611"> {{cite book \|last=ISO/IEC \|title=ISO/IEC 9075-2:2003, "SQL/Foundation" \|publisher=ISO/IEC \|year=2003 \|pages=Section 6.11: ''case expression'' \|no-pp=true}}</ref> ~~{{cite book~~ ~~\| last =ISO/IEC~~ ~~\| title =ISO/IEC 9075-2:2003, "SQL/Foundation"~~ ~~\| publisher =ISO/IEC~~ ~~\| year =2003~~ ~~\| pages =Section 6.11: ''case expression''~~ ~~\| no-pp =true~~ ~~}}</ref>~~ === NULLIF === The <code>NULLIF</code> function accepts two parameters. If the first parameter is equal to the second parameter, <code>NULLIF</code> returns Null. Otherwise, the value of the first parameter is returned. <syntaxhighlight lang="sql"> Line 688 ⟶ 629: </syntaxhighlight> Some SQL DBMSs implement vendor-specific functions similar to <code>COALESCE</code>. Some systems (e.g. [[Transact-SQL]]) implement an <code>ISNULL</code> function, or other similar functions that are functionally similar to <code>COALESCE</code>. (See [[Is functions\|<code>Is</code> functions]] for more on the <code>IS</code> functions in Transact-SQL.) === NVL === {{Redirect\|NVL}} The Oracle <code>NVL</code> function accepts two parameters. It returns the first non-NULL parameter or NULL if all parameters are NULL. A <code>COALESCE</code> expression can be converted into an equivalent <code>NVL</code> expression thus: Line 705 ⟶ 646: </syntaxhighlight> A use case of this function is to replace in an expression a NULL by a value like in <code>NVL(SALARY, 0)</code> which says, 'if <code>SALARY</code> is NULL, replace it with the value 0'. There is, however, one notable exception. In most implementations, <code>COALESCE</code> evaluates its parameters until it reaches the first non-NULL one, while <code>NVL</code> evaluates all of its parameters. This is important for several reasons. A parameter ''after'' the first non-NULL parameter could be a function, which could either be computationally expensive, invalid, or could create unexpected side effects. Line 711 ⟶ 652: == Data typing of Null and Unknown == {{See also\|Option type\|Nullable type}} The <code>NULL</code> [[Literal (computer programming)\|literal]] is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific [[data type]].<ref name="Understanding1999">{{cite book \|~~author1~~last=~~Jim~~Melton ~~Melton~~\|~~author-link~~first=Jim ~~Melton~~\|~~author2~~url=~~Alan R~~https://archive.org/details/sql1999understan0000melt/page/53 ~~Simon~~\|title=SQL:1999: Understanding Relational Language Components \|~~year~~last2=~~2002~~Simon \|first2=Alan R. \|publisher=Morgan Kaufmann \|year=2002 \|isbn=978-1-55860-456-8 \|page=[https://archive.org/details/sql1999understan0000melt/page/53 53]~~\|url=https://archive.org/details/sql1999understan0000melt/page/53~~}}</ref> Because of this, it is sometimes mandatory (or desirable) to explicitly convert Nulls to a specific data type. For instance, if [[Function overloading\|overloaded]] functions are supported by the RDBMS, SQL might not be able to automatically resolve to the correct function without knowing the data types of all parameters, including those for which Null is passed. Conversion from the <code>NULL</code> literal to a Null of a specific type is possible using the <code>CAST</code> introduced in [[SQL-92]]. For example: Line 730 ⟶ 671: == BOOLEAN data type == The ISO [[SQL:1999]] standard introduced the BOOLEAN data type to SQL,; however, it's is still just an optional, non-core feature, coded T031.<ref name="ISO-9075-1">{{cite web \|title=ISO/IEC 9075-1:1999 SQL Standard \|year=1999 \|publisher=ISO}}</ref> When restricted by a <code>NOT NULL</code> constraint, the SQL BOOLEAN works like the [[Boolean type]] from other languages. Unrestricted, however, the BOOLEAN datatype, despite its name, can hold the truth values TRUE, FALSE, and UNKNOWN, all of which are defined as Boolean literals according to the standard. The standard also asserts that NULL and UNKNOWN "may be used interchangeably to mean exactly the same thing".<ref name="Date2011">{{cite book \|first=C. \|last=Date \|author-link=Christopher J. Date \|url=https://books.google.com/books?id=Ew06OZtjuJEC&pg=PA83 \|title=SQL and Relational Theory: How to Write Accurate SQL Code \|publisher=O'Reilly Media, Inc. \|year=2011 \|isbn=978-1-4493-1640-2 \|page=83}}</ref><ref>ISO/IEC 9075-2:2011 §4.5</ref> interchangeably to mean exactly the same thing".<ref name="Date2011">{{cite book\|author=C. Date\|title=SQL and Relational Theory: How to Write Accurate SQL Code\|url=https://books.google.com/books?id=Ew06OZtjuJEC&pg=PA83\|year=2011\|publisher=O'Reilly Media, Inc.\|isbn=978-1-4493-1640-2\|page=83}}</ref><ref>ISO/IEC 9075-2:2011 §4.5</ref> The Boolean type has been subject of criticism, particularly because of the mandated behavior of the UNKNOWN literal, which is never equal to itself because of the identification with NULL.<ref name="Prigmore2007">{{cite book \|~~author~~last=~~Martyn~~Prigmore ~~Prigmore~~\|~~title~~first=~~Introduction~~Martyn ~~to Databases With Web Applications~~\|url=https://books.google.com/books?id=PKggKqIZnN0C&pg=PA197 \|~~year~~title=~~2007~~Introduction to Databases With Web Applications \|publisher=Pearson Education Canada \|year=2007 \|isbn=978-0-321-26359-9 \|page=197}}</ref> As discussed above, in the [[PostgreSQL]] implementation of [[SQL]], Null is used to represent all UNKNOWN results, including the UNKNOWN BOOLEAN. PostgreSQL does not implement the UNKNOWN literal (although it does implement the IS UNKNOWN operator, which is an orthogonal feature.) Most other major vendors do not support the Boolean type (as defined in T031) as of 2012.<ref>Troels Arvin, [http://troels.arvin.dk/db/rdbms/#data_types-boolean Survey of BOOLEAN data type implementation]</ref> The procedural part of Oracle's [[PL/SQL]], however, supports BOOLEAN variables; these can also be assigned NULL and the value is considered the same as UNKNOWN.<ref name="FeuersteinPribyl2009">{{cite book \|~~author1~~last1=~~Steven~~Feuerstein ~~Feuerstein~~\|~~author2~~first=~~Bill~~Steven ~~Pribyl~~\|title=Oracle PL/SQL Programming \|~~year~~last2=~~2009~~Pribyl \|first2=Bill \|publisher=O'Reilly Media, Inc. \|year=2009 \|isbn=978-0-596-51446-4 \|pages=74, 91}}</ref> ==Controversy== ===Common mistakes=== Misunderstanding of how Null works is the cause of a great number of errors in SQL code, both in ISO standard SQL statements and in the specific SQL dialects supported by real-world database management systems. These mistakes are usually the result of confusion between Null and either 0 (zero) or an empty string (a string value with a length of zero, represented in SQL as <code><nowiki>''</nowiki></code>). Null is defined by the SQL standard as different from both an empty string and the numerical value <code>0</code>, however. While Null indicates the absence of any value, the empty string and numerical zero both represent actual values. A classic error is the attempt to use the equals operator <code>=</code> in combination with the keyword <code>NULL</code> to find rows with Nulls. According to the SQL standard this is an invalid syntax and shall lead to an error message or an exception. But most implementations accept the syntax and evaluate such expressions to <code>UNKNOWN</code>. The consequence is that no rows are ~~found{{snd}} regardless~~found—regardless of whether rows with Nulls exist or not. The proposed way to retrieve rows with Nulls is the use of the predicate <code>IS NULL</code> instead of <code>= NULL</code>. <syntaxhighlight lang="sql"> Line 751 ⟶ 691: </syntaxhighlight> In a related, but more subtle example, a <code>WHERE</code> clause or conditional statement might compare a column's value with a constant. It is often incorrectly assumed that a missing value would be "less than" or "not equal to" a constant if that field contains Null, but, in fact, such expressions return Unknown. An example is below: <syntaxhighlight lang="sql"> Line 762 ⟶ 702: These confusions arise because the [[Law of Identity]] is restricted in SQL's logic. When dealing with equality comparisons using the <code>NULL</code> literal or the <code>UNKNOWN</code> truth-value, SQL will always return <code>UNKNOWN</code> as the result of the expression. This is a [[partial equivalence relation]] and makes SQL an example of a ''Non-Reflexive logic''.<ref>{{citation\|title=Classical Logic or Non-Reflexive Logic? A case of Semantic Underdetermination\|journal=Revista Portuguesa de Filosofia\|volume=68\|issue=1/2\|last=Arenhart, Krause\|year=2012\|pages=73–86\|jstor=41955624\|doi=10.17990/RPF/2012_68_1_0073}}.</ref> Similarly, Nulls are often confused with empty strings. Consider the <code>LENGTH</code> function, which returns the number of characters in a string. When a Null is passed into this function, the function returns Null. This can lead to unexpected results, if users are not well versed in 3-value logic. An example is below: <syntaxhighlight lang="sql"> Line 773 ⟶ 713: ===Criticisms=== The ISO SQL implementation of Null is the subject of criticism, debate and calls for change. In ''The Relational Model for Database Management: Version 2'', Codd suggested that the SQL implementation of Null was flawed and should be replaced by two distinct Null-type markers. The markers he proposed were to stand for ''"Missing but Applicable"'' and ''"Missing but Inapplicable"'', known as ''A-values'' and ''I-values'', respectively. Codd's recommendation, if accepted, would have required the implementation of a four-valued logic in SQL.<ref name="isbn0201141922">{{cite book\|last=Codd\|first=E.F.\|year=1990\|title=The Relational Model for Database Management\|edition=Version 2\|publisher=[[Pearson PLC\|Addison Wesley Publishing Company]]\|isbn=978-0-201-14192-4}}</ref> Others have suggested adding additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be "Missing", increasing the complexity of SQL's logic system. At various times, proposals have also been put forth to implement multiple user-defined Null markers in SQL. Because of the complexity of the Null-handling and logic systems required to support multiple Null markers, none of these proposals have gained widespread acceptance. [[Chris Date]] and [[Hugh Darwen]], authors of ''The Third Manifesto'', have suggested that the SQL Null implementation is inherently flawed and should be eliminated altogether,<ref name="3rdmanifesto"> {{cite web \|last=Darwen \|first=Hugh \|author-link=Hugh Darwen \|last2=Date \|first2=Chris \|author-link2=Christopher J. Date \|title=The Third Manifesto \|url=http://www.thethirdmanifesto.com/ \|access-date=May 29, 2007 \|website=thethirdmanifesto.com}}</ref> pointing to inconsistencies and flaws in the implementation of SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should be removed from the relational model.<ref name="askew-wall"> ~~{{cite web~~ {{cite web \|last=Darwen \|first=Hugh \|author-link=Hugh Darwen \|title=The Askew Wall \|url=http://www.dcs.warwick.ac.uk/~hugh/TTM/TTM-TheAskewWall-printable.pdf \|access-date=May 29, 2007 \|website=dcs.warwick.ac.uk}}</ref> Others, like author [[Fabian Pascal]], have stated a belief that "how the function calculation should treat missing values is not governed by the relational model."{{citation needed\|date=November 2012}} ~~\| last =Darwen~~ ~~\| first =Hugh~~ ~~\|author2=Chris Date~~ ~~\| title =The Third Manifesto~~ ~~\| url =http://www.thethirdmanifesto.com/~~ ~~\| access-date = May 29, 2007~~ }}</ref> pointing to inconsistencies and flaws in the implementation of SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should be removed from the relational model.<ref name="askew-wall"> ~~{{cite web~~ ~~\| last =Darwen~~ ~~\| first =Hugh~~ ~~\| title =The Askew Wall~~ ~~\| url =http://www.dcs.warwick.ac.uk/~hugh/TTM/TTM-TheAskewWall-printable.pdf~~ ~~\| access-date = May 29, 2007~~ }}</ref> Others, like author [[Fabian Pascal]], have stated a belief that "how the function calculation should treat missing values is not governed by the relational model."{{citation needed\|date=November 2012}} ===Closed-world assumption=== Another point of conflict concerning Nulls is that they violate the [[closed-world assumption]] model of relational databases by introducing an [[open-world assumption]] into it.<ref name="isbn0596100124">{{cite book \|last=Date \|first=Chris \|author-link=Christopher J. Date \|title=Database in Depth: Relational Theory for Practitioners \|date=May 2005 \|publisher=O'Reilly Media, Inc. \|isbn=978-0-596-10012-4 \|page=73}}</ref> The closed world assumption, as it pertains to databases, states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false."<ref name="cwa">{{cite web \|last=Date \|first=Chris \|author-link=Christopher J. Date \|title=Abstract: The Closed World Assumption \|url=http://www.sfdama.org/ChrisDate_20070110.htm \|archive-url=https://web.archive.org/web/20070519134146/http://www.sfdama.org/ChrisDate_20070110.htm \|archive-date=2007-05-19 \|access-date=May 29, 2007 \|publisher=[[Data Management Association]], San Francisco Bay Area Chapter}}</ref> This view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under the open world assumption, in which some items stored in the database are considered unknown, making the database's stored knowledge of the world incomplete. Another point of conflict concerning Nulls is that they violate the [[closed-world assumption]] model of relational databases by introducing an [[open-world assumption]] into it.<ref name="isbn0596100124">{{cite book ~~\| last =Date~~ ~~\| first =Chris~~ ~~\| title =Database in Depth: Relational Theory for Practitioners~~ ~~\| publisher =O'Reilly Media, Inc.~~ ~~\|date=May 2005~~ ~~\| isbn = 978-0-596-10012-4~~ ~~\| page = 73~~ }}</ref> The closed world assumption, as it pertains to databases, states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false."<ref name="cwa">{{cite web ~~\| last =Date~~ ~~\| first =Chris~~ ~~\| title =Abstract: The Closed World Assumption~~ ~~\| publisher =[[Data Management Association]], San Francisco Bay Area Chapter~~ ~~\| url =http://www.sfdama.org/ChrisDate_20070110.htm~~ ~~\| access-date = May 29, 2007~~ ~~\| archive-url = https://web.archive.org/web/20070519134146/http://www.sfdama.org/ChrisDate_20070110.htm~~ ~~\| archive-date = 2007-05-19~~ }}</ref> This view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under the open world assumption, in which some items stored in the database are considered unknown, making the database's stored knowledge of the world incomplete. ==See also== Line 821 ⟶ 731: * [[Check constraint]] * [[Relational Model/Tasmania]] * [[~~RDBMS\|~~Relational database management system]] * [[Join (SQL)]] Line 829 ⟶ 739: ==Further reading== * E. F. Codd. Understanding relations (installment #7). FDT Bulletin of ACM-SIGMOD, 7(3-4):23–28, 1975. * {{Cite journal \| ~~last1~~ last= Codd \| ~~first1~~ first= E. F. \|author-link=Edgar ~~title~~F. =Codd \|title=Extending the database relational model to capture more meaning \| doi = 10.1145/320107.320109 \| journal = ACM Transactions on Database Systems \| volume = 4 \| issue = 4 \| pages = 397–434 \| year = 1979 \| citeseerx = 10.1.1.508.5701 \| s2cid = 17517212 }} Especially §2.3. * {{cite book \|author-link=C.Christopher J. Date \|last=Date \|first=C. J. \|year=2000 \|title=The Database Relational Model: A Retrospective Review and Analysis: A Historical Account and Assessment of E. F. Codd's Contribution to the Field of Database Technology \|publisher=[[Pearson PLC\|Addison Wesley Longman]] \|isbn=978-0-201-61294-3 \|url=https://archive.org/details/databaserelation00date}} * {{cite journal \| ~~last1~~ last= Klein \| ~~first1~~ first= Hans-Joachim \| year = 1994 \| title = How to modify SQL queries in order to guarantee sure answers \| ~~url = http://www.acm.org/sigmod/record/issues/9409/sql.ps \|~~ journal = ACM SIGMOD Record \| volume = 23 \| issue =3 3\| pages = 14–20 \| doi=10.1145/187436.187445 \| s2cid = 17354724 \| doi-access = free }} * Claude Rubinson, [http://www.u.arizona.edu/~rubinson/scrawl/Rubinson.2007.Nulls_Three-Valued_Logic_and_Ambiguity_in_SQL.pdf Nulls, Three-Valued Logic, and Ambiguity in SQL: Critiquing Date's Critique] {{Webarchive\|url=https://web.archive.org/web/20160305071445/http://www.u.arizona.edu/~rubinson/scrawl/Rubinson.2007.Nulls_Three-Valued_Logic_and_Ambiguity_in_SQL.pdf \|date=2016-03-05 }}, SIGMOD Record, December 2007 (Vol. 36, No. 4) * John Grant, [http://www09.sigmod.org/sigmod/record/issues/0809/p23.grant.pdf Null Values in SQL]. SIGMOD Record, September 2008 (Vol. 37, No. 3) * Waraporn, Narongrit, and Kriengkrai Porkaew. "[http://www.iaeng.org/IJCS/issues_v35/issue_3/IJCS_35_3_08.pdf Null semantics for subqueries and atomic predicates]". [[IAENG]] International Journal of Computer Science 35.3 (2008): 305-313. * {{cite journal \|~~author~~last=Thalheim \|first=Bernhard ~~Thalheim~~\|last2=Schewe \|~~author2~~first2=Klaus-Dieter ~~Schewe~~ \|title=NULL 'Value' Algebras and Logics \|journal=Frontiers in Artificial Intelligence and Applications \|volume=225 \|issue=Information Modelling and Knowledge Bases XXII \|url=https://zenodo.org/record/1038419 \|doi=10.3233/978-1-60750-690-4-354 \|year=2011}} * Enrico Franconi and Sergio Tessaris, [http://ceur-ws.org/Vol-866/paper8.pdf On the Logic of SQL Nulls], Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management, Ouro Preto, Brazil, June 27–30, 2012. pp. 114–128 ==External links==