Relational model: Difference between revisions

Content deleted Content added
Rewrite most of "SQL and the relational model" to be streamlined and cited.
Remove discussions of SQL implementations from non-SQL sections. This article is about the theoretical model, and doesn't need to highlight every way in which one specific technology differs from the model.
Line 24:
 
=== Extensions ===
Some years after publication of his 1970 model, Codd proposed a [[three-valued logic]] (True, False, Missing/[[Null (SQL)|NULL]]) version of it to deal with missing information, and in his ''The Relational Model for Database Management Version 2'' (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version.<ref>{{cite book| first =Christopher J. | last = Date|title=Date on Database: Writings 2000–2006|date= 2006 |publisher= Apress|isbn= 978-1-59059-746-0|pages=329–41|chapter= 18. Why Three- and Four-Valued Logic Don't Work}}</ref> These have never been implemented, presumably because of their inherent complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.<ref>{{cite book | first =Christopher J. | last = Date|title=An Introduction to Database Systems| url =https://archive.org/details/introductiontoda0000date | url-access =registration |date=2004|publisher= Addison Wesley |isbn=978-0-321-19784-9|pages=[https://archive.org/details/introductiontoda0000date/page/592 592–97]|edition= 8}}</ref>
 
== Topics ==
The fundamental assumption behind a relational model is that all [[data]] is represented as mathematical ''n''-[[arity|ary]] '''[[Relation (database)|relation]]s''', an ''n''-ary relation being a [[subset]] of the [[Cartesian product]] of ''n'' domains. In the mathematical model, [[reasoning]] about such data is done in two-valued [[predicate logic]], meaning there are two possible [[evaluation]]s for each [[proposition]]: either ''true'' or ''false'' (and in particular no third value such as ''unknown'', or ''not applicable'', either of which are often associated with the concept of [[Null (SQL)|NULL]]). Data are operated upon by means of a [[relational calculus]] or [[relational algebra]], these being equivalent in [[expressive power (computer science)|expressive power]].
 
The relational model of data permits the database designer to create a consistent, logical representation of [[information]]. Consistency is achieved by including declared '''''[[constraint (database)|constraint]]s''''' in the database design, which is usually referred to as the ''logical schema''. The theory includes a process of [[database normalization]] whereby a design with certain desirable properties can be selected from a set of [[Logical equivalence|logically equivalent]] alternatives. The [[access plan]]s and other implementation and operation details are handled by the [[DBMS]] engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which [[performance tuning]] often requires changes to the logical model.
 
The basic relational building block is the [[Data ___domain|___domain]] or [[data type]], usually abbreviated nowadays to '''''type'''''. A ''[[tuple]]'' is an unordered [[set (mathematics)|set]] of '''''attribute values'''''. An [[Attribute (computing)|attribute]] is an unordered pair of '''''attribute name''''' and '''''type name'''''. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.
Line 36:
 
A relation is defined as a [[set (mathematics)|set]] of ''n''-tuples. In both mathematics and the relational database model, a set is an ''unordered'' collection of unique, non-duplicated items, although some DBMSs impose an order to their data. In mathematics, a [[tuple]] has an order, and allows for duplication. [[Edgar F. Codd|E.{{nbsp}}F. Codd]] originally defined tuples using this mathematical definition.<ref name= "codd1970"/> Later, it was one of [[Edgar F. Codd|E.{{nbsp}}F. Codd]]'s great insights that using attribute names instead of an ordering would be more convenient (in general) in a computer language based on relations {{Citation needed |date= February 2007}}. This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the [[Cartesian product]] becomes [[Commutative operation|commutative]].
 
A [[table (database)|table]] is an accepted visual representation of a relation; a tuple is similar to the concept of a ''[[row (database)|row]]''.
 
A ''[[relvar]]'' is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.
Line 50 ⟶ 48:
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the [[extension (predicate logic)|extension]] of some [[Predicate (logic)|predicate]], this being the set of true [[proposition]]s that can be formed by replacing each [[free variable]] in that predicate by a name (a term that designates something).
 
There is a [[bijection|one-to-one correspondence]] between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation, but which does not appear in the body is deemed to be false. This assumption is known as the [[closed world assumption]]: it is often violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding proposition is unknown. For example, the absence of the tuple ('John', 'Spanish') from a table of language skills cannot necessarily be taken as evidence that John does not speak Spanish.
 
For a formal exposition of these ideas, see the section [[Relational model#Set-theoretic formulation|Set-theoretic Formulation]], below.
 
=== Relational operations ===
Users (or programs) request data from a relational database by sending it a [[database query|query]] that is written in a special language, usually a dialect of SQL. In response to a query, the database returns a result set.
 
Often, data from multiple tables are combined into one, by doing a [[Join (SQL)|join]]. Conceptually, this is done by taking all possible combinations of rows (the [[Cartesian product]]), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("[[Query optimizer|optimize]]") queries to perform faster, using a variety of techniques.
 
There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (that lists the rows in one table that are not found in the other), intersect (that lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators&nbsp;– many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit [[relation (database)|relation]] values as attributes (relation-valued attribute), then operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.
 
The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.