Relational model: Difference between revisions

Content deleted Content added
m History: Spelling/grammar correction
clean up, typo(s) fixed: For example → For example, (2) using AWB
Line 1:
The '''relational model''' ('''RM''') for [[database]] management is an approach to managing data using a structure and language consistent with [[first-order logic|first-order predicate logic]], first described in 1969 by [[Edgar F. Codd]].<ref>{{Citation | title = Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks | first = E.F | last = Codd | publisher = IBM | series = Research Report | year = 1969}}.</ref><ref name= "codd1970">{{cite journal|last= Codd |first= E.F |year=1970|title=A Relational Model of Data for Large Shared Data Banks | series = Classics | url= http://www.acm.org/classics/nov95/toc.html |journal=[[Communications of the ACM]] |volume= 13 |issue= 6 |pages= 377–87 |doi= 10.1145/362384.362685}}</ref> In the relational model of a database, all data is represented in terms of [[tuple]]s, grouped into [[relation (database) |relation]]s. A database organized in terms of the relational model is a [[relational database]].
 
[[File:Relational Model.svg|thumb|280px|Diagram of an example database according to the relational model<ref name="USDT01">{{Citation | format = [[Portable document format |PDF]] | url = http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf | title = Data Integration Glossary | place = [[United States of America |US]] | publisher = Department of Transportation | date = August 2001}}.</ref>]]
[[File:Relational key.png|thumb|280px|In the relational model, related records are linked together with a "key".]]
The purpose of the relational model is to provide a [[Declarative programming |declarative]] method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.
 
Most relational databases use the [[SQL]] data definition and query language; these systems implement what can be regarded as an engineering approximation to the relational model. A ''table'' in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases [[#SQL and the relational model|deviate from the relational model in many details]], and Codd fiercely argued against deviations that compromise the original principles.<ref>{{Citation | first = E. F | last = Codd | title = The Relational Model for Database Management | publisher = Addison-Wesley | year = 1990 | ISBN = 0-201-14192-2 | pages=371-388371–388}}.</ref>
 
== Overview ==
The relational model's central idea is to describe a database as a collection of [[Predicate (mathematical logic)|predicate]]s over a finite set of predicate variables, describing [[constraint (database) |constraints]] on the possible values and combinations of values. The content of the database at any given time is a finite (logical) [[model (logic) |model]] of the database, i.e. a set of [[Relation (database)|relation]]s, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a [[database query]]) is also a predicate.
[[File:Relational model concepts.png|thumb|360px|Relational model concepts.]]
 
Line 15:
 
=== Implementation ===
There have been several attempts to produce a true implementation of the relational database model as originally defined by [[Edgar F. Codd|Codd]] and explained by [[Christopher J. Date |Date]], [[Hugh Darwen |Darwen]] and others, but none have been popular successes so far. {{As of |2015|10}} [[Rel (DBMS) |Rel]] is one of the more recent attempts to do this.
 
The relational model was the first database model to be described in formal mathematical terms. Hierarchical and network databases existed before relational databases, but their specifications were relatively informal. After the relational model was defined, there were many attempts to compare and contrast the different models, and this led to the emergence of more rigorous descriptions of the earlier models; though the procedural nature of the data manipulation interfaces for hierarchical and network databases limited the scope for formalization.<ref>{{Citation | title = Database Administration: High-impact Strategies – What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors | first = Kevin | last = Roebuck | publisher = Emereo | year = 2012 | ISBN = 978-1-74304877-1 | url = https://books.google.com/books?id=piUPBwAAQBAJ}}.</ref>{{Page needed |date=October 2015}}
Line 23:
 
=== Controversies ===
Codd himself, some years after publication of his 1970 model, proposed a [[three-valued logic]] (True, False, Missing/[[Null (SQL) |NULL]]) version of it to deal with missing information, and in his ''The Relational Model for Database Management Version 2'' (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version.<ref>{{cite book| first =C.J | last = Date|title=Date on Database: Writings 2000–2006|date= 2006 |publisher= Apress|isbn= 978-1-59059-746-0|pages=329–41|chapter= 18. Why Three- and Four-Valued Logic Don't Work}}</ref> But these have never been implemented, presumably because of attending complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.<ref>{{cite book | first =C.J | last = Date|title=An Introduction to Database Systems|date=2004|publisher= Addison Wesley |isbn=0-321-19784-4|pages=592–97|edition= 8}}</ref>
 
== Topics ==
The fundamental assumption of the relational model is that all [[data]] is represented as mathematical ''n''-[[arity|ary]] '''[[Relation (database)|relation]]s''', an ''n''-ary relation being a [[subset]] of the [[Cartesian product]] of ''n'' domains. In the mathematical model, [[reasoning]] about such data is done in two-valued [[predicate logic]], meaning there are two possible [[evaluation]]s for each [[proposition]]: either ''true'' or ''false'' (and in particular no third value such as ''unknown'', or ''not applicable'', either of which are often associated with the concept of [[Null (SQL)|NULL]]). Data are operated upon by means of a [[relational calculus]] or [[relational algebra]], these being equivalent in [[expressive power (computer science)|expressive power]].
 
The relational model of data permits the database designer to create a consistent, logical representation of [[information]]. Consistency is achieved by including declared '''''[[constraint (database)|constraint]]s''''' in the database design, which is usually referred to as the logical schema. The theory includes a process of [[database normalization]] whereby a design with certain desirable properties can be selected from a set of [[Logical equivalence |logically equivalent]] alternatives. The [[access plan]]s and other implementation and operation details are handled by the [[DBMS]] engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which [[performance tuning]] often requires changes to the logical model.
 
The basic relational building block is the [[Data ___domain|___domain]] or [[data type]], usually abbreviated nowadays to '''''type'''''. A ''[[tuple]]'' is an ordered [[set (mathematics) |set]] of '''''attribute values'''''. An [[Attribute (computing)|attribute]] is an ordered pair of '''''attribute name''''' and '''''type name'''''. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.
 
A relation consists of a '''''heading''''' and a '''''body'''''. A heading is a set of attributes. A body (of an ''n''-ary relation) is a set of ''n''-tuples. The heading of the relation is also the heading of each of its tuples.
 
A relation is defined as a [[set (mathematics) |set]] of ''n''-tuples. In both mathematics and the relational database model, a set is an ''unordered'' collection of unique, non-duplicated items, although some DBMSs impose an order to their data. In mathematics, a [[tuple]] has an order, and allows for duplication. [[Edgar F. Codd|E.{{nbsp}}F. Codd]] originally defined tuples using this mathematical definition.<ref name= "codd1970"/> Later, it was one of [[Edgar F. Codd |E.{{nbsp}}F. Codd]]'s great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations {{Citation needed |date= February 2007}}. This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the [[Cartesian product]] becomes [[Commutative operation |commutative]].
 
A [[table (database)|table]] is an accepted visual representation of a relation; a tuple is similar to the concept of a ''[[row (database) |row]]''.
 
A ''[[relvar]]'' is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.
Line 99:
 
== Examples ==
 
=== Database ===
An idealized, very simple example of a description of some [[relvar]]s ([[relation (database) |relation]] variables) and their attributes:
* Customer ('''<u>Customer ID</u>''', Tax ID, Name, Address, City, State, Zip, Phone, Email,Sex)
* Order ('''<u>Order No</u>''', <u>Customer ID</u>, <u>Invoice No</u>, Date Placed, Date Promised, Terms, Status)
Line 112 ⟶ 113:
Usually one [[candidate key]] is chosen to be called the [[primary key]] and used in [[preference]] over the other candidate keys, which are then called [[alternate key]]s.
 
A ''candidate key'' is a unique [[identifier]] enforcing that no [[tuple]] will be duplicated; this would make the [[Relation (database)|relation]] into something else, namely a [[bag (mathematics) |bag]], by violating the basic definition of a [[set (mathematics)|set]].{{Citation needed |date=October 2015}} Both foreign keys and superkeys (that includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.
 
=== Customer relation ===
Line 128 ⟶ 129:
|}
 
If we attempted to ''insert'' a new customer with the ID ''1234567890'', this would violate the design of the relvar since '''<u>Customer ID</u>''' is a ''primary key'' and we already have a customer ''1234567890''. The [[DBMS]] must reject a [[database transaction|transaction]] such as this that would render the [[database]] inconsistent by a violation of an [[Database integrity |integrity constraint]].
 
''[[Foreign key]]s'' are [[integrity constraint]]s enforcing that the [[Value (computer science) |value]] of the [[attribute set]] is drawn from a ''[[candidate key]]'' in another [[relation (database) |relation]]. For example, in the Order relation the attribute '''<u>Customer ID</u>''' is a foreign key. A ''[[Join (SQL)|join]]'' is the [[Operation (mathematics)|operation]] that draws on [[information]] from several relations at once. By joining relvars from the example above we could ''query'' the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a [[restriction condition]].
 
If we wanted to retrieve all of the Orders for Customer ''1234567890'', we could [[Information retrieval|query]] the database to return every row in the Order table with '''<u>Customer ID</u>''' ''1234567890'' and join the Order table to the Order Line table based on '''<u>Order No</u>'''.
 
There is a flaw in our [[database design]] above. The Invoice relvar contains an Order No attribute. So, each tuple in the Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in reality an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words, there can be many Invoices per Order and many Orders per Invoice. This is a '''[[many-to-many (data model) |many-to-many]]''' relationship between Order and Invoice (also called a ''non-specific relationship''). To represent this relationship in the database a new relvar should be introduced whose [[role]] is to specify the correspondence between Orders and Invoices:
 
OrderInvoice ('''<u>Order No</u>''', '''<u>Invoice No</u>''')
Line 157 ⟶ 158:
: A relation is a tuple <math>(H, B)</math> with <math>H</math>, the header, and <math>B</math>, the body, a set of tuples that all have the ___domain <math>H</math>.
 
Such a relation closely corresponds to what is usually called the extension of a predicate in [[first-order logic]] except that here we identify the places in the predicate with attribute names. Usually in the relational model a [[logical schema|database schema]] is said to consist of a set of relation names, the headers that are associated with these names and the [[constraint (database) |constraints]] that should hold for every instance of the database schema.
 
; Relation universe
Line 249 ⟶ 250:
 
== Further reading ==
* {{cite book|last=Date|first=C. J.|title=Foundation for future database systems: the third manifesto; a detailed study of the impact of type theory on the relational model of data, including a comprehensive model of type inheritance|year= 2000|publisher= Addison-Wesley |___location= Reading, [[Massachusetts | MA]] |isbn=0-201-70928-7|edition= 2 | last2= Darwen | first2 = Hugh}}
* {{cite book|last=Date|first=C. J.|title= An Introduction to Database Systems|year= 2007 | author-mask = 3 | publisher= Pearson Education|___location=Boston |isbn=0-321-19784-4|edition= 8}}