Functional dependency: Difference between revisions

Revision as of 01:27, 3 April 2022 by Vycl1994 (no edit summary); latest revision as of 19:22, 10 August 2025 by Turtlecrown (edit summary: "fix typo")
(24 intermediate revisions by 18 users not shown)
Line 1:
{{Short description|Relational database theory concept}}
{{about|a concept in relational database theory|function dependencies in the Haskell programming language|type class}}
{{refimprove|date=October 2012}}
In [[relational database]] theory, a '''functional dependency''' ('''FD''') is a [[Relational database#Constraints|constraint]] between two attribute sets whereby the values in one set (the ''determinant'' set) determine the values of the other set (the ''dependent'' set). A functional dependency between a determinant set ''X'' and a dependent set ''Y'' can be described as follows.

Given a [[Relation (database)|relation]] ''R'' and attribute sets <math>X,Y \subseteq R</math>, ''X'' is said to functionally determine ''Y'' (written ''X'' → ''Y'') if each ''X'' value is associated with precisely one ''Y'' value. ''R'' is then said to ''satisfy'' the functional dependency ''X'' → ''Y''. Equivalently, the [[projection (relational algebra)|projection]] <math>\Pi_{X,Y}R</math> is a [[Function (mathematics)|function]], that is, ''Y'' is a function of ''X''.<ref name="HalpinMorgan2008">{{cite book |author1=Terry Halpin |title=Information Modeling and Relational Databases |url=https://books.google.com/books?id=puO_VlbR_x4C&pg=PA140 |year=2008 |publisher=Morgan Kaufmann |isbn=978-0-12-373568-3 |page=140 |edition=2nd}}</ref><ref name="Date2012">{{cite book |author=Chris Date |title=Database Design and Relational Theory: Normal Forms and All That Jazz |url=https://books.google.com/books?id=8jAGhpMSjAcC&pg=PA21 |year=2012 |publisher=O'Reilly Media, Inc. |isbn=978-1-4493-2801-6 |page=21}}</ref>
In other words:
* when the ''X'' attributes have known values (here, ''x''), the values of the corresponding ''Y'' attributes can be determined by looking them up in ''any'' [[Tuple#Relational model|tuple]] of ''R'' containing ''x'';
* two tuples sharing the same values of ''X'' will necessarily have the same values of ''Y''.

A dependency FD: ''X'' → ''Y'' thus means that the values of ''Y'' are determined by the values of ''X''.
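The second characterization above gives a direct test of whether a given relation instance satisfies an FD. The following Python is a minimal sketch, assuming a relation is represented as a list of dicts mapping attribute names to values; the function `satisfies_fd` and the sample rows are hypothetical and only illustrate the idea. A check like this can only show that one particular instance satisfies or violates the dependency; it cannot establish that the FD is a constraint of the schema.

```python
from typing import Hashable, Iterable, Mapping, Sequence

# Assumed representation: one dict per tuple, attribute name -> value.
Row = Mapping[str, Hashable]


def satisfies_fd(rows: Iterable[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if the relation instance `rows` satisfies the FD x -> y."""
    seen: dict = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # The first tuple with this X value fixes the Y value; a later tuple
        # with the same X value but a different Y value violates the FD.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical sample data, made up for illustration.
enrolments = [
    {"StudentID": 1234, "Semester": 6, "Lecture": "Numerical Methods"},
    {"StudentID": 1234, "Semester": 6, "Lecture": "Operating Systems"},
    {"StudentID": 2380, "Semester": 4, "Lecture": "Numerical Methods"},
]

print(satisfies_fd(enrolments, ["StudentID"], ["Semester"]))  # True
print(satisfies_fd(enrolments, ["Lecture"], ["StudentID"]))   # False
```

Adding a row in which student 1234 has a different semester makes the first check return False, which is exactly how a single new tuple can invalidate an FD observed in the data.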
A functional dependency FD: ''X'' → ''Y'' is called ''trivial'' if ''Y'' is a [[subset]] of ''X''.

The determination of functional dependencies is an important part of designing databases in the [[relational model]], and of [[database normalization]] and [[denormalization]]. A simple application of functional dependencies is ''[[Heath's theorem]]'', which states that a relation ''R'' over an attribute set ''U'' that satisfies a functional dependency ''X'' → ''Y'' can be safely split into two relations with the [[Lossless-Join Decomposition|lossless-join decomposition]] property, namely <math>\Pi_{XY}(R)\bowtie\Pi_{XZ}(R) = R</math>, where ''Z'' = ''U'' − ''XY'' are the remaining attributes. ([[set union|Union]]s of attribute sets are customarily denoted by juxtaposition in database theory.) An important notion in this context is a [[candidate key]], defined as a minimal set of attributes that functionally determines all of the attributes in a relation. The functional dependencies, along with the [[attribute ___domain]]s, are selected so as to generate constraints that exclude from the system as much data inappropriate to the [[user ___domain]] as possible.
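Heath's theorem can be checked on a concrete instance by projecting, joining the projections back, and comparing the result with the original relation. The following Python is a minimal sketch under stated assumptions: relations are lists of dicts, projected relations are sets of sorted (attribute, value) tuples, and `project`, `natural_join`, and `heath_split` are hypothetical helper names. The employee data is made up for illustration. The code verifies the equality for one instance only and is not a proof of the theorem.

```python
from typing import Hashable, Iterable, Mapping, Set, Tuple

Row = Mapping[str, Hashable]
Tup = Tuple[Tuple[str, Hashable], ...]  # a tuple as sorted (attribute, value) pairs


def project(rows: Iterable[Row], attrs: Iterable[str]) -> Set[Tup]:
    """Projection: keep only the given attributes and drop duplicate tuples."""
    keep = sorted(attrs)
    return {tuple((a, row[a]) for a in keep) for row in rows}


def natural_join(left: Set[Tup], right: Set[Tup]) -> Set[Tup]:
    """Natural join: combine every pair of tuples that agree on all shared attributes."""
    result = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result


def heath_split(rows, x, y, u):
    """Split R over U into the projections on XY and on XZ, where Z = U - XY."""
    z = set(u) - set(x) - set(y)
    return project(rows, set(x) | set(y)), project(rows, set(x) | z)


U = ["EmployeeID", "DepartmentID", "DepartmentName"]
# Hypothetical instance satisfying DepartmentID -> DepartmentName.
employees = [
    {"EmployeeID": 1, "DepartmentID": 10, "DepartmentName": "Sales"},
    {"EmployeeID": 2, "DepartmentID": 10, "DepartmentName": "Sales"},
    {"EmployeeID": 3, "DepartmentID": 20, "DepartmentName": "Research"},
]
left, right = heath_split(employees, ["DepartmentID"], ["DepartmentName"], U)
print(natural_join(left, right) == project(employees, U))  # True

# Violating the FD: department 10 now has two different names.
bad = employees + [{"EmployeeID": 4, "DepartmentID": 10, "DepartmentName": "Marketing"}]
left, right = heath_split(bad, ["DepartmentID"], ["DepartmentName"], U)
print(natural_join(left, right) == project(bad, U))  # False
```

When the FD does not hold, the join still contains every original tuple but also produces spurious ones (here 7 tuples instead of 4), so the split is no longer lossless.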
A notion of [[logical implication]] is defined for functional dependencies as follows: a set of functional dependencies <math>\Sigma</math> logically implies another set of dependencies <math>\Gamma</math> if every relation ''R'' satisfying all dependencies in <math>\Sigma</math> also satisfies all dependencies in <math>\Gamma</math>; this is usually written <math>\Sigma \models \Gamma</math>. The notion of logical implication for functional dependencies admits a [[soundness|sound]] and [[completeness (logic)|complete]] finite [[axiomatization]], known as ''[[Armstrong's axioms]]''.

== Examples ==
Line 40 ⟶ 46:
* StudentID → Semester. If a row were added in which the student had a different semester value, this functional dependency would no longer hold. This means that the FD is only implied by the current data, since it is possible to have values that would invalidate it.
Other nontrivial functional dependencies can be identified, for example:
Line 48 ⟶ 54:
The latter expresses the fact that the set {StudentID, Lecture} is a [[superkey]] of the relation.

=== Employee department ===
A classic example of functional dependency is the employee department model.
Line 74 ⟶ 80:
* Department ID → Department Name
This example demonstrates that even though the FD Employee ID → Department ID holds, Employee ID would not be a logical key for determining the Department Name, which depends on the employee only through the Department ID. The process of normalization of the data would recognize all FDs and allow the designer to construct tables and relationships that are more logical based on the data.

== Properties and axiomatization of functional dependencies ==
{{Main article|Armstrong's axioms}}
Given that ''X'', ''Y'', and ''Z'' are sets of attributes in a relation ''R'', one can derive several properties of functional dependencies.
Among the most important are the following, usually called [[Armstrong's axioms]]:<ref name="SilberschatzKorth2010a">{{cite book|author1-link=Abraham Silberschatz|author2-link=Henry F. Korth|author1=Abraham Silberschatz|author2=Henry Korth|author3=S. Sudarshan|title=[[Database System Concepts]]|year=2010|publisher=McGraw-Hill|isbn=978-0-07-352332-3|edition=6th|page=339}}</ref>
* '''Reflexivity''': If ''Y'' is a subset of ''X'', then ''X'' → ''Y''
* '''Augmentation''': If ''X'' → ''Y'', then ''XZ'' → ''YZ''
Line 91 ⟶ 97:
These three rules are a [[Soundness|sound]] and [[Completeness (logic)|complete]] axiomatization of functional dependencies. This axiomatization is sometimes described as finite because the number of inference rules is finite,<ref name="alice">{{Citation |last1=Abiteboul |first1=Serge |author-link=Serge Abiteboul |last2=Hull
Line 115 ⟶ 121:
:''X'' → ''Y'' and ''X'' → ''Z'' [[if and only if]] ''X'' → ''YZ''

== Closure ==
=== Closure of functional dependency ===
The closure of a set of attributes is the set of all attributes that can be determined from it using the functional dependencies of a given relation. [[Armstrong's axioms]], namely reflexivity, augmentation, and transitivity, are used to derive it.
Given <math>R</math> and a set <math>F</math> of FDs that holds in <math>R</math>, the closure of <math>F</math> in <math>R</math> (denoted <math>F</math><sup>+</sup>) is the set of all FDs that are logically implied by <math>F</math>.<ref>{{Cite journal|last=Saiedian|first=H.|date=1996-02-01|title=An Efficient Algorithm to Compute the Candidate Keys of a Relational Database Schema|url=https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/39.2.124|journal=The Computer Journal|language=en|volume=39|issue=2|pages=124–132|doi=10.1093/comjnl/39.2.124|issn=0010-4620|url-access=subscription}}</ref>

=== Closure of a set of attributes ===
The closure of a set of attributes X with respect to <math>F</math> is the set X<sup>+</sup> of all attributes that are functionally determined by X using <math>F</math><sup>+</sup>.

==== Example ====
Consider the following list of FDs, from which the closure of A (written A<sup>+</sup>) is calculated.
# ''A'' → ''B''
# ''B'' → ''C''
# ''AB'' → ''D''
The closure is derived as follows:
{{ordered list | list-style-type = lower-alpha
| A → A (by Armstrong's reflexivity)
| A → AB (by 1. and (a))
| A → ABD (by (b), 3, and Armstrong's transitivity)
| A → ABCD (by (c) and 2)
}}
Therefore, A<sup>+</sup> = ABCD. Because A<sup>+</sup> includes every attribute in the relation, A is a [[superkey]].
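The derivation above can be mechanized with the usual fixpoint algorithm for attribute closure: start from X and keep adding the right-hand side of any FD whose left-hand side is already contained in the result. The following Python is a minimal sketch, assuming single-letter attribute names and FDs stored as pairs of frozensets; the helpers `fd`, `attribute_closure`, `implies`, and `is_superkey` are hypothetical names. It also relies on the standard fact that <math>F</math> logically implies ''X'' → ''Y'' exactly when ''Y'' ⊆ X<sup>+</sup>, which gives a direct test for logical implication.

```python
from typing import FrozenSet, Iterable, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("AB", "D")."""
    return frozenset(lhs), frozenset(rhs)


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> FrozenSet[str]:
    """Compute X+ with respect to fds by iterating until nothing changes."""
    closure = set(attrs)  # reflexivity: X -> X
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is contained in the closure, then X -> lhs,
            # so X -> rhs by transitivity; add rhs (union rule).
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return frozenset(closure)


def implies(fds: Iterable[FD], lhs: str, rhs: str) -> bool:
    """F logically implies lhs -> rhs exactly when rhs is a subset of lhs+."""
    return frozenset(rhs) <= attribute_closure(lhs, fds)


def is_superkey(attrs: str, fds: Iterable[FD], all_attrs: str) -> bool:
    """attrs is a superkey when its closure contains every attribute."""
    return attribute_closure(attrs, fds) >= frozenset(all_attrs)


# The three FDs of the example: A -> B, B -> C, AB -> D.
F = [fd("A", "B"), fd("B", "C"), fd("AB", "D")]
print(sorted(attribute_closure("A", F)))  # ['A', 'B', 'C', 'D']
print(implies(F, "A", "CD"))              # True
print(implies(F, "C", "A"))               # False
print(is_superkey("A", F, "ABCD"))        # True
print(is_superkey("B", F, "ABCD"))        # False, since B+ = BC
```

Each pass adds at least one attribute or stops, so the loop terminates after at most as many passes as there are attributes.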
== Covers and equivalence ==
Line 159 ⟶ 166:
=== Heath's theorem ===
An important property (yielding an immediate application) of functional dependencies is that if ''R'' is a relation with columns named from some set of attributes ''U'' and ''R'' satisfies some functional dependency ''X'' → ''Y'', then <math>R=\Pi_{XY}(R)\bowtie\Pi_{XZ}(R)</math>, where ''Z'' = ''U'' − ''XY''. Intuitively, if a functional dependency ''X'' → ''Y'' holds in ''R'', then the relation can be safely split into two relations along the column ''X'' (which is a key for <math>\Pi_{XY}(R)</math>), ensuring that no data is lost when the two parts are joined back; that is, a functional dependency provides a simple way to construct a [[lossless join decomposition]] of ''R'' into two smaller relations. This fact is sometimes called ''Heath's theorem''; it is one of the early results in database theory.<ref>{{Cite book | last1 = Heath | first1 = I. J. | chapter = Unacceptable file operations in a relational data base | doi = 10.1145/1734714.1734717 | title = Proceedings of the 1971 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control - SIGFIDET '71 | pages = 19–33 | year = 1971 | s2cid = 22069259 }} cited in:
* {{cite book|editor=Michael Anshel and William Gewirtz|title=Mathematics of Information Processing: [short Course Held in Louisville, Kentucky, January 23-24, 1984]|chapter-url=https://archive.org/details/mathematicsofinf0034unse/page/23|year=1986|publisher=American Mathematical Soc.|isbn=978-0-8218-0086-7|author=Ronald Fagin and Moshe Y. Vardi|chapter=The Theory of Data Dependencies - A Survey|page=[https://archive.org/details/mathematicsofinf0034unse/page/23 23]}}
* {{cite book|author=C. Date|title=Database in Depth: Relational Theory for Practitioners|url=https://books.google.com/books?id=TR8f5dtnC9IC&pg=PT162|year=2005|publisher=O'Reilly Media, Inc.|isbn=978-0-596-10012-4|page=142}}
</ref>
Line 180 ⟶ 187:
# Reducing any functional dependency will change the content of S.
Sets of functional dependencies with these properties are also called ''canonical'' or ''minimal''. Finding such a set S of functional dependencies equivalent to a given input set S' is called finding a ''minimal cover'' of S'; this problem can be solved in polynomial time.<ref>{{Cite journal|last1=Meier|first1=Daniel|title=Minimum covers in the relational database model|year=1980|journal=[[Journal of the ACM]]|volume=27 |issue=4 |pages=664–674 |doi=10.1145/322217.322223|s2cid=15789293 |doi-access=free}}{{Closed access}}</ref>

== See also ==
Line 193 ⟶ 200:
{{reflist}}

== Further reading ==
{{cite journal|url=https://forum.thethirdmanifesto.com/wp-content/uploads/asgarosforum/987737/00-efc-further-normalization.pdf|title=Further Normalization of the Data Base Relational Model|first=E. F.|last=Codd|author-link=Edgar F. Codd|place=San Jose, California|journal=ACM Transactions on Database Systems|publisher=[[Association for Computing Machinery]]|date=1972}}