Chase (algorithm): Difference between revisions


Revision as of 11:44, 16 April 2012 by A3nm (Extended confirmed users, Pending changes reviewers; 4,578 edits). Edit summary: The chase process is confluent.
Latest revision as of 17:34, 26 September 2021 by Jac16888 (Administrators; 55,766 edits). Edit summary: rm sig. Tag: Undo
(27 intermediate revisions by 19 users not shown)
Line 1:
'''The chase''' is a simple [[fixed-point iteration|fixed-point algorithm]] testing and enforcing implication of data dependencies in [[database|database systems]]. It plays important roles in [[database theory]] as well as in practice. It is used, directly or indirectly, on an everyday basis by people who design databases, and it is used in commercial systems to reason about the consistency and correctness of a data design.{{citation needed|date=November 2012}} New applications of the chase in meta-data management and data exchange are still being discovered.

The chase has its origins in two seminal papers of 1979, one by [[Alfred V. Aho]], [[Catriel Beeri]], and [[Jeffrey D. Ullman]]<ref>[[Alfred V. Aho]], [[Catriel Beeri]], and [[Jeffrey D. Ullman]]: "The Theory of Joins in Relational Databases", ACM Trans. Datab. Syst. 4(3):297-314, 1979.</ref> and the other by [[David Maier]], [[Alberto O. Mendelzon]], and [[Yehoshua Sagiv]].<ref>[[David Maier]], [[Alberto O. Mendelzon]], and [[Yehoshua Sagiv]]: "Testing Implications of Data Dependencies". ACM Trans. Datab. Syst. 4(4):455-469, 1979.</ref>

In its simplest application the chase is used for testing whether the [[projection (relational algebra)|projection]] of a [[relation schema]] constrained by some [[functional dependency|functional dependencies]] onto a given decomposition can be [[join dependency|recovered by rejoining the projections]]. Let ''t'' be a tuple in <math>\pi_{S_1}(R) \bowtie \pi_{S_2}(R) \bowtie ... \bowtie \pi_{S_k}(R)</math> where ''R'' is a [[relation (database)|relation]] and ''F'' is a set of functional dependencies (FD).
If tuples in ''R'' are represented as ''t<sub>1</sub>, ..., t<sub>k</sub>'', the join of the projections of each ''t<sub>i</sub>'' should agree with ''t'' on <math>\pi_{S_i}(R)</math> where ''i'' = 1, 2, ..., ''k''. If ''t<sub>i</sub>'' is not on <math>\pi_{S_i}(R)</math>, the value is unknown.

The chase can be done by drawing a tableau (which is the same formalism used in [[tableau query]]). Suppose ''R'' has [[attribute (computing)|attributes]] ''A, B, ...'' and components of ''t'' are ''a, b, ...''. For ''t<sub>i</sub>'' use the same letter as ''t'' in the components that are in S<sub>''i''</sub> but subscript the letter with ''i'' if the component is not in S<sub>''i''</sub>. Then, ''t<sub>i</sub>'' will agree with ''t'' if it is in S<sub>''i''</sub> and will have a unique value otherwise.

The chase process is [[confluence (rewriting system)|confluent]]. There exist implementations of the chase algorithm,<ref>[[Michael Benedikt (computer scientist)|Michael Benedikt]], [[George Konstantinidis]], [[Giansalvatore Mecca]], [[Boris Motik]], [[Paolo Papotti]], [[Donatello Santoro]], [[Efthymia Tsamoura]]: ''Benchmarking the Chase''. In Proc. of PODS, 2017.</ref> some of which are open-source.<ref>{{cite web |url=https://github.com/donatellosantoro/Llunatic |title=The Llunatic Mapping and Cleaning Chase Engine|date=6 April 2021}}</ref>

==Example==
Let ''R''(''A'', ''B'', ''C'', ''D'') be a relation schema known to obey the set of functional dependencies ''F'' = {''A''→''B'', ''B''→''C'', ''CD''→''A''}. Suppose ''R'' is decomposed into three relation schemas S<sub>1</sub> = {''A'', ''D''}, S<sub>2</sub> = {''A'', ''C''} and S<sub>3</sub> = {''B'', ''C'', ''D''}. Determining whether this decomposition is lossless can be done by performing a chase as shown below.

The initial tableau for this decomposition is:
{| border="1" cellspacing="0" cellpadding="5" align="center"
! ''A'' !! ''B'' !! ''C'' !! ''D''
Line 25:
|}

The first row represents S<sub>1</sub>. The components for attributes ''A'' and ''D'' are unsubscripted and those for attributes ''B'' and ''C'' are subscripted with ''i'' = 1. The second and third rows are filled in the same manner with S<sub>2</sub> and S<sub>3</sub> respectively.

The goal for this test is to use the given ''F'' to prove that ''t'' = (''a'', ''b'', ''c'', ''d'') is really in ''R''. To do so, the tableau can be chased by applying the FDs in ''F'' to equate symbols in the tableau. A final tableau with a row that is the same as ''t'' implies that any tuple ''t'' in the join of the projections is actually a tuple of ''R''.
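
The tableau construction and the chase described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation. The encoding of a symbol as a <code>(letter, subscript)</code> pair with subscript 0 meaning "unsubscripted", and the function names <code>initial_tableau</code>, <code>chase</code> and <code>is_lossless</code>, are hypothetical choices made for this example. When two symbols are equated, the sketch keeps the unsubscripted one if there is one and otherwise keeps the lower subscript.

```python
from itertools import combinations

# A symbol is a pair (letter, subscript); subscript 0 stands for "unsubscripted".
# This encoding is an assumption of the sketch, not notation from the article.


def initial_tableau(attributes, decomposition):
    """Build one row per schema S_i of the decomposition."""
    return [
        [(attr.lower(), 0 if attr in schema else i) for attr in attributes]
        for i, schema in enumerate(decomposition, start=1)
    ]


def chase(attributes, tableau, fds):
    """Equate symbols as required by the FDs until nothing changes."""
    col = {attr: k for k, attr in enumerate(attributes)}
    rows = [list(row) for row in tableau]  # work on a copy
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in combinations(rows, 2):
                if any(r1[col[a]] != r2[col[a]] for a in lhs):
                    continue  # rows disagree on the left-hand side
                for b in rhs:
                    s1, s2 = r1[col[b]], r2[col[b]]
                    if s1 == s2:
                        continue
                    # Prefer the unsubscripted symbol; otherwise keep the lower subscript.
                    keep, drop = sorted((s1, s2), key=lambda s: s[1])
                    for row in rows:  # change every occurrence of the dropped symbol
                        if row[col[b]] == drop:
                            row[col[b]] = keep
                    changed = True
    return rows


def is_lossless(final_tableau):
    """True if some row consists only of unsubscripted symbols, i.e. equals t."""
    return any(all(sub == 0 for _, sub in row) for row in final_tableau)


attributes = "ABCD"
decomposition = [{"A", "D"}, {"A", "C"}, {"B", "C", "D"}]
fds = [("A", "B"), ("B", "C"), ("CD", "A")]

start = initial_tableau(attributes, decomposition)
final = chase(attributes, start, fds)
print(is_lossless(final))  # True
```

The sketch iterates until no symbol changes, so it may apply more equating steps than a walkthrough done by hand, which can stop as soon as a fully unsubscripted row appears.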
To perform the chase test, first decompose all FDs in ''F'' so each FD has a single attribute on the right hand side of the "arrow". (In this example, ''F'' remains unchanged because all of its FDs already have a single attribute on the right hand side: ''F'' = {''A''→''B'', ''B''→''C'', ''CD''→''A''}.)

When equating two symbols, if one of them is unsubscripted, make the other be the same so that the final tableau can have a row that is exactly the same as ''t'' = (''a'', ''b'', ''c'', ''d''). If both have their own subscript, change either to be the other. However, to avoid confusion, all of the occurrences should be changed.

First, apply ''A''→''B'' to the tableau.
The first row is (''a'', ''b<sub>1</sub>'', ''c<sub>1</sub>'', ''d'') where ''a'' is unsubscripted and ''b<sub>1</sub>'' is subscripted with 1. Comparing the first row with the second one, change ''b<sub>2</sub>'' to ''b<sub>1</sub>''. Since the third row has ''a<sub>3</sub>'', ''b'' in the third row stays the same. The resulting tableau is:
{| border="1" cellspacing="0" cellpadding="5" align="center"
! ''A'' !! ''B'' !! ''C'' !! ''D''
Line 37 ⟶ 42:
|-
| ''a<sub>3</sub>'' || ''b'' || ''c'' || ''d''
|}

Then consider ''B''→''C''. Both first and second rows have ''b<sub>1</sub>'' and notice that the second row has an unsubscripted ''c''. Therefore, the first row changes to (''a'', ''b<sub>1</sub>'', ''c'', ''d''). Then the resulting tableau is:
Line 48 ⟶ 53:
|-
| ''a<sub>3</sub>'' || ''b'' || ''c'' || ''d''
|}

Now consider ''CD''→''A''. The first row has an unsubscripted ''c'' and an unsubscripted ''d'', which is the same as in the third row. This means that the ''A'' value for rows one and three must be the same as well. Hence, change ''a<sub>3</sub>'' in the third row to ''a''. The resulting tableau is:
Line 68 ⟶ 73:
* [[Jeffrey Ullman|J. D. Ullman]]: ''Principles of Database and Knowledge-Base Systems, Volume I''. Computer Science Press, New York, 1988.
* [[Jeffrey Ullman|J. D. Ullman]], [[Jennifer Widom|J. Widom]]: ''A First Course in Database Systems'' (3rd ed.). pp. 96–99. Pearson Prentice Hall, 2008.
* [[Michael Benedikt (computer scientist)|Michael Benedikt]], [[George Konstantinidis]], [[Giansalvatore Mecca]], [[Boris Motik]], [[Paolo Papotti]], [[Donatello Santoro]], [[Efthymia Tsamoura]]: ''Benchmarking the Chase''. In Proc. of PODS, 2017.
== Further reading ==
* {{cite book|author1=Sergio Greco|author2=Francesca Spezzano|author3=Cristian Molinaro|title=Incomplete Data and Data Dependencies in Relational Databases|year=2012|publisher=Morgan & Claypool Publishers|isbn=978-1-60845-926-1}}

{{DEFAULTSORT:Chase (Algorithm)}}