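'''The chase''' is a simple fixed-point algorithm for testing and enforcing the implication of data dependencies in database systems.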
It is used, directly or indirectly, on an everyday basis by people who design databases, and it is used in commercial systems to reason about the consistency and correctness of a data design.{{citation needed|date=November 2012}} New applications of the chase in meta-data management and data exchange are still being discovered.
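The chase has its origins in two papers of 1979, one by [[Alfred Aho|A. V. Aho]], C. Beeri, and [[Jeffrey Ullman|J. D. Ullman]] and the other by David Maier, Alberto O. Mendelzon, and Yehoshua Sagiv.<ref>
[[David Maier]], [[Alberto O. Mendelzon]], and [[Yehoshua Sagiv]]: "Testing Implications of Data Dependencies". ACM Trans. Datab. Syst. 4(4):455-469, 1979.</ref>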
In its simplest application the chase is used for testing whether the [[projection (relational algebra)|projection]] of a [[relation schema]] constrained by some [[functional dependency|functional dependencies]] onto a given decomposition can be [[join dependency|recovered by rejoining the projections]]. Let ''t'' be a tuple in <math>\pi_{S_1}(R) \bowtie \pi_{S_2}(R) \bowtie \cdots \bowtie \pi_{S_k}(R)</math> where ''R'' is a [[relation (database)|relation]] and ''F'' is a set of functional dependencies (FDs). Then ''R'' contains tuples ''t<sub>1</sub>, ..., t<sub>k</sub>'' such that each ''t<sub>i</sub>'' agrees with ''t'' on the attributes of ''S<sub>i</sub>'', for ''i'' = 1, 2, ..., ''k''. The values of ''t<sub>i</sub>'' on attributes outside ''S<sub>i</sub>'' are unknown.
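The recovery property described above can be checked directly on a small concrete relation. The following is a minimal sketch, not taken from the cited sources: the helper names (<code>row</code>, <code>project</code>, <code>natural_join</code>, <code>rejoin</code>) and the sample relations are assumptions chosen purely for illustration.

```python
from functools import reduce

def row(**values):
    """A tuple of the relation, stored as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())

def project(relation, attrs):
    """Projection: keep only the given attributes of every tuple."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(left, right):
    """Natural join: merge every pair of tuples that agree on shared attributes."""
    joined = set()
    for l in left:
        known = dict(l)
        for r in right:
            if all(known.get(a, v) == v for a, v in r):
                joined.add(l | r)
    return joined

def rejoin(relation, schemas):
    """Project the relation onto each schema and join the projections back."""
    return reduce(natural_join, (project(relation, s) for s in schemas))

# Hypothetical decomposition of R(A, B, C) into S1 = {A, B} and S2 = {B, C}.
schemas = [{"A", "B"}, {"B", "C"}]

# B -> C does not hold here, so rejoining yields spurious tuples.
r_bad = {row(A=1, B="x", C="p"), row(A=2, B="x", C="q")}
print(rejoin(r_bad, schemas) == r_bad)    # False

# B -> C holds here, and the join of the projections recovers the relation.
r_good = {row(A=1, B="x", C="p"), row(A=2, B="y", C="q")}
print(rejoin(r_good, schemas) == r_good)  # True
```

Such a check only covers one relation instance. The chase instead reasons symbolically, so its answer applies to every relation that satisfies ''F''.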
The chase process is [[confluence (rewriting system)|confluent]]. There exist implementations of the chase algorithm,<ref>[[Michael Benedikt (computer scientist)|Michael Benedikt]], [[George Konstantinidis]], [[Giansalvatore Mecca]], [[Boris Motik]], [[Paolo Papotti]], [[Donatello Santoro]], [[Efthymia Tsamoura]]: ''Benchmarking the Chase''. In Proc. of PODS, 2017.</ref> some of which are open-source.<ref>{{cite web |url=https://github.com/donatellosantoro/Llunatic |title=The Llunatic Mapping and Cleaning Chase Engine|date=6 April 2021}}</ref>
==Example==
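Let ''R''(''A'', ''B'', ''C'', ''D'') be a relation schema with a set ''F'' of functional dependencies, decomposed into S<sub>1</sub> = {''A'', ''D''}, S<sub>2</sub> = {''A'', ''C''} and S<sub>3</sub> = {''B'', ''C'', ''D''}. The initial tableau for this decomposition is: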
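{| border="1" cellspacing="0" cellpadding="5" align="center"
! ''A'' !! ''B'' !! ''C'' !! ''D''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c<sub>1</sub>'' || ''d''
|-
| ''a'' || ''b<sub>2</sub>'' || ''c'' || ''d<sub>2</sub>''
|-
| ''a<sub>3</sub>'' || ''b'' || ''c'' || ''d''
|}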
The first row represents S<sub>1</sub>. The components for attributes ''A'' and ''D'' are unsubscripted and those for attributes ''B'' and ''C'' are subscripted with ''i'' = 1. The second and third rows are filled in the same manner with S<sub>2</sub> and S<sub>3</sub> respectively.
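The construction of the initial tableau can be sketched in a few lines of Python. The function name <code>initial_tableau</code>, the encoding of symbols as strings such as <code>"b1"</code>, and the schema S<sub>2</sub> = {''A'', ''C''} (inferred from the rows discussed in this example) are assumptions made for illustration.

```python
def initial_tableau(attributes, decomposition):
    """Build one row per schema S_i.

    An attribute that belongs to S_i gets the unsubscripted symbol
    (e.g. 'a'); any other attribute gets the symbol subscripted with
    the row number i (e.g. 'b1').
    """
    rows = []
    for i, schema in enumerate(decomposition, start=1):
        row = tuple(
            attr.lower() if attr in schema else f"{attr.lower()}{i}"
            for attr in attributes
        )
        rows.append(row)
    return rows

# Assumed schemas S1 = {A, D}, S2 = {A, C}, S3 = {B, C, D}.
tableau = initial_tableau("ABCD", [{"A", "D"}, {"A", "C"}, {"B", "C", "D"}])
for row in tableau:
    print(row)
# ('a', 'b1', 'c1', 'd')
# ('a', 'b2', 'c', 'd2')
# ('a3', 'b', 'c', 'd')
```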
The goal for this test is to use the given ''F'' to prove that ''t'' = (''a'', ''b'', ''c'', ''d'') is really in ''R''. To do so, the tableau can be chased by applying the FDs in ''F'' to equate symbols in the tableau.
To perform the chase test, first decompose all FDs in ''F'' so that each FD has a single attribute on the right-hand side. Then, ''F'' = {''A''→''B'', ''B''→''C'', ''C''→''A'', ''D''→''A''}.
When equating two symbols, if one of them is unsubscripted, make the other the same, so that the final tableau can have a row that is exactly the same as ''t'' = (''a'', ''b'', ''c'', ''d''). If both symbols are subscripted, change either one to the other. In both cases, to avoid confusion, all occurrences of the replaced symbol should be changed.
First, apply ''A''→''B'' to the tableau. The first row is (''a'', ''b<sub>1</sub>'', ''c<sub>1</sub>'', ''d''), where ''a'' is unsubscripted and ''b<sub>1</sub>'' is subscripted with 1. The first and second rows agree on ''A'' (both have ''a''), so change ''b<sub>2</sub>'' in the second row to ''b<sub>1</sub>''. Since the third row has ''a<sub>3</sub>'', the ''b'' in the third row stays the same. The resulting tableau is:
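{| border="1" cellspacing="0" cellpadding="5" align="center"
! ''A'' !! ''B'' !! ''C'' !! ''D''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c<sub>1</sub>'' || ''d''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c'' || ''d<sub>2</sub>''
|-
| ''a<sub>3</sub>'' || ''b'' || ''c'' || ''d''
|}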
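A single chase step of this kind can be sketched as follows. This is an illustrative sketch only: the function names, the string encoding of symbols (<code>"a"</code> for unsubscripted, <code>"a3"</code> for subscripted) and the tie-breaking rule of keeping the lower subscript are assumptions, not part of the algorithm's definition.

```python
def is_unsubscripted(symbol):
    # Assumption: attribute names are single letters, so 'a' is
    # unsubscripted and 'a3' carries the subscript 3.
    return len(symbol) == 1

def apply_fd(rows, attributes, lhs, rhs):
    """Apply the FD lhs -> rhs once over all pairs of rows."""
    rows = [list(r) for r in rows]
    col = {a: k for k, a in enumerate(attributes)}
    j = col[rhs]
    for r1 in rows:
        for r2 in rows:
            agree = all(r1[col[a]] == r2[col[a]] for a in lhs)
            if agree and r1[j] != r2[j]:
                # Keep the unsubscripted symbol if there is one,
                # otherwise the one with the lower subscript.
                keep, drop = sorted(
                    (r1[j], r2[j]),
                    key=lambda s: (not is_unsubscripted(s), int(s[1:] or 0)),
                )
                # Rename every occurrence in the column, not just one.
                for r in rows:
                    if r[j] == drop:
                        r[j] = keep
    return [tuple(r) for r in rows]

tableau = [
    ("a", "b1", "c1", "d"),
    ("a", "b2", "c", "d2"),
    ("a3", "b", "c", "d"),
]
for row in apply_fd(tableau, "ABCD", lhs="A", rhs="B"):
    print(row)
# ('a', 'b1', 'c1', 'd')
# ('a', 'b1', 'c', 'd2')
# ('a3', 'b', 'c', 'd')
```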
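Then consider ''B''→''C''. Both the first and second rows have ''b<sub>1</sub>'', and the second row has an unsubscripted ''c''. Therefore, ''c<sub>1</sub>'' in the first row changes to ''c'', and the first row becomes (''a'', ''b<sub>1</sub>'', ''c'', ''d''). The resulting tableau is: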
{| border="1" cellspacing="0" cellpadding="5" align="center"
! ''A'' !! ''B'' !! ''C'' !! ''D''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c'' || ''d''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c'' || ''d<sub>2</sub>''
|-
| ''a<sub>3</sub>'' || ''b'' || ''c'' || ''d''
|}
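Now consider the dependencies that determine ''A''. The first and third rows agree on both ''C'' and ''D'' (each has an unsubscripted ''c'' and an unsubscripted ''d''), so they must also agree on ''A''. Hence, change ''a<sub>3</sub>'' in the third row to ''a''. The resulting tableau is: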
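{| border="1" cellspacing="0" cellpadding="5" align="center"
! ''A'' !! ''B'' !! ''C'' !! ''D''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c'' || ''d''
|-
| ''a'' || ''b<sub>1</sub>'' || ''c'' || ''d<sub>2</sub>''
|-
| ''a'' || ''b'' || ''c'' || ''d''
|}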
At this point, notice that the third row is (''a'', ''b'', ''c'', ''d'') which is the same as ''t''. Therefore, this is the final tableau for the chase test with given ''R'' and ''F''. Hence, whenever ''R'' is projected onto S<sub>1</sub>, S<sub>2</sub> and S<sub>3</sub> and rejoined, the result is in ''R''. Particularly, the resulting tuple is the same as the tuple of ''R'' that is projected onto {''B'', ''C'', ''D''}.
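The whole test can be sketched as one small Python function that chases the tableau until no FD changes it. This is a sketch under stated assumptions rather than a reference implementation: the function name <code>chase_lossless</code>, the string encoding of symbols, and the schema S<sub>2</sub> = {''A'', ''C''} (inferred from the tableau rows) are illustrative choices, and the FDs are the single-attribute set given in the text above.

```python
from itertools import combinations

def chase_lossless(attributes, decomposition, fds):
    """Chase test for a lossless join.

    fds is a list of (lhs, rhs) pairs where lhs is an iterable of
    attributes and rhs is a single attribute. Returns a pair
    (is_lossless, final_tableau).
    """
    col = {a: k for k, a in enumerate(attributes)}
    # Initial tableau: unsubscripted inside S_i, subscripted with i outside.
    rows = [
        [a.lower() if a in s else f"{a.lower()}{i}" for a in attributes]
        for i, s in enumerate(decomposition, start=1)
    ]
    changed = True
    while changed:  # each change merges two symbols, so this terminates
        changed = False
        for lhs, rhs in fds:
            j = col[rhs]
            for r1, r2 in combinations(rows, 2):
                if r1[j] != r2[j] and all(r1[col[a]] == r2[col[a]] for a in lhs):
                    # Prefer the unsubscripted (one-character) symbol.
                    keep, drop = sorted((r1[j], r2[j]),
                                        key=lambda s: (len(s) > 1, s))
                    for r in rows:  # rename all occurrences
                        if r[j] == drop:
                            r[j] = keep
                    changed = True
    target = [a.lower() for a in attributes]
    return any(r == target for r in rows), [tuple(r) for r in rows]

fds = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
lossless, final = chase_lossless(
    "ABCD", [{"A", "D"}, {"A", "C"}, {"B", "C", "D"}], fds
)
print(lossless)  # True
```

Unlike the walkthrough above, which stops as soon as a row equal to ''t'' appears, this sketch continues until a fixed point is reached, so its final tableau may contain further equated symbols. The verdict is the same.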
== References ==
<references/>
* [[Serge Abiteboul]], [[Richard B. Hull]], [[Victor Vianu]]: Foundations of Databases. Addison-Wesley, 1995.
* [[Alfred Aho|A. V. Aho]], C. Beeri, and [[Jeffrey Ullman|J. D. Ullman]]: ''The
* [[Jeffrey Ullman|J. D. Ullman]]: ''Principles of Database and Knowledge-Base Systems, Volume I''. Computer Science Press, New York, 1988.
* [[Jeffrey Ullman|J. D. Ullman
* [[Michael Benedikt (computer scientist)|Michael Benedikt]], [[George Konstantinidis]], [[Giansalvatore Mecca]], [[Boris Motik]], [[Paolo Papotti]], [[Donatello Santoro]], [[Efthymia Tsamoura]]: ''Benchmarking the Chase''. In Proc. of PODS, 2017.
== Further reading ==
* {{cite book|author1=Sergio Greco|author2=Francesca Spezzano|author3=Cristian Molinaro|title=Incomplete Data and Data Dependencies in Relational Databases|year=2012|publisher=Morgan & Claypool Publishers|isbn=978-1-60845-926-1}}
{{DEFAULTSORT:Chase (Algorithm)}}
[[Category:Database theory]]
[[Category:Database algorithms]]