Apriori algorithm: Difference between revisions

Content deleted Content added
Bask (talk | contribs)
m - linkless
edited algorithm and references
Line 3:
{{cleanup-context}}
 
'''Apriori''' is an efficient [[association rule mining]] [[algorithmdata mining]] algorithm, developed by Rakesh Agrawal, et al (Agrawal 93, Agrawal 94)..
 
Apriori (Agrawal 94) employs [[breadth-first search]] and uses a hash tree structure to count candidate item sets efficiently. The algorithm generates candidate item sets (patterns) of length <math>k</math> from <math>k-1</math> length item sets. Then, the patterns which have an infrequent sub pattern are pruned. According to the [[downward closure lemma]], the generated candidate set contains all frequent <math>k</math> length item sets. Following that, the whole transaction database is scanned to determine frequent item sets among the candidates. For determining frequent items in a fast manner, the algorithm uses a hash tree to store candidate itemsets. Note: This hash tree has item sets at the leaves and [[hash table]]s at internal nodes (Zaki, 99). Note that this is not the same kind of [[hash tree]] structure used in for instance p2p systems.
 
Apriori is designed to operate on [[database]]s containing transactions (eg: collection of items bought by customers or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).
 
== Algorithm ==
 
''Apriori''<math>(T,\varepsilon)</math>
<math>L_1 \gets \{ </math> large 1-itemsets <math> \} </math>
: <math>L_1 \gets \{ </math> ''large 1-itemsets'' <math>k \gets} 2</math>
: <math>k \gets 2</math>
:: '''while''' <math> L_{k-1} \neq \varnothing </math>
::: <math>C_k \gets </math>''Generate''<math>(L_k-1)</math>
::: '''for transactions''' transactions <math>t \in T</math>
:::: <math>C_t \gets </math>Subset<math>(C_k,t)</math>
:::: '''for candidates''' candidates <math>c \in C_t</math>
::::: <math>\mathrm{count}[c] \gets \mathrm{count}[c]+1</math>
::: <math>L_k \gets \{ c \in C_k | ~ \mathrm{count}[c] \geq \varepsilon \}</math>
::: <math>k \gets k+1</math>
: '''return''' <math>\bigcup_k L_k</math>
 
== References ==
 
*Rakesh Agrawal and Tomasz Imielinski and Arun N. Swami, ''Mining Association Rules between Sets of Items in Large Databases'', Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data.
* Agrawal R, Imielinski T, Swami AN. "Mining Association Rules between Sets of Items in Large Databases." ''ACM [[SIGMOD]]''. June 1993, '''22'''(2), 207-16, [http://portal.acm.org/ft_gateway.cfm?id=170072&type=pdf&coll=GUIDE&dl=portal,ACM&CFID=11111111&CFTOKEN=2222222 pdf].
*Rakesh Agrawal and Ramakrishnan Srikant, ''Fast Algorithms for Mining Association Rules'', Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994.
* Agrawal R, Srikant R. "Fast Algorithms for Mining Association Rules", ''[[VLDB]]'', Sep 12-15 1994, Chile, 487-99, [http://www.acm.org/sigmod/vldb/conf/1994/P487.PDF pdf], ISBN 1-55860-153-8.
*Heikki Mannila and HannuH, Toivonen and A. InkeriH, Verkamo, ''AI. "Efficient algorithms for discovering association rules." '', [[AAAI]] Workshop on Knowledge Discovery in Databases (KDD-94[[SIGKDD]]),''. July 1994, Seattle, 181-92, [http://www.softlab.ntua.gr/facilities/public/AD/DM/Efficient_Algorithms_for_Discovering_Association_Rules.ps ps].
*Mohammed Javeed Zaki and Srinivasan Parthasarathy and Mitsunori Ogihara and Wei Li, ''Parallel Algorithms for Discovery of Association Rules'', Data Mining and Knowledge Discovery, 1997.
 
* Zaki MJ, Parthasarathy S, Ogihara M, Li W. "Parallel Algorithms for Discovery of Association Rules." ''Data Mining and Knowledge Discovery''. Dec 1997, '''1'''(4), 343-73, [http://www.hpjava.org/pcrc/doc/rochester/97.DMKD.Parallel_algorithms_for_fast_discovery_of_assoc_rules.ps ps].
 
[[Category:Search algorithms]]