Apriori algorithm: Difference between revisions

== Overview ==
 
The Apriori algorithm was proposed by Agrawal and Srikant in 1994. Apriori is designed to operate on [[database]]s containing transactions (for example, collections of items bought by customers, or details of visits to a website or [[IP address]]es<ref>{{usurped|1=[https://web.archive.org/web/20210822191810/https://deductive.com/blogs/data-science-ip-matching/ The data science behind IP address matching]}} Published by deductive.com, September 6, 2018, retrieved September 7, 2018</ref>). Other algorithms are designed for finding association rules in data having no transactions ([[Winepi]] and Minepi), or having no timestamps (DNA sequencing). Each transaction is seen as a set of items (an ''itemset''). Given a threshold <math>C</math>, the Apriori algorithm identifies the itemsets that are subsets of at least <math>C</math> transactions in the database.
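As an illustration of the threshold test described above, the following minimal Python sketch counts how many transactions contain a given itemset and compares that count with <math>C</math>. The function names <code>support_count</code> and <code>is_frequent</code> and the small grocery dataset are hypothetical, chosen only for this example.

```python
def support_count(itemset, transactions):
    """Return the number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    # `itemset <= t` is the subset test on sets.
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset, transactions, C):
    """True if `itemset` is a subset of at least C transactions."""
    return support_count(itemset, transactions) >= C


# Hypothetical example data: each transaction is a set of items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]

print(support_count({"bread", "milk"}, transactions))   # 2
print(is_frequent({"milk"}, transactions, C=3))         # True
print(is_frequent({"bread", "butter", "milk"}, transactions, C=2))  # False
```

In this sketch every query rescans all transactions; that is enough to show the definition, but it is not how an efficient implementation would organise the counting.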
 
Apriori uses a "bottom-up" approach, where frequent subsets are extended one item at a time (a step known as ''candidate generation''), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
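The bottom-up procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the formulation from the original paper: candidates are generated by joining pairs of frequent itemsets of the previous size, a candidate is kept only if all of its subsets one item smaller are frequent, and each candidate is counted with a full scan of the transactions. The function name <code>apriori</code> and the example data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, C):
    """Return {itemset: count} for every itemset contained in at least C transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= C}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: extend frequent (k-1)-itemsets to size k.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Keep the candidate only if every (k-1)-subset is frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Test the candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= C}
        frequent.update(current)
        k += 1  # the loop stops once no candidate of size k was frequent

    return frequent


# Hypothetical example data.
transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]
result = apriori(transactions, C=3)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

With this data and <math>C = 3</math>, the sketch finds four frequent single items and four frequent pairs; the only surviving candidate of size three, {2, 3, 4}, occurs in just two transactions, so the loop ends there.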