: Apart from both having applications in text mining, they are nothing alike. Aho-Corasick is a string-searching algorithm, while TF-IDF is a term-weighting statistic. Given a dictionary, Aho-Corasick builds a tree-like data structure for efficiently matching all the dictionary strings against a text document (i.e., just another string). TF-IDF, on the other hand, measures the relevance of a given term to a given document within a particular collection of documents. Say a user is searching the web and enters a keyword query in a search engine. The TF-IDF of each query term can then be computed for each document in the collection and combined into a final document score. Documents are then ranked from highest to lowest score, so that the most relevant ones are presented to the user first. Aho-Corasick, in contrast, might be useful in a situation where you want to identify, say, people's names in a collection of documents. You would use the names as your dictionary and then process each document to find matches. [[User:José Devezas|José Devezas]] ([[User talk:José Devezas|talk]]) 10:22, 6 June 2018 (UTC)
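: To make the contrast above concrete, here is a minimal Python sketch of both ideas. It is an illustration under stated assumptions, not a reference implementation: the function names (<code>tf_idf_scores</code>, <code>build_automaton</code>, <code>find_all</code>) are hypothetical, the TF-IDF variant (term frequency divided by document length, times <code>log(N / df)</code>, summed over query terms) is just one of several common weightings, and the sample names and documents are made up.

```python
import math
from collections import Counter, deque


def tf_idf_scores(query_terms, docs):
    """Score each tokenized document by summing TF-IDF over the query terms.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(sum((tf[t] / len(doc)) * math.log(n / df[t])
                          for t in query_terms if df[t]))
    return scores


def build_automaton(patterns):
    """Build the trie (goto), failure links and output lists for the patterns."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every dictionary match in text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            matches.append((i - len(p) + 1, p))
    return matches


# Aho-Corasick: locate dictionary strings (here, made-up names) in a text.
names = build_automaton(["ann", "anna", "bob"])
print(find_all("anna met bob", names))

# TF-IDF: rank documents for the query term "search".
docs = [["aho", "corasick", "search"],
        ["tf", "idf", "weighting"],
        ["search", "engine", "search"]]
scores = tf_idf_scores(["search"], docs)
print(sorted(range(len(docs)), key=lambda i: scores[i], reverse=True))
```

: The first function answers "how relevant is this document to the query?", while the other two answer "where do these strings occur in the text?", which is why the two techniques are not interchangeable.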
== bab not in tree ==
In the presented example, the word "bab" is listed among the words, but it is not represented in the automaton... [[Special:Contributions/217.153.187.164|217.153.187.164]] ([[User talk:217.153.187.164|talk]]) 18:21, 27 October 2022 (UTC) (Sorry, I have to remember my account info.)