Document clustering: Difference between revisions

Yobot (talk | contribs)
m Removed invisible unicode characters + other fixes, replaced: → using AWB (12020)
adding references for Clustering in search engines
 
==Clustering in search engines==
A [[web search engine]] often returns thousands of pages in response to a broad query, making it difficult for users to browse the results or identify relevant information. Clustering methods can automatically group the retrieved documents into a list of meaningful categories. Tools that do this include enterprise search engines such as [[Northern Light Group|Northern Light]] and [[Vivisimo]], consumer search engines such as [http://www.polymeta.com/ PolyMeta] and [http://www.helioid.com Helioid], free desktop search tools such as [https://www.noggle.online Noggle], and open-source software such as [[Carrot2]].
 
Examples:
* Clustering divides the results of a search for "cell" into groups like "biology," "battery," and "prison."
* [http://FirstGov.gov FirstGov.gov], the official Web portal for the U.S. government, uses document clustering to automatically organize its search results into categories. For example, if a user submits “immigration”, next to their list of results they will see categories for “Immigration Reform”, “Citizenship and Immigration Services”, “Employment”, “Department of Homeland Security”, and more.
* The Noggle search and clustering engine has grouped over 2000 TED Talks into automatically generated clusters, which can be used, for example, to explore what the TED talks from 2006 to 2016 had in common on the subject of "happiness". The results are available for further review.<ref>{{cite news|last1=von Thienen|first1=Lars|title=What would a robot see in TED talks?|url=https://www.noggle.online/knowledge-base/robot-see-ted-talks/|work=noggle.online|agency=TED.com}}</ref>
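
The "cell" example can be illustrated with a minimal sketch. It is not the method used by any of the engines named above; it assumes a simple single-pass ("leader") clustering over bag-of-words vectors compared by cosine similarity. The function names, the stopword list, the similarity threshold, and the sample snippets are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "from", "with"}

# Invented result snippets for the query "cell"; not real search results.
SNIPPETS = [
    "Cell membrane structure and protein transport in biology",
    "Lithium battery cell voltage and charging capacity",
    "Prison cell overcrowding in state prison systems",
    "Biology of the cell nucleus and membrane",
    "Replacing a battery cell to restore charging capacity",
    "Inmate escapes from prison cell",
]


def tokenize(text, query):
    """Lowercase the text and drop stopwords and the query term itself."""
    skip = STOPWORDS | {query.lower()}
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in skip]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_snippets(snippets, query, threshold=0.2):
    """Assign each snippet to the most similar existing cluster, or start
    a new cluster when no cluster reaches the similarity threshold."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, text in enumerate(snippets):
        vector = Counter(tokenize(text, query))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vector), "members": [index]})
        else:
            best["centroid"].update(vector)  # centroid = summed term counts
            best["members"].append(index)
    return clusters


def top_term(cluster):
    """Label a cluster by its most frequent term (ties broken alphabetically)."""
    return min(cluster["centroid"].items(), key=lambda kv: (-kv[1], kv[0]))[0]


if __name__ == "__main__":
    for cluster in cluster_snippets(SNIPPETS, query="cell"):
        print(top_term(cluster), cluster["members"])
```

On these six invented snippets the sketch produces three groups, labelled "biology", "battery", and "prison". The result of single-pass clustering depends on the order of the input and on the threshold, so a production system would typically use a more robust algorithm and better cluster labels.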
 
== Clustering v. Classifying ==