Document clustering: Difference between revisions

==Clustering in search engines==
A [[web search engine]] often returns thousands of pages in response to a broad query, making it difficult for users to browse the results or to identify relevant information. Clustering methods can automatically group the retrieved documents into a list of meaningful categories, as done by open-source software such as [[Carrot2]].
 
Examples:
 
* Clustering divides the results of a search for "cell" into groups like "biology," "battery," and "prison."
* [http://FirstGov.gov FirstGov.gov], the official Web portal for the U.S. government, uses document clustering to automatically organize its search results into categories. For example, if a user submits “immigration”, next to their list of results they will see categories for “Immigration Reform”, “Citizenship and Immigration Services”, “Employment”, “Department of Homeland Security”, and more.
* The Noggle search and clustering engine has grouped more than 2,000 TED Talks into automatically generated clusters, which can be used to explore, for example, what TED talks from 2006 to 2016 had in common on the topic of "happiness".<ref>{{cite news|last1=von Thienen|first1=Lars|title=What would a robot see in TED talks?|url=https://www.noggle.online/knowledge-base/robot-see-ted-talks/|work=noggle.online|agency=TED.com}}</ref>
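
The "cell" example above can be illustrated with a minimal sketch. It is not the method used by Carrot2 or any other engine named here. It assumes a toy approach: each result snippet becomes a bag-of-words vector, snippets whose cosine similarity reaches a threshold are linked, and each group of linked snippets is labeled with its most frequent term. The snippets, the stop-word list, the threshold of 0.3, and the names <code>term_vector</code>, <code>cosine</code>, and <code>cluster_results</code> are all hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "and", "for", "in", "of", "the"}  # tiny illustrative list


def term_vector(text, query):
    """Bag-of-words counts, ignoring stop words and the query term itself."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS and w != query)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_results(snippets, query, threshold=0.3):
    """Group snippets that are linked by similarity >= threshold.

    Returns a list of (label, member_indices) pairs, where the label is
    the most frequent term in the group (ties broken alphabetically).
    """
    vectors = [term_vector(s, query) for s in snippets]
    unassigned = set(range(len(snippets)))
    clusters = []
    for seed in range(len(snippets)):
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        members, frontier = [seed], [seed]
        while frontier:  # grow the group through chains of similar snippets
            i = frontier.pop()
            linked = [j for j in sorted(unassigned)
                      if cosine(vectors[i], vectors[j]) >= threshold]
            for j in linked:
                unassigned.remove(j)
            members.extend(linked)
            frontier.extend(linked)
        totals = Counter()
        for i in members:
            totals.update(vectors[i])
        # Fall back to the query if the group contains no other terms.
        label = min(totals, key=lambda t: (-totals[t], t), default=query)
        clusters.append((label, sorted(members)))
    return clusters


# Hypothetical result snippets for the query "cell".
snippets = [
    "Biology of the cell membrane and nucleus",
    "Lithium battery cell voltage and battery capacity",
    "Prison cell conditions for inmates",
    "Cell nucleus and membrane proteins in biology",
    "Charging a lithium battery cell",
    "Prison cell block in a state prison",
]

clusters = cluster_results(snippets, "cell")
for label, members in clusters:
    print(label, members)
```

On these six snippets the sketch yields three groups labeled "biology", "battery", and "prison". Production systems typically use weighted terms, stemming, and more robust clustering and labeling algorithms than this threshold-linking approach.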
 
==Procedures==