Revision as of 04:43, 4 June 2016 by Yobot (minor edit): Removed invisible unicode characters + other fixes, replaced: → using AWB (12020)
Revision as of 08:33, 11 June 2016 by Vonthienen: adding references for Clustering in search engines
Line 20:
==Clustering in search engines==
A [[web search engine]] often returns thousands of pages in response to a broad query, making it difficult for users to browse the results or identify relevant information. Clustering methods can automatically group the retrieved documents into a list of meaningful categories, as is done by enterprise search engines such as [[Northern Light Group|Northern Light]] and [[Vivisimo]], consumer search engines such as [http://www.polymeta.com/ PolyMeta] and [http://www.helioid.com Helioid], free desktop search tools such as [https://www.noggle.online Noggle], and open source software such as [[Carrot2]].

Examples:
Line 26:
* Clustering divides the results of a search for "cell" into groups like "biology," "battery," and "prison."
* [http://FirstGov.gov FirstGov.gov], the official Web portal for the U.S. government, uses document clustering to automatically organize its search results into categories. For example, if a user submits “immigration”, next to their list of results they will see categories for “Immigration Reform”, “Citizenship and Immigration Services”, “Employment”, “Department of Homeland Security”, and more.
* The Noggle search and clustering engine has clustered over 2000 TED Talks into automatically generated clusters, for example to show what all TED talks from 2006 to 2016 had in common on the topic of "happiness". The results are available for further review.<ref>{{cite news|last1=von Thienen|first1=Lars|title=What would a robot see in TED talks?|url=https://www.noggle.online/knowledge-base/robot-see-ted-talks/|work=noggle.online|agency=TED.com}}</ref>

== Clustering v. Classifying ==

Document clustering: Difference between revisions