Revision as of 20:04, 4 June 2009 by Locobot (17,725 edits) m Check Wikipedia cleanup (Fixing breaks in lists) + gen. fixes
Revision as of 13:11, 7 December 2009 by SusieBreakfast (1 edit) m →How to use the unlabeled query logs to help with query classification?
Line 47:
=== How to use the unlabeled query logs to help with query classification? ===
Since the manually labeled training data for query classification ~~are~~is expensive~~.~~, ~~How~~how to use a very large web search engine query log as a source of unlabeled data to aid in automatic query classification becomes a hot issue. These logs record the Web users' behavior when they search for information via a search engine. Over the years, query logs have become a rich resource that contains Web users' knowledge about the World Wide Web.
* Query clustering method<ref>Wen et al. [http://portal.acm.org/ft_gateway.cfm?id=503108 "Query Clustering Using User Logs"], ''ACM TOIS, Volume 20, Issue 1, January 2002''.</ref> tries to associate related queries by clustering “session data”, which contain multiple queries and click-through information from a single user interaction. They take into account terms from result documents that a set of queries has in common. The use of query keywords together with session data is shown to be the most effective method of performing query clustering.
* Selectional preference based method<ref>Beitzel et al. [http://portal.acm.org/ft_gateway.cfm?id=1229183 "Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs"], ''ACM TOIS, Volume 25, Issue 2, April 2007''.</ref> tries to exploit some [[association rules]] between the query terms to help with the query classification. Given the training data, they exploit several classification approaches including exact-match using labeled data, N-Gram match using labeled data and classifiers based on ~~perceptron~~perception. They emphasize an approach adapted from computational linguistics named selectional preferences. If x and y form a pair (x; y) and y belongs to category c, then all other pairs (x; z) headed by x belong to c. They use unlabeled query log data to mine these rules and validate the effectiveness of their approaches on some labeled queries.
== Applications ==

Web query classification: Difference between revisions