Web query classification: Difference between revisions

Content deleted Content added
Locobot (talk | contribs)
Line 47:
=== How to use the unlabeled query logs to help with query classification? ===
 
Since the manually labeled training data for query classification areis expensive., Howhow to use a very large web search engine query log as a source of unlabeled data to aid in automatic query classification becomes a hot issue. These logs record the Web users' behavior when they search for information via a search engine. Over the years, query logs have become a rich resource which contains Web users' knowledge about the World Wide Web.
 
* Query clustering method<ref>Wen et al. [http://portal.acm.org/ft_gateway.cfm?id=503108 "Query Clustering Using User Logs"], ''ACM TOIS, Volume 20, Issue 1, January 2002''.</ref> tries to associate related queries by clustering “session data”, which contain multiple queries and click-through information from a single user interaction. They take into account terms from result documents that a set of queries has in common. The use of query keywords together with session data is shown to be the most effective method of performing query clustering.
 
* Selectional preference based method<ref>Beitzel et al. [http://portal.acm.org/ft_gateway.cfm?id=1229183 "Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs"], ''ACM TOIS, Volume 25, Issue 2, April 2007''.</ref> tries to exploit some [[association rules]] between the query terms to help with the query classification. Given the training data, they exploit several classification approaches including exact-match using labeled data, N-Gram match using labeled data and classifiers based on perceptronperception. They emphasize on an approach adapted from computational linguistics named selectional preferences. If x and y form a pair (x; y) and y belongs to category c, then all other pairs (x; z) headed by x belong to c. They use unlabeled query log data to mine these rules and validate the effectiveness of their approaches on some labeled queries.
 
== Applications ==