{{Cleanup|date=March 2011}}
A '''
== Difficulties ==
Web query topic classification aims to automatically assign a query to one or more predefined categories. Unlike traditional document classification, it faces several major difficulties that hinder progress in Web [[query understanding]]:
===
Many queries are short, and query terms are often noisy.{{Clarify|reason=what
===
The meanings of queries may also evolve over time, so older labeled training queries can quickly become out of date and useless. Making the classifier adapt over time is therefore a major issue. For example, since 2007 the word "''Barcelona''" has also referred to a new AMD microprocessor, whereas before then it referred to a city or a football club. On the Web, the distribution of this term's meanings is therefore a function of time.
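The drift described above can be made concrete by counting how often each sense of a term appears in each time period. The Python sketch below rests on several assumptions. It uses a small, hypothetical log of sense-annotated occurrences of "barcelona". The function names are illustrative and do not come from any cited work. It also shows one simple way to keep a classifier current, which is to retrain only on labeled examples from a recent sliding window. That is one possible strategy, not a method the article attributes to anyone.

```python
from collections import Counter, defaultdict

def sense_distribution_by_period(observations):
    """Map each period to the relative frequency of each sense observed in it."""
    counts = defaultdict(Counter)
    for period, sense in observations:
        counts[period][sense] += 1
    return {
        period: {sense: n / sum(c.values()) for sense, n in c.items()}
        for period, c in counts.items()
    }

def recent_examples(labeled, current_period, window):
    """Keep only (period, query, category) examples from the last `window` periods.

    Retraining on this subset is a simple (hypothetical) way to let a
    classifier forget senses that are no longer dominant.
    """
    return [ex for ex in labeled if 0 <= current_period - ex[0] < window]

# Hypothetical sense-annotated occurrences of the query term "barcelona".
observations = [
    (2006, "city"), (2006, "football club"), (2006, "city"),
    (2007, "city"), (2007, "AMD processor"), (2007, "AMD processor"),
    (2007, "football club"),
]
dist = sense_distribution_by_period(observations)
# dist[2006] has no "AMD processor" entry; in dist[2007] it accounts for half the occurrences.
```

The point of the toy data is only that the same string maps to different sense distributions in different periods. A classifier trained once on the 2006 slice would never predict the processor sense.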
===
* The selectional-preference-based method<ref>Beitzel et al. [http://portal.acm.org/ft_gateway.cfm?id=1229183 "Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs"], ''ACM TOIS, Volume 25, Issue 2, April 2007''.</ref> exploits [[association rules]] between query terms to help classify queries. Using the training data, the authors evaluate several classification approaches, including exact matching against labeled data, n-gram matching against labeled data, and perceptron-based classifiers. Their main emphasis is an approach adapted from computational linguistics called selectional preferences: if x and y form a pair (x; y) and y belongs to category c, then all other pairs (x; z) headed by x are also assigned to c. They mine these rules from unlabeled query logs and validate the effectiveness of their approaches on a set of labeled queries.
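The selectional-preference rule can be illustrated with a simplified Python sketch. It is not the cited authors' implementation and makes several assumptions:
* queries are split into every possible (head x, tail y) pair of a non-empty prefix and suffix;
* a small hypothetical lexicon supplies categories for known tail terms;
* rules are only applied in the "headed by x" direction stated above.

The names `mine_preferences`, `classify` and `min_support` are all illustrative.

```python
from collections import Counter, defaultdict

def splits(query):
    """All (x, y) pairs from splitting a query into a non-empty prefix and suffix."""
    terms = query.lower().split()
    return [(" ".join(terms[:i]), " ".join(terms[i:])) for i in range(1, len(terms))]

def mine_preferences(query_log, lexicon):
    """Mine rules x -> Counter(category) from unlabeled queries.

    lexicon maps a known tail term y to its category c (hypothetical seed labels).
    Whenever (x, y) occurs and y belongs to c, head x gains support for c.
    """
    rules = defaultdict(Counter)
    for query in query_log:
        for x, y in splits(query):
            if y in lexicon:
                rules[x][lexicon[y]] += 1
    return rules

def classify(query, rules, min_support=1):
    """Label a new query (x, z) with the category its head x prefers, if any."""
    votes = Counter()
    for x, _z in splits(query):
        for category, n in rules.get(x, {}).items():
            if n >= min_support:
                votes[category] += n
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical unlabeled log and seed lexicon.
log = ["cheap flights", "cheap hotels", "cheap laptops", "buy laptops"]
lexicon = {"flights": "Travel", "hotels": "Travel", "laptops": "Computers"}
rules = mine_preferences(log, lexicon)
```

With this toy data, the head "cheap" prefers Travel (2 votes against 1), so an unseen query such as "cheap cruises" would be labeled Travel. "buy tablets" would be labeled Computers. A single-term query has no (x, z) split and stays unlabeled, which reflects how sparse such rules are for short queries.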
== Applications ==
* '''[[metasearch|Metasearch engines]]''' send a user's query to multiple search engines and blend the top results from each into one overall list. By predicting the likely categories of the issued query, the engine can organize the large number of Web pages in its results by category, making them easier for users to navigate.
* '''[[Vertical search]]''', unlike general search, focuses on specific domains and addresses the particular information needs of niche audiences and professions. Once the search engine can predict the category of information a user is looking for, it can automatically select an appropriate vertical search engine, without requiring the user to access that engine explicitly.
* '''[[Online advertising]]'''<ref>[http://www.kdd2007.com/workshops.html#adkdd Data Mining and Audience Intelligence for Advertising (ADKDD'07)], KDD workshop 2007</ref><ref>[http://research.yahoo.com/workshops/troa-2008/ Targeting and Ranking for Online Advertising (TROA'08)], WWW workshop 2008</ref> aims to show relevant advertisements to users during their search activities. The search engine can show advertisements that match users' interests, so that users save time and effort in searching while advertisers reduce their advertising costs.
All these services rely on understanding users' search intents from their Web queries.
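A vertical-search front end could act on category predictions roughly as follows. This Python sketch is illustrative only, and the following elements are hypothetical:
* the category names;
* the keyword lists;
* the backend names such as `travel-search`;
* the 0.6 confidence threshold.

`keyword_classifier` is merely a stand-in for a trained query classifier.

```python
from typing import Callable, Dict

# Hypothetical mapping from predicted categories to vertical search backends.
VERTICALS: Dict[str, str] = {"Travel": "travel-search", "Shopping": "product-search"}

def keyword_classifier(query: str) -> Dict[str, float]:
    """Toy stand-in for a real query classifier: normalized keyword hits per category."""
    keywords = {"Travel": {"flight", "hotel"}, "Shopping": {"buy", "cheap", "price"}}
    terms = set(query.lower().split())
    hits = {c: len(terms & kws) for c, kws in keywords.items()}
    total = sum(hits.values())
    return {c: h / total for c, h in hits.items()} if total else {}

def route(query: str,
          classify: Callable[[str], Dict[str, float]],
          threshold: float = 0.6) -> str:
    """Send the query to a vertical engine only when the top category is confident enough."""
    scores = classify(query)
    if not scores:
        return "general-search"
    category, p = max(scores.items(), key=lambda kv: kv[1])
    if p >= threshold and category in VERTICALS:
        return VERTICALS[category]
    return "general-search"
```

The design choice worth noting is the fallback. An ambiguous query such as "cheap flight" splits its score evenly between Travel and Shopping, so it stays with general search rather than being forced into a vertical. "hotel in paris" is routed to the travel backend.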
== See also ==
* [[Document classification]]
* [[Web search query]]
* [[Information retrieval]]
* [[Query expansion]]
* [[Naive Bayes classifier]]
* [[Support vector machines]]
* [[Meta search]]
* [[Vertical search]]
* [[Online advertising]]
==
{{reflist}}
== Further reading ==
* Shen. [http://lbxml.ust.hk/th/th_search.pl?smode=VIEWBYCALLNUM&skeywords=CSED%202007%20Shen "Learning-based Web Query Understanding"]. ''PhD Thesis'', ''HKUST'', June 2007.
[[Category:Data mining]]
{{Internet search}}
{{DEFAULTSORT:Web Query Classification}}
[[Category:Information retrieval techniques]]