Dinamik-bot (talk | contribs) m r2.6.5) (robot Adding: ru:Кластеризация результатов поиска |
Eurohunter (talk | contribs) →top: -capitals |
(23 intermediate revisions by 16 users not shown) | |||
Line 1:
{{Cleanup|date=March 2011}}
A '''
== Difficulties ==
Web query topic classification automatically assigns a query to one or more predefined categories. Unlike traditional document classification tasks, it faces several major difficulties that hinder progress in Web [[query understanding]]:
===
Many queries are short, and query terms are often noisy.{{Clarify|reason=what
=== How to adapt to changes in queries and categories over time? ===
The meanings of queries may also evolve over time, so older labeled training queries may soon become out of date and useless. Making the classifier adaptive over time is therefore a major challenge. For example, before 2007 the word "''Barcelona''" referred to a city or a football club; it has since also come to mean an AMD microprocessor. The distribution of the meanings of this term on the Web is therefore a function of time.
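
The idea that the distribution of meanings is a function of time can be illustrated with a minimal sketch. It assumes that each occurrence of a query has already been labeled with a time period and an intended meaning; the function name <code>sense_distribution_by_period</code> and the toy labels are hypothetical and are not taken from any real query log.

```python
from collections import Counter, defaultdict


def sense_distribution_by_period(labeled_queries):
    """Return {period: {sense: fraction}} from (period, sense) pairs."""
    counts = defaultdict(Counter)
    for period, sense in labeled_queries:
        counts[period][sense] += 1
    # Normalize the raw counts of each period into fractions.
    return {
        period: {sense: n / sum(c.values()) for sense, n in c.items()}
        for period, c in counts.items()
    }


# Hypothetical toy labels for the query "Barcelona" (not real log data).
toy_labels = [
    (2006, "city"), (2006, "city"), (2006, "city"), (2006, "football club"),
    (2008, "city"), (2008, "football club"),
    (2008, "microprocessor"), (2008, "microprocessor"),
]

dist = sense_distribution_by_period(toy_labels)
for period in sorted(dist):
    print(period, dist[period])
```

In this toy data the "microprocessor" meaning is absent in the earlier period and present in the later one, which is the kind of shift that a classifier trained only on the older labels would miss.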
===
Because manually labeled training data for query classification is expensive to obtain, an active research question is how to use a very large web search engine query log as a source of unlabeled data to aid automatic query classification. These logs record Web users' behavior when they search for information via a search engine. Over the years, query logs have become a rich resource that captures Web users' knowledge about the World Wide Web.
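
One possible way to exploit such a log is sketched below, purely as an illustration. It assumes the log is available as (query, clicked URL) pairs and that a small set of queries has been labeled by hand; labels are then passed in a single step from labeled queries to unlabeled queries that led to the same clicked URL. The function name <code>propagate_labels</code>, the category names, and the toy log are hypothetical, and this simple voting scheme is an assumption rather than a method described in the article.

```python
from collections import Counter, defaultdict


def propagate_labels(click_log, seed_labels):
    """One-step label propagation over a query/URL click log.

    click_log:   iterable of (query, clicked_url) pairs.
    seed_labels: {query: category} for the small hand-labeled set.
    Returns {query: category} for unlabeled queries that share at
    least one clicked URL with a labeled query.
    """
    click_log = list(click_log)

    # For each URL, count the categories of labeled queries that led to it.
    url_votes = defaultdict(Counter)
    for query, url in click_log:
        if query in seed_labels:
            url_votes[url][seed_labels[query]] += 1

    # Each unlabeled query collects the votes of the URLs it clicked.
    query_votes = defaultdict(Counter)
    for query, url in click_log:
        if query not in seed_labels and url in url_votes:
            query_votes[query].update(url_votes[url])

    # Assign the category with the most votes.
    return {q: votes.most_common(1)[0][0] for q, votes in query_votes.items()}


# Hypothetical toy click log and seed labels (not real data).
toy_log = [
    ("barcelona fc tickets", "fcb.example"),
    ("camp nou schedule", "fcb.example"),
    ("amd barcelona benchmark", "chips.example"),
    ("quad core opteron", "chips.example"),
    ("weather tomorrow", "weather.example"),
]
seeds = {
    "barcelona fc tickets": "Sports",
    "amd barcelona benchmark": "Computers",
}

inferred = propagate_labels(toy_log, seeds)
print(inferred)
```

Queries that share no clicked URL with a labeled query, such as "weather tomorrow" in the toy log, remain unlabeled. A practical system would need to handle ties, noisy clicks, and repeated propagation steps, none of which this sketch attempts.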
== Applications ==
Line 79 ⟶ 51:
== Further reading ==
* Shen. [http://lbxml.ust.hk/th/th_search.pl?smode=VIEWBYCALLNUM&skeywords=CSED%202007%20Shen "Learning-based Web Query Understanding"]. ''PhD thesis'', ''HKUST'', June 2007.
{{Internet search}}
{{DEFAULTSORT:Web Query Classification}}
[[Category:Information retrieval techniques]]
[[Category:Internet search]]