Fix Linter errors. |
Eurohunter (talk | contribs) →top: -capitals |
(8 intermediate revisions by 3 users not shown) | |||
Line 1:
{{Cleanup|date=March 2011}}
A '''
== Difficulties ==
Line 31 ⟶ 6:
Web query topic classification automatically assigns a query to one or more predefined categories. Unlike traditional document classification, it faces several major difficulties that hinder progress in Web [[query understanding]]:
===
Many queries are short, and query terms are often noisy.{{Clarify|reason=what
===
The meanings of queries may also evolve over time, so older labeled training queries can quickly become out of date and lose their value. Keeping the classifier adaptive over time is therefore a major challenge. For example, the word "''Barcelona''" acquired a new meaning as the name of an AMD microprocessor, whereas before 2007 it referred to a city or a football club. The distribution of the meanings of this term on the Web is therefore a function of time.
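The idea that a term's meaning distribution depends on time can be illustrated with a minimal sketch. The observations, sense labels, and the function name <code>sense_distribution_by_year</code> below are hypothetical and chosen only for illustration; they are not taken from any real query log.

```python
from collections import Counter, defaultdict

# Hypothetical labeled observations: (year, query term, sense).
# Illustrative data only, not drawn from a real query log.
observations = [
    (2006, "barcelona", "city"),
    (2006, "barcelona", "football club"),
    (2006, "barcelona", "city"),
    (2008, "barcelona", "city"),
    (2008, "barcelona", "processor"),
    (2008, "barcelona", "football club"),
    (2008, "barcelona", "processor"),
]


def sense_distribution_by_year(rows, term):
    """Return {year: {sense: relative frequency}} for a single term."""
    counts = defaultdict(Counter)
    for year, observed_term, sense in rows:
        if observed_term == term:
            counts[year][sense] += 1
    return {
        year: {sense: n / sum(c.values()) for sense, n in c.items()}
        for year, c in counts.items()
    }


dist = sense_distribution_by_year(observations, "barcelona")
for year in sorted(dist):
    print(year, dist[year])
```

A classifier trained only on the 2006 slice of this toy data would never predict the "processor" sense, which is the adaptation problem described above.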
===
Because manually labeled training data for query classification is expensive, using a very large web search engine query log as a source of unlabeled data to aid automatic query classification has become an active research topic. These logs record Web users' behavior when they search for information via a search engine. Over the years, query logs have become a rich resource that captures Web users' knowledge about the World Wide Web.
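One simple way to picture how an unlabeled log could help is to let a few hand-labeled queries pass their categories to unlabeled queries that led to clicks on the same pages. The sketch below assumes this click-based voting scheme purely for illustration; it is not a method attributed to any particular system, and the seed labels, log entries, <code>.example</code> hosts, and the function name <code>propagate_labels</code> are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical seed labels: the small, expensive hand-labeled set.
seed_labels = {
    "fc barcelona tickets": "Sports",
    "amd opteron benchmark": "Computers",
}

# Hypothetical click log of (query, clicked host) pairs. Illustrative only.
click_log = [
    ("fc barcelona tickets", "fcbarcelona.example"),
    ("barca schedule", "fcbarcelona.example"),
    ("amd opteron benchmark", "cpu-reviews.example"),
    ("barcelona quad core", "cpu-reviews.example"),
    ("weather tomorrow", "weather.example"),
]


def propagate_labels(seeds, log):
    """Assign each unlabeled query the majority label of its clicked hosts."""
    # Step 1: each host collects votes from labeled queries that clicked it.
    host_votes = defaultdict(Counter)
    for query, host in log:
        if query in seeds:
            host_votes[host][seeds[query]] += 1

    # Step 2: each unlabeled query inherits the votes of the hosts it clicked.
    query_votes = defaultdict(Counter)
    for query, host in log:
        if query not in seeds:
            query_votes[query].update(host_votes[host])

    # Queries whose hosts received no labeled clicks stay unlabeled.
    return {
        query: votes.most_common(1)[0][0]
        for query, votes in query_votes.items()
        if votes
    }


inferred = propagate_labels(seed_labels, click_log)
print(inferred)
```

In this toy log, "weather tomorrow" receives no label because no labeled query shares its clicked host, which shows that the coverage of such a scheme depends on how well the seed set overlaps with the log.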
== Applications ==