m Reverted edits by 217.179.103.223 (HG) |
Line 3:
== KDDCUP 2005 ==
The KDDCUP 2005 competition<ref>[http://www.sigkdd.org/kdd2005/kddcup.html KDDCUP 2005 dataset]</ref> highlighted interest in query classification. The objective of the competition was to classify 800,000 real user queries into 67 target categories. Each query can belong to more than one target category. As an example of a QC task, given the query “''apple''”,
{| class="wikitable"
Line 33 ⟶ 32:
=== How to derive an appropriate feature representation for Web queries? ===
Many queries are short and query terms are noisy. As an example, in the KDDCUP 2005 dataset, queries containing 3 words are the most frequent (22%). Furthermore, 79% of queries have no more than 4 words. A user query often has multiple meanings. For example, "''apple''" can mean a kind of fruit or a computer company. "''Java''" can mean a programming language or an island in Indonesia. In the KDDCUP 2005 dataset, most of the queries have more than one meaning. Therefore, using only the keywords of the query to set up a [[vector space model]] for classification is not appropriate.
* Query-enrichment based methods<ref>Shen et al. [http://www.sigkdd.org/explorations/issues/7-2-2005-12/KDDCUP2005Report_Shen.pdf "Q2C@UST: Our Winning Solution to Query Classification"]. ''ACM SIGKDD Explorations, December 2005, Volume 7, Issue 2''.</ref><ref>Shen et al. [http://portal.acm.org/ft_gateway.cfm?id=1165776 "Query Enrichment for Web-query Classification"]. ''ACM TOIS, Vol. 24, No. 3, July 2006''.</ref> start by enriching user queries into a collection of text documents through [[search engines]]. Each query is thus represented by a pseudo-document consisting of the snippets of the top-ranked result pages retrieved by a search engine. Subsequently, the pseudo-documents are classified into the target categories using a synonym-based classifier or statistical classifiers, such as [[Naive Bayes]] (NB) and [[Support Vector Machines]] (SVMs).
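
The query-enrichment idea described above can be illustrated with a minimal sketch. It is not the implementation from the cited papers: the search engine is replaced by a hypothetical in-memory table (<code>FAKE_SNIPPETS</code>), the three categories and their training text are invented for illustration, and the classifier is a plain multinomial Naive Bayes with add-one smoothing. Because a query may belong to more than one category, the sketch returns the top ''k'' categories rather than a single label.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a search engine: query -> result snippets.
# A real system would retrieve these from top-ranked result pages.
FAKE_SNIPPETS = {
    "apple": ["apple fruit orchard recipes", "apple computer mac laptop"],
    "java": ["java programming language software", "java island indonesia travel"],
    "banana": ["banana fruit smoothie recipes"],
}


def enrich(query, search=FAKE_SNIPPETS.get):
    """Turn a short query into a pseudo-document (a list of tokens)."""
    snippets = search(query) or [query]  # fall back to the bare query
    return " ".join(snippets).lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def scores(self, tokens):
        """Return the log-probability score of each category for the tokens."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + vocab_size
            logp = math.log(count / total)
            for token in tokens:
                logp += math.log((self.word_counts[label][token] + 1) / denom)
            result[label] = logp
        return result


# Invented training text for three illustrative target categories.
TRAINING = {
    "Computers": "computer laptop software mac hardware",
    "Food": "fruit recipes orchard cooking juice",
    "Travel": "island hotel flight beach indonesia",
}
model = NaiveBayes().fit([text.split() for text in TRAINING.values()],
                         list(TRAINING.keys()))


def classify(query, model, k=2):
    """Rank categories for the enriched query and keep the best k."""
    scores = model.scores(enrich(query))
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(classify("apple", model))
print(classify("java", model))
```

In this toy setup the bare keyword "apple" shares no term with any category's training text, so a keyword-only model would have nothing to score; the enriched pseudo-document supplies terms such as "fruit" and "laptop" that let both senses surface in the ranking.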