Revision as of 21:43, 15 August 2024, RJFJR (→Difficulties: fmt)
Revision as of 21:54, 15 August 2024, RJFJR (→How to derive an appropriate feature representation for Web queries?: === Derive an appropriate feature representation for Web queries ===)
Line 31:

Web query topic classification automatically assigns a query to one or more predefined categories. Unlike traditional document classification tasks, it faces several major difficulties that hinder progress in Web [[query understanding]]:

=== Derive an appropriate feature representation for Web queries ===

Many queries are short, and query terms are noisy. For example, in the KDDCUP 2005 dataset, queries containing 3 words are the most frequent (22%), and 79% of queries have no more than 4 words.

A user query often has multiple meanings. For example, "''apple''" can mean a kind of fruit or a computer company, and "''Java''" can mean a programming language or an island in Indonesia. In the KDDCUP 2005 dataset, most of the queries have more than one meaning. Therefore, using only the keywords of the query to build a [[vector space model]] for classification is not appropriate.
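
The limitation of a keyword-only representation can be illustrated with a minimal sketch. It assumes a plain bag-of-words vector space with cosine similarity, and the category names and term lists are hypothetical, chosen only for illustration; they are not taken from the KDDCUP 2005 taxonomy.

```python
import math
from collections import Counter


def bow(text):
    """Build a bag-of-words term-count vector from whitespace-separated text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical category term lists (assumptions for illustration only).
categories = {
    "Food": bow("apple banana orange fruit"),
    "Computers": bow("apple laptop software computer"),
}

# A one-word query yields a vector with a single non-zero entry.
query = bow("apple")
scores = {name: cosine(query, vec) for name, vec in categories.items()}

# A query whose only term appears in no category matches nothing at all.
unseen = bow("java")
unseen_scores = {name: cosine(unseen, vec) for name, vec in categories.items()}
```

In this sketch the query "apple" receives the same similarity to both categories, so the keywords alone cannot separate the two meanings, and a query made of unseen terms receives a score of zero everywhere. Both effects follow from the shortness and ambiguity of queries described above.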

Web query classification: Difference between revisions