Revision as of 23:06, 29 May 2011 by 92.226.51.13 (no edit summary) → Revision as of 17:28, 6 October 2011 by Johndburger (edit summary: More refs; Attempt to add clearer explanations; Many minor cleanups.)
The '''winnow algorithm'''<ref name="littlestone88">Nick Littlestone (1988). "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm", ''[http://www.springerlink.com/content/j0k7t38567325716/ Machine Learning]'' 2: 285–318.</ref> is a technique from [[machine learning]] for learning a [[linear classifier]] from labeled examples. It is very similar to the [[perceptron|perceptron algorithm]]. However, the perceptron algorithm uses an additive weight-update scheme, while Winnow uses a multiplicative scheme that allows it to perform much better when many dimensions are irrelevant (hence its name). It is a simple algorithm that scales well to high-dimensional spaces. During training, Winnow is shown a sequence of positive and negative examples. From these it learns a decision [[hyperplane]] that can then be used to label novel examples as positive or negative. The algorithm can also be used in the [[Online machine learning|online learning]] setting, where the learning and the classification phase are not clearly separated.

==The algorithm==
The basic algorithm, Winnow1, is given as follows. The instance space is <math>X=\{0,1\}^n</math>, that is, each instance is described as a set of [[Boolean-valued]] [[features (pattern recognition)|features]]. The algorithm maintains non-negative weights <math>w_i</math> for <math>i\in \{1,\ldots,n\}</math>, one weight for each feature, which are initially set to 1. When the learner is given an example <math>(x_1,\ldots,x_n)</math>, it applies the typical prediction rule for linear classifiers:
* '''If''' <math>\sum_{i=1}^n w_i x_i > \Theta </math>, '''then''' predict 1
* '''Otherwise''' predict 0
Here <math>\Theta</math> is a real number that is called the ''threshold''.
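The prediction rule above can be sketched in a few lines of Python. This is only an illustration: the function name <code>winnow_predict</code>, the toy dimension <math>n=4</math>, and the threshold value used here are assumptions, not part of the cited description.

```python
from typing import Sequence


def winnow_predict(weights: Sequence[float], x: Sequence[int], theta: float) -> int:
    """Predict 1 if the weighted sum of the features strictly exceeds theta, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > theta else 0


n = 4                  # hypothetical number of Boolean features
weights = [1.0] * n    # all weights start at 1, as in Winnow1
theta = n / 2          # threshold assumed for this illustration

print(winnow_predict(weights, [1, 1, 1, 0], theta))  # weighted sum 3 exceeds 2
print(winnow_predict(weights, [1, 0, 0, 0], theta))  # weighted sum 1 does not
```

Because every <math>x_i</math> is 0 or 1, the weighted sum is simply the total weight of the features that are present in the instance.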
Together with the weights, the threshold defines a dividing hyperplane in the instance space. Good bounds are obtained if <math>\Theta=n/2</math> (see below).

For each example with which it is presented, the learner applies the following update rule:
* If an example is correctly classified, do nothing.
* If an example is predicted to be 1 but the correct result was 0, all of the weights implicated in the mistake are set to zero (demotion step).
* If an example is predicted to be 0 but the correct result was 1, all of the weights implicated in the mistake are multiplied by <math>\alpha</math> (promotion step).
Here, "implicated" means weights on features of the instance that have value 1. A typical value for <math>\alpha</math> is 2.

There are many variations to this basic approach. ''Winnow2''<ref name="littlestone88"/> is similar except that in the demotion step the weights are divided by <math>\alpha</math> instead of being set to 0. ''Balanced Winnow'' maintains two sets of weights, and thus two hyperplanes. This can then be generalized for [[multi-label classification]].

==Mistake bounds==
In certain circumstances, it can be shown that the number of mistakes Winnow makes as it learns has an [[Upper and lower bounds|upper bound]] that is independent of the number of instances with which it is presented. If the Winnow1 algorithm uses <math>\alpha > 1</math> and <math>\Theta \geq 1/\alpha</math> on a target function that is a <math>k</math>-literal monotone disjunction given by <math>f(x_1,\ldots,x_n)=x_{i_1}\lor \cdots \lor x_{i_k}</math>, then for any sequence of instances the total number of mistakes is bounded by:
: <math>\alpha k ( \log_\alpha \Theta+1)+\frac{n}{\Theta}</math>.<ref>Nick Littlestone (1989). "Mistake bounds and logarithmic linear-threshold learning algorithms". Technical report UCSC-CRL-89-11, University of California, Santa Cruz.</ref>

==References==
<references/>

[[Category:Classification algorithms]]
[[Category:Machine learning]]
{{Algorithm-stub}}
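As a rough illustration of the Winnow1 update rule and the mistake bound stated in the article, the following Python sketch trains on a toy monotone disjunction and compares the observed number of mistakes with the bound. The function names, the choice of <math>n=6</math> features with two relevant ones, and the defaults <math>\alpha=2</math> and <math>\Theta=n/2</math> are assumptions made for this example only.

```python
import math
from itertools import product


def winnow1_train(examples, n, alpha=2.0, theta=None):
    """Run Winnow1 once over (x, y) pairs and return (weights, mistakes)."""
    theta = n / 2 if theta is None else theta
    weights = [1.0] * n          # every weight starts at 1
    mistakes = 0
    for x, y in examples:
        total = sum(w * xi for w, xi in zip(weights, x))
        pred = 1 if total > theta else 0
        if pred == y:
            continue             # correctly classified: do nothing
        mistakes += 1
        for i, xi in enumerate(x):
            if xi == 1:          # only weights on active features are implicated
                # false positive: demote to zero; false negative: promote by alpha
                weights[i] = 0.0 if pred == 1 else weights[i] * alpha
    return weights, mistakes


def mistake_bound(n, k, alpha, theta):
    """Evaluate alpha * k * (log_alpha(theta) + 1) + n / theta."""
    return alpha * k * (math.log(theta, alpha) + 1) + n / theta


# Hypothetical target: x_0 OR x_3 over n = 6 Boolean features.
n, relevant = 6, (0, 3)
examples = [(x, int(any(x[i] for i in relevant)))
            for x in product((0, 1), repeat=n)]

weights, mistakes = winnow1_train(examples, n)
print(mistakes, mistake_bound(n, len(relevant), 2.0, n / 2))
```

Weights on the relevant features are never demoted, because a demotion happens only on an instance whose true label is 0, and in such an instance every relevant feature is 0.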

Winnow (algorithm): Difference between revisions