Winnow (algorithm): Difference between revisions


Revision as of 21:10, 14 September 2008 by JonathanWilliford (no edit summary). Latest revision as of 03:32, 13 February 2020 by GreenC bot (edit summary: Move 1 url. Wayback Medic 2.5).
(46 intermediate revisions by 33 users not shown)
The '''winnow algorithm'''<ref name="littlestone88">Nick Littlestone (1988). "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm", [https://doi.org/10.1023%2FA%3A1022869011914 ''Machine Learning'' 285–318(2)].</ref> is a technique from [[machine learning]] for learning a [[linear classifier]] from labeled examples. It is very similar to the [[perceptron|perceptron algorithm]]. However, the perceptron algorithm uses an additive weight-update scheme, while Winnow uses a [[Multiplicative Weight Update Method|multiplicative scheme]] that allows it to perform much better when many dimensions are irrelevant (hence its name [[winnowing|winnow]]). It is a simple algorithm that scales well to high-dimensional data. During training, Winnow is shown a sequence of positive and negative examples. From these it learns a decision [[hyperplane]] that can then be used to label novel examples as positive or negative. The algorithm can also be used in the [[Online machine learning|online learning]] setting, where the learning and the classification phase are not clearly separated.
==Algorithm==
The basic algorithm, Winnow1, is as follows. The instance space is <math>X=\{0,1\}^n</math>, that is, each instance is described as a set of [[Boolean-valued]] [[features (pattern recognition)|features]]. The algorithm maintains non-negative weights <math>w_i</math> for <math>i\in \{1,\ldots,n\}</math>, one weight for each feature, which are initially set to 1. When the learner is given an example <math>(x_1,\ldots,x_n)</math>, it applies the typical prediction rule for linear classifiers:

* '''If''' <math>\sum_{i=1}^n w_i x_i > \Theta </math>, '''then''' predict 1
* '''Otherwise''' predict 0

Here <math>\Theta</math> is a real number that is called the ''threshold''. Together with the weights, the threshold defines a dividing hyperplane in the instance space. Good bounds are obtained if <math>\Theta=n/2</math> (see below).

For each example with which it is presented, the learner applies the following update rule:

* If an example is correctly classified, do nothing.
* If an example is predicted incorrectly and the correct result was 0, for each feature <math>x_{i}=1</math>, the corresponding weight <math>w_{i}</math> is set to 0 (demotion step).
*: <math>\forall x_{i} = 1, w_{i} = 0</math>
* If an example is predicted incorrectly and the correct result was 1, for each feature <math>x_{i}=1</math>, the corresponding weight <math>w_{i}</math> is multiplied by {{mvar|α}} (promotion step).
*: <math>\forall x_{i} = 1, w_{i} = \alpha w_{i}</math>
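The prediction and update rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation. The class name <code>Winnow1</code>, the method names, the helper <code>train_until_consistent</code>, and the example target function (the monotone disjunction of the first and third features, with ''n'' = 8) are assumptions chosen for the sketch and do not come from the cited sources. The defaults <math>\alpha=2</math> and <math>\Theta=n/2</math> follow the values mentioned in this article.

```python
from itertools import product


class Winnow1:
    """Illustrative sketch of Winnow1 over Boolean feature vectors."""

    def __init__(self, n, alpha=2.0, theta=None):
        # One non-negative weight per feature, initially set to 1.
        self.weights = [1.0] * n
        self.alpha = alpha
        # Threshold defaults to n/2 (assumed default for this sketch).
        self.theta = n / 2 if theta is None else theta

    def predict(self, x):
        # Predict 1 if the weighted sum of active features exceeds the threshold.
        total = sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if total > self.theta else 0

    def update(self, x, label):
        """Apply the update rule; return True if a mistake was made."""
        if self.predict(x) == label:
            return False  # correctly classified: do nothing
        for i, xi in enumerate(x):
            if xi == 1:
                if label == 0:
                    self.weights[i] = 0.0          # demotion step
                else:
                    self.weights[i] *= self.alpha  # promotion step
        return True


def target(x):
    # Hypothetical target concept: x_1 OR x_3 (indices 0 and 2 in Python).
    return 1 if (x[0] or x[2]) else 0


def train_until_consistent(learner, n, max_passes=100):
    """Cycle through all 2^n instances until a pass makes no mistakes.

    Returns the total number of mistakes made during training.
    """
    instances = list(product((0, 1), repeat=n))
    mistakes = 0
    for _ in range(max_passes):
        pass_mistakes = sum(learner.update(x, target(x)) for x in instances)
        mistakes += pass_mistakes
        if pass_mistakes == 0:
            break
    return mistakes


if __name__ == "__main__":
    learner = Winnow1(n=8)
    total_mistakes = train_until_consistent(learner, n=8)
    print("mistakes:", total_mistakes)
    print("weights:", learner.weights)
```

In this sketch the weights of the two relevant features are never demoted, because demotion only happens on examples labeled 0, in which both relevant features are 0. Enumerating every instance is practical only for small ''n''; in the online setting the learner would instead call <code>update</code> on each example as it arrives.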
A typical value for {{mvar|α}} is 2.

There are many variations to this basic approach. ''Winnow2''<ref name="littlestone88"/> is similar except that in the demotion step the weights are divided by {{mvar|α}} instead of being set to 0. ''Balanced Winnow'' maintains two sets of weights, and thus two hyperplanes. This can then be generalized for [[multi-label classification]].

==Mistake bounds==
In certain circumstances, it can be shown that the number of mistakes Winnow makes as it learns has an [[Upper and lower bounds|upper bound]] that is independent of the number of instances with which it is presented.

If the Winnow1 algorithm uses <math>\alpha > 1</math> and <math>\Theta \geq 1/\alpha</math> on a target function that is a <math>k</math>-literal monotone disjunction given by <math>f(x_1,\ldots,x_n)=x_{i_1}\cup \cdots \cup x_{i_k}</math>, then for any sequence of instances the total number of mistakes is bounded by:
<math>\alpha k ( \log_\alpha \Theta+1)+\frac{n}{\Theta}</math>.<ref>Nick Littlestone (1989). "Mistake bounds and logarithmic linear-threshold learning algorithms". Technical report UCSC-CRL-89-11, University of California, Santa Cruz.</ref>

==References==
<references/>

[[Category:Classification algorithms]]