Winnow (algorithm)

The '''winnow algorithm'''<ref name="littlestone88">Nick Littlestone (1988). "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm", [https://doi.org/10.1023%2FA%3A1022869011914 ''Machine Learning'' 285–318(2)].</ref> is a technique from [[machine learning]] for learning a [[linear classifier]] from labeled examples. It is very similar to the [[perceptron|perceptron algorithm]]. However, the perceptron algorithm uses an additive weight-update scheme, while Winnow uses a [[Multiplicative Weight Update Method|multiplicative scheme]] that allows it to perform much better than the perceptron when many dimensions are irrelevant (hence its name [[winnowing|winnow]]). It is a simple algorithm that scales well to high-dimensional data. During training, Winnow is shown a sequence of positive and negative examples. From these it learns a decision [[hyperplane]] that can then be used to label novel examples as positive or negative. The algorithm can also be used in the [[Online machine learning|online learning]] setting, where the learning and the classification phase are not clearly separated.
 
==Algorithm==
 
The basic algorithm, Winnow1, is as follows. The instance space is <math>X=\{0,1\}^n</math>, that is, each instance is described as a set of [[Boolean-valued]] [[features (pattern recognition)|features]]. The algorithm maintains non-negative weights <math>w_i</math> for <math>i\in \{1,\ldots,n\}</math>, which are initially set to 1, one weight for each feature. When the learner is given an example <math>(x_1,\ldots,x_n)</math>, it applies the typical prediction rule for linear classifiers:
 
* '''If''' <math>\sum_{i=1}^n w_i x_i > \Theta </math>, '''then''' predict 1
* '''Otherwise''' predict 0
 
Here <math>\Theta</math> is a real number that is called the ''threshold''. Together with the weights, the threshold defines a dividing hyperplane in the instance space. Good bounds are obtained if <math>\Theta=n/2</math> (see below).
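
For concreteness, the prediction rule can be sketched in a few lines of Python (an illustrative sketch; the names <code>predict</code>, <code>weights</code> and <code>theta</code> are chosen here and are not part of Littlestone's formulation):

<syntaxhighlight lang="python">
def predict(weights, x, theta):
    """Winnow1 prediction: output 1 iff the weighted sum of the
    active Boolean features exceeds the threshold theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0
</syntaxhighlight>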
 
For each example with which it is presented, the learner applies the following update rule:
 
* If an example is correctly classified, do nothing.
* If an example is predicted incorrectly and the correct result was 0, for each feature <math>x_{i}=1</math>, the corresponding weight <math>w_{i}</math> is set to 0 (demotion step).
*: <math>\forall x_{i} = 1, w_{i} = 0</math>
* If an example is predicted incorrectly and the correct result was 1, for each feature <math>x_{i}=1</math>, the corresponding weight <math>w_{i}</math> is multiplied by {{mvar|&alpha;}} (promotion step).
*: <math>\forall x_{i} = 1, w_{i} = \alpha w_{i}</math>
 
A typical value for {{mvar|&alpha;}} is 2.
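
Combining the prediction and update rules, a single training step might look as follows (again an illustrative sketch, reusing the hypothetical <code>predict</code> function above; <code>winnow1_update</code> is a name chosen here):

<syntaxhighlight lang="python">
def winnow1_update(weights, x, label, theta, alpha=2.0):
    """One Winnow1 learning step for Boolean example x with true label 0 or 1.
    Returns the (possibly unchanged) list of weights."""
    prediction = predict(weights, x, theta)
    if prediction == label:
        return weights  # correctly classified: do nothing
    if prediction == 1 and label == 0:
        # demotion step: set the weights of all active features to zero
        return [0.0 if xi == 1 else w for w, xi in zip(weights, x)]
    # promotion step: multiply the weights of all active features by alpha
    return [alpha * w if xi == 1 else w for w, xi in zip(weights, x)]
</syntaxhighlight>

Training then consists of applying this step to each labelled example in turn, starting from all weights equal to 1 and, for example, <math>\Theta = n/2</math>.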
 
There are many variations to this basic approach. ''Winnow2''<ref name="littlestone88"/> is similar except that in the demotion step the weights are divided by {{mvar|&alpha;}} instead of being set to 0. ''Balanced Winnow'' maintains two sets of weights, and thus two hyperplanes. This can then be generalized for [[multi-label classification]].
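
To illustrate how small the change is, a Winnow2 step would differ from the sketch above only in its demotion branch (same hypothetical names and assumptions as before):

<syntaxhighlight lang="python">
def winnow2_update(weights, x, label, theta, alpha=2.0):
    """Like winnow1_update, except demotion divides active weights by alpha."""
    prediction = predict(weights, x, theta)
    if prediction == label:
        return weights
    if prediction == 1 and label == 0:
        # demotion step: shrink active weights instead of zeroing them
        return [w / alpha if xi == 1 else w for w, xi in zip(weights, x)]
    return [alpha * w if xi == 1 else w for w, xi in zip(weights, x)]
</syntaxhighlight>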
 
==Mistake bounds==
In certain circumstances, it can be shown that the number of mistakes Winnow makes as it learns has an [[Upper and lower bounds|upper bound]] that is independent of the number of instances with which it is presented. If the Winnow1 algorithm uses <math>\alpha > 1</math> and <math>\Theta \geq 1/\alpha</math> on a target function that is a <math>k</math>-literal monotone disjunction given by <math>f(x_1,\ldots,x_n)=x_{i_1}\cup \cdots \cup x_{i_k}</math>, then for any sequence of instances the total number of mistakes is bounded by:
: <math>\alpha k ( \log_\alpha \Theta+1)+\frac{n}{\Theta}</math>.<ref>Nick Littlestone (1989). "Mistake bounds and logarithmic linear-threshold learning algorithms". Technical report UCSC-CRL-89-11, University of California, Santa Cruz.</ref>
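
As a concrete illustration, with numbers chosen here purely for the example: take <math>n=100</math> features, <math>\Theta = n/2 = 50</math>, <math>\alpha = 2</math> and a target disjunction of <math>k=3</math> literals. The conditions <math>\alpha>1</math> and <math>\Theta \geq 1/\alpha</math> hold, and the bound evaluates to
: <math>2 \cdot 3 \,(\log_2 50 + 1) + \frac{100}{50} \approx 6 \cdot 6.64 + 2 \approx 41.9,</math>
so Winnow1 makes at most 41 mistakes on any such sequence of instances, no matter how long.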
 
==References==
===Citations and notes===
<references/>
 
[[Category:Classification algorithms]]
 
{{compsci-stub}}