Sample complexity

This is an old revision of this page, as edited by BG19bot (talk | contribs) at 07:56, 20 December 2014 (WP:CHECKWIKI error fix for #61. Punctuation goes before References. Do general fixes if a problem exists. - using AWB (10514)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In machine learning, sample complexity is the number of examples needed for the estimate of a target function to be within a given error rate.[1] The sample complexity of a machine learning algorithm characterizes its rate of consistency.

Mathematical Setup

Given a set of samples   drawn i.i.d. according to a distribution   from some input space  , a supervised learning algorithm chooses a function   from some hypothesis class  . A desirable property of the algorithm is that it chooses functions with small expected prediction error with respect to   and some loss function  . Specifically, it is desirable to have a consistent algorithm, or an algorithm that generates functions whose expected risks converge to the best possible expected risk. Formally, let

 

and let   be the functions generated by an algorithm as the number of data points   grows. The algorithm is consistent if

 

for all  , where   denotes the probability measure  .

The consistency property is nice, but it says nothing about how fast the expected risks converge. Since in practice one always deals with finite data, it is important to answer the question of how many samples are needed to achieve a risk that is close, in the   sense, to the best possible for the function class. The notion of sample complexity answers this question. The sample complexity of a learning algorithm is a function   such that for all  ,

 

In words, the sample complexity   defines the rate of consistency of the algorithm. Given a desired accuracy   and confidence  , one needs at most   samples to guarantee that the expected risk of the output function is within   of the best possible expected risk with probability at least  .[2]

No Free Lunch Theorem (Machine Learning)

Optimistically one could hope for a stronger notion of sample complexity that is independent of the distribution   on the input and output spaces. However, it has been shown that without restrictions on the hypothesis class  , there always exists "bad" distributions for which the sample complexity is arbitrarily large.[3] Thus in order to make statements about the rate of convergence of the quantity

 

one must either

  • Constrain the set of probability distributions  , e.g. via a parametric approach, or
  • Constrain the set   to be small, as in distribution free approaches.

The latter approach leads to concepts such as VC dimension and Rademacher complexity which control the complexity of the space  . A smaller hypothesis space introduces more bias into the inference process, meaning that   may be larger than the best possible expected risk in a larger space. However, by restricting the complexity of the hypothesis space it becomes possible for an algorithm to produce functions converging in expected risk to  . This trade-off leads to the concept of regularization.[2]

Other Settings

In addition to the supervised learning setting, sample complexity is relevant to semi-supervised learning problems including active learning,[1] where the algorithm can ask for labels to specifically chosen inputs in order to reduce the cost of obtaining many labels. The concept of sample complexity also shows up in reinforcement learning,[4] online learning, and unsupervised algorithms, e.g. for dictionary learning.[5]

References

  1. ^ a b Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1007/s10994-010-5174-y, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1007/s10994-010-5174-y instead.
  2. ^ a b Rosasco, Lorenzo (2014), Consistency, Learnability, and Regularization, Lecture Notes for MIT Course 9.520.
  3. ^ Vapnik, Vladimir (1998), Statistical Learning Theory, New York: Wiley.
  4. ^ Kakade, Sham (2003), On the Sample Complexity of Reinforcement Learning (PDF), PhD Thesis, University College London: Gatsby Computational Neuroscience Unit.
  5. ^ Vainsencher, Daniel; Mannor, Shie; Bruckstein, Alfred (2011). "The Sample Complexity of Dictionary Learning" (PDF). Journal of Machine Learning Research. 12: 3259–3281.