Sample complexity

Let <math>X</math> be a space which we call the input space, and <math>Y</math> be a space which we call the output space, and let <math>Z</math> denote the product <math>X\times Y</math>. For example, in the setting of binary classification, <math>X</math> is typically a finite-dimensional vector space and <math>Y</math> is the set <math>\{-1,1\}</math>.
 
Fix a hypothesis space <math>\mathcal H</math> of functions <math>h\colon X\to Y</math>. A learning algorithm over <math>\mathcal H</math> is a computable map from <math>Z^*</math> to <math>\mathcal H</math>. In other words, it is an algorithm that takes as input a finite sequence of training samples and outputs a function from <math>X</math> to <math>Y</math>. Typical learning algorithms include [[empirical risk minimization]], with or without [[Tikhonov regularization]].
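
For instance, given a finite sequence of training samples <math>S_n = ((x_1, y_1), \ldots, (x_n, y_n)) \in Z^n</math> and a loss function <math>\mathcal{L}</math> as fixed below, empirical risk minimization returns a hypothesis minimizing the average loss over the sample,
<math display="block">
\mathcal{A}(S_n) \in \operatorname{arg\,min}_{h \in \mathcal H} \frac{1}{n} \sum_{i=1}^n \mathcal{L}(h(x_i), y_i),
</math>
possibly with a regularization term added to the minimized objective.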
 
Fix a loss function <math>\mathcal{L}\colon Y\times Y\to\R_{\geq 0}</math>, for example, the square loss <math>\mathcal{L}(y, y') = (y - y')^2</math>, where <math>h(x) = y'</math>. For a given distribution <math>\rho</math> on <math>X\times Y</math>, the '''expected risk''' of a hypothesis (a function) <math>h\in\mathcal H</math> is
<math display="block">
\mathcal E(h) := \mathbb E_\rho[\mathcal{L}(h(x), y)] = \int_{X\times Y} \mathcal{L}(h(x), y) \, d\rho(x, y).
</math>
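
For instance, in the binary classification setting with <math>Y = \{-1, 1\}</math> and the 0–1 loss <math>\mathcal{L}(y, y') = \mathbf{1}_{y \neq y'}</math>, the expected risk is simply the probability that <math>h</math> misclassifies a point drawn from <math>\rho</math>:
<math display="block">
\mathcal E(h) = \Pr_{(x, y) \sim \rho}\bigl(h(x) \neq y\bigr).
</math>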

In our setting, <math>h_n = \mathcal{A}(S_n)</math> denotes the hypothesis returned by the algorithm on a training sample <math>S_n \sim \rho^n</math> of <math>n</math> points drawn independently from <math>\rho</math>, and <math>\mathcal E^*_\mathcal{H} = \inf_{h\in\mathcal H}\mathcal E(h)</math> denotes the optimal expected risk over <math>\mathcal H</math>. The algorithm <math>\mathcal{A}</math> is called '''consistent''' if <math>\mathcal E(h_n)</math> converges in probability to <math>\mathcal E^*_\mathcal{H}</math>, that is, if for all <math>\epsilon, \delta > 0</math> there exists a positive integer <math>N</math> such that, for all <math>n \geq N</math>,
<math display="block">
\Pr_{\rho^n}[\mathcal E(h_n) - \mathcal E^*_\mathcal{H}\geq\varepsilon]<\delta.
</math>
The '''sample complexity''' of <math>\mathcal{A}</math> is then the minimum <math>N</math> for which this holds, as a function of <math>\rho, \epsilon</math>, and <math>\delta</math>. We write the sample complexity as <math>N(\rho, \epsilon, \delta)</math> to emphasize that this value of <math>N</math> depends on <math>\rho, \epsilon</math>, and <math>\delta</math>. If <math>\mathcal{A}</math> is '''not consistent''', then we set <math>N(\rho,\epsilon,\delta)=\infty</math>. If there exists an algorithm for which <math>N(\rho,\epsilon,\delta)</math> is finite, then we say that the hypothesis space <math> \mathcal H</math> is '''learnable'''.
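
For example, if <math>\mathcal H</math> is finite and the loss function takes values in <math>[0, 1]</math>, then a standard argument combining [[Hoeffding's inequality]] with a union bound shows that empirical risk minimization is consistent for every distribution <math>\rho</math>, with sample complexity of order
<math display="block">
N(\rho, \epsilon, \delta) = O\!\left(\frac{\ln|\mathcal H| + \ln(1/\delta)}{\epsilon^2}\right),
</math>
so every finite hypothesis space is learnable in this sense.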
 
In other words, the sample complexity <math>N(\rho,\epsilon,\delta)</math> defines the rate of consistency of the algorithm: given a desired accuracy <math>\epsilon</math> and confidence <math>\delta</math>, one needs to sample <math>N(\rho,\epsilon,\delta)</math> data points to guarantee that the risk of the output function is within <math>\epsilon</math> of the best possible, with probability at least <math>1 - \delta</math>.<ref name="Rosasco">{{citation |last = Rosasco | first = Lorenzo | title = Consistency, Learnability, and Regularization | series = Lecture Notes for MIT Course 9.520. | year = 2014 }}</ref>