In our setting, we have <math>h=\mathcal{A}(S_n)</math>, where <math>\mathcal{A}</math> is a learning algorithm and <math>S_n = ((x_1,y_1),\ldots,(x_n,y_n))\sim \rho^n</math> is a sequence of <math>n</math> examples drawn i.i.d. from <math>\rho</math>. Define the optimal risk<math display="block">
\mathcal E^*_\mathcal{H} = \underset{h \in \mathcal H}{\inf}\mathcal E(h).
</math>Set <math>h_n=\mathcal{A}(S_n)</math> for each [[sample size]] <math>n</math>. Then <math>h_n</math> is a random variable, since it depends on the random sample <math>S_n</math>. The algorithm <math>\mathcal{A}</math> is called '''consistent''' if <math>\mathcal E(h_n)</math> converges in probability to <math>\mathcal E^*_\mathcal{H}</math>; that is, for all <math>\epsilon,\delta>0</math> there exists a positive integer <math>N</math> such that, for all sample sizes <math>n \geq N</math>,
<math display="block">
\Pr_{\rho^n}\left[\mathcal E(h_n)-\mathcal E^*_\mathcal{H} \geq \epsilon\right] < \delta.
</math>
The '''sample complexity''' of <math>\mathcal{A}</math> is then the minimum <math>N</math> for which this holds, as a function of <math>\rho</math>, <math>\epsilon</math>, and <math>\delta</math>; we write it <math>N(\rho,\epsilon,\delta)</math>.
In other words, the sample complexity <math>N(\rho,\epsilon,\delta)</math> defines the rate of consistency of the algorithm: given a desired accuracy <math>\epsilon</math> and confidence <math>\delta</math>, one needs to sample <math>N(\rho,\epsilon,\delta)</math> data points to guarantee that the risk of the output function is within <math>\epsilon</math> of the best possible, with probability at least <math>1 - \delta</math>.<ref name="Rosasco">{{citation |last = Rosasco |first = Lorenzo |title = Consistency, Learnability, and Regularization |series = Lecture Notes for MIT Course 9.520 |year = 2014}}</ref>
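This behaviour can be checked empirically. The sketch below (Python with NumPy; the distribution <math>\rho</math>, the threshold hypothesis class, and the ERM learner are illustrative assumptions, not taken from the text) estimates <math>\Pr[\mathcal E(h_n)-\mathcal E^*_\mathcal{H} \geq \epsilon]</math> by Monte Carlo for growing <math>n</math>; the smallest <math>n</math> that drives this probability below <math>\delta</math> approximates <math>N(\rho,\epsilon,\delta)</math> for this particular <math>\rho</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
NOISE = 0.1                          # label-flip probability under rho (assumed)
GRID = np.linspace(0.0, 1.0, 201)    # finite hypothesis class: thresholds t

def sample(n):
    """Draw S_n = ((x_1, y_1), ..., (x_n, y_n)) i.i.d. from rho."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.where(x > 0.5, 1, -1)                 # noiseless label
    flip = rng.uniform(0.0, 1.0, n) < NOISE
    return x, np.where(flip, -y, y)

def erm(x, y):
    """Learning algorithm A: empirical risk minimization over GRID."""
    errors = [np.mean(np.where(x > t, 1, -1) != y) for t in GRID]
    return GRID[int(np.argmin(errors))]

def risk(t):
    """Exact risk E(h_t) under rho; the optimal risk E* equals NOISE."""
    return NOISE + (1.0 - 2.0 * NOISE) * abs(t - 0.5)

eps, trials = 0.05, 200
for n in (10, 100, 1000, 10000):
    excess = np.array([risk(erm(*sample(n))) - NOISE for _ in range(trials)])
    print(n, np.mean(excess >= eps))  # estimate of Pr[E(h_n) - E* >= eps]
</syntaxhighlight>

The printed probabilities should decrease toward zero as <math>n</math> grows, reflecting the consistency of the ERM learner for this particular <math>\rho</math>.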
In [[Probably approximately correct learning|probably approximately correct (PAC) learning]], one is concerned with whether the sample complexity is ''polynomial'', that is, whether <math>N(\rho,\epsilon,\delta)</math> is bounded by a polynomial in <math>1/\epsilon</math> and <math>1/\delta</math>. If <math>N(\rho,\epsilon,\delta)</math> is polynomial for some learning algorithm, then one says that the hypothesis space <math> \mathcal H </math> is '''PAC-learnable'''.
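For illustration (a standard bound, stated here for context rather than derived in this article): if <math>\mathcal H</math> is finite and the loss takes values in <math>[0,1]</math>, Hoeffding's inequality combined with a union bound shows that empirical risk minimization achieves, up to constants,
<math display="block">
N(\rho,\epsilon,\delta) \leq \frac{2}{\epsilon^2}\ln\frac{2|\mathcal H|}{\delta}
</math>
for every distribution <math>\rho</math>. This is polynomial in <math>1/\epsilon</math> and logarithmic (hence polynomial) in <math>1/\delta</math>, so every finite hypothesis space is PAC-learnable.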
==Unrestricted hypothesis space: infinite sample complexity==
=== An example of a PAC-learnable hypothesis space ===
Let <math>X = \R^d, Y = \{-1, 1\}</math>, and let <math>\mathcal H</math> be the space of affine functions on <math>X</math>, that is, functions of the form <math>x\mapsto \langle w,x\rangle+b</math> for some <math>w\in\R^d,b\in\R</math>. This is the problem of linear classification with offset; a minimal sketch follows.
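The sketch below (Python with NumPy) illustrates this hypothesis space; the perceptron update is just one possible choice of learning algorithm <math>\mathcal{A}</math>, used here purely for illustration.

<syntaxhighlight lang="python">
import numpy as np

def predict(w, b, x):
    """Affine hypothesis h(x) = sign(<w, x> + b) from R^d to {-1, 1}."""
    return 1 if np.dot(w, x) + b > 0 else -1

def perceptron(samples, epochs=100):
    """One possible learner A: perceptron with offset (illustrative choice)."""
    d = len(samples[0][0])
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for x, y in samples:
            if predict(w, b, x) != y:   # mistake-driven update of (w, b)
                w, b = w + y * x, b + y
    return w, b

# Toy sample S_n in R^2, labeled by whether x_1 + x_2 exceeds 1.
rng = np.random.default_rng(0)
S = [(x, 1 if x.sum() > 1 else -1) for x in rng.uniform(0, 1, (50, 2))]
w, b = perceptron(S)
print(w, b, predict(w, b, np.array([0.9, 0.8])))
</syntaxhighlight>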
=== Sample-complexity bounds ===