Sample complexity
In our setting, we have <math>h=\mathcal{A}(S_n)</math>, where <math>\mathcal{A}</math> is a learning algorithm and <math>S_n = ((x_1,y_1),\ldots,(x_n,y_n))\sim \rho^n</math> is a sequence of <math>n</math> samples, each drawn independently from <math>\rho</math>. Define the optimal risk<math display="block">
\mathcal E^*_\mathcal{H} = \underset{h \in \mathcal H}{\inf}\mathcal E(h).
</math>Set <math>h_n=\mathcal{A}(S_n)</math> for each [[sample size]] <math>n</math>. Note that <math>h_n</math> is a [[random variable]], since it depends on the random sample <math>S_n</math>, which is drawn from the distribution <math>\rho^n</math>. The algorithm <math>\mathcal{A}</math> is called '''consistent''' if <math> \mathcal E(h_n) </math> converges in probability to <math> \mathcal E_\mathcal H^*</math>. That is, for all <math>\epsilon, \delta > 0</math>, there exists a positive integer <math>N</math> such that, for all sample sizes <math>n \geq N</math>, we have
 
<math display="block">
\Pr_{\rho^n}\bigl[\mathcal E(h_n) - \mathcal E^*_\mathcal{H} \geq \epsilon\bigr] < \delta.
</math>The '''sample complexity''' of <math>\mathcal{A}</math> is then the minimum <math>N = N(\rho,\epsilon,\delta)</math> for which this holds, as a function of <math>\rho</math>, <math>\epsilon</math>, and <math>\delta</math>.
In other words, the sample complexity <math>N(\rho,\epsilon,\delta)</math> defines the rate of consistency of the algorithm: given a desired accuracy <math>\epsilon</math> and confidence <math>\delta</math>, one needs to sample <math>N(\rho,\epsilon,\delta)</math> data points to guarantee that the risk of the output function is within <math>\epsilon</math> of the best possible, with probability at least <math>1 - \delta</math>.<ref name="Rosasco">{{citation |last = Rosasco | first = Lorenzo | title = Consistency, Learnability, and Regularization | series = Lecture Notes for MIT Course 9.520 | year = 2014 }}</ref>
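As an illustration only (the distribution, the noise rate <code>ETA</code>, the threshold <code>T_STAR</code>, and the helper functions below are assumptions chosen for the example, not part of the definition), the following sketch estimates by Monte Carlo simulation how the failure probability <math>\Pr[\mathcal E(h_n) - \mathcal E^*_\mathcal{H} > \epsilon]</math> decays with <math>n</math> when <math>h_n</math> is produced by empirical risk minimization over one-dimensional threshold classifiers. The smallest <math>n</math> at which the printed estimate falls below <math>\delta</math> approximates <math>N(\rho,\epsilon,\delta)</math> for this particular <math>\rho</math> and algorithm.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
ETA, T_STAR = 0.1, 0.3   # assumed label-noise rate and true threshold for this example

def sample(n):
    """Draw n i.i.d. pairs (x, y) from rho: x uniform on [0, 1),
    label = sign(x - T_STAR), flipped with probability ETA."""
    x = rng.random(n)
    y = np.where(x >= T_STAR, 1, -1)
    flip = rng.random(n) < ETA
    return x, np.where(flip, -y, y)

def erm_threshold(x, y):
    """Empirical risk minimization over threshold classifiers h_t(x) = sign(x - t).
    Candidate thresholds at the data points (plus the endpoints) suffice, because
    the empirical risk is piecewise constant in t."""
    ts = np.concatenate(([0.0], np.sort(x), [1.0]))
    errors = [np.mean(np.where(x >= t, 1, -1) != y) for t in ts]
    return ts[int(np.argmin(errors))]

def excess_risk(t):
    # For this rho, E(h_t) = ETA + (1 - 2*ETA)*|t - T_STAR|, minimized at t = T_STAR.
    return (1 - 2 * ETA) * abs(t - T_STAR)

eps, delta, trials = 0.05, 0.05, 200
for n in [25, 50, 100, 200, 400, 800]:
    fail = np.mean([excess_risk(erm_threshold(*sample(n))) > eps for _ in range(trials)])
    print(f"n = {n:4d}:  estimated P[E(h_n) - E* > eps] = {fail:.3f}  (target <= delta = {delta})")
</syntaxhighlight>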
 
In [[Probably approximately correct learning|probably approximately correct (PAC) learning]], one is concerned with whether the sample complexity is ''polynomial'', that is, whether <math>N(\rho,\epsilon,\delta)</math> is bounded by a polynomial in <math>1/\epsilon</math> and <math>1/\delta</math>. If <math>N(\rho,\epsilon,\delta)</math> is polynomial for some learning algorithm, then one says that the hypothesis space <math> \mathcal H </math> is '''PAC-learnable'''. This is a stronger notion than being learnable.
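For instance, in the special case of a finite hypothesis space <math>\mathcal H</math> and a loss bounded in <math>[0,1]</math> (a standard illustration, not specific to any particular algorithm discussed here), Hoeffding's inequality together with a union bound shows that empirical risk minimization achieves the guarantee above whenever<math display="block">
n \geq \frac{2}{\epsilon^2}\ln\frac{2|\mathcal H|}{\delta},
</math>so <math>N(\rho,\epsilon,\delta)</math> is bounded by a polynomial in <math>1/\epsilon</math> and <math>\log(1/\delta)</math>, and hence in <math>1/\epsilon</math> and <math>1/\delta</math>.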
 
==Unrestricted hypothesis space: infinite sample complexity==
 
=== An example of a PAC-learnable hypothesis space ===
Let <math>X = \R^d</math>, <math>Y = \{-1, 1\}</math>, and let <math>\mathcal H</math> be the space of affine functions on <math>X</math>, that is, functions of the form <math>x\mapsto \langle w,x\rangle+b</math> for some <math>w\in\R^d,b\in\R</math>. This is the problem of linear classification with offset. For example, four coplanar points forming a square cannot be shattered by any affine function, since no affine function can be positive on two diagonally opposite vertices and negative on the remaining two. The VC dimension of <math>\mathcal H</math> is <math>d + 1</math>, which in particular is finite. It follows by the above characterization of PAC-learnable classes that <math>\mathcal H</math> is PAC-learnable, and by extension, learnable.
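The shattering claim about the square can be checked directly with a small script (an illustrative sketch; the function name <code>affinely_separable</code> and the margin-one linear program are choices made for the example, using SciPy's <code>linprog</code>). It enumerates all <math>2^4</math> labelings of the four vertices of the unit square in <math>\R^2</math> and tests whether each labeling is realized by the sign of some affine function, finding that only the two "diagonal" labelings are not.

<syntaxhighlight lang="python">
import itertools
import numpy as np
from scipy.optimize import linprog

# The four vertices of the unit square in R^2.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

def affinely_separable(X, y):
    """Return True if some affine function x -> <w, x> + b satisfies
    y_i * (<w, x_i> + b) >= 1 for every point, checked as a linear-programming
    feasibility problem (strict separation, rescaled to margin 1)."""
    n, d = X.shape
    # Variables are (w_1, ..., w_d, b); constraints -y_i * (<w, x_i> + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return res.success

# Enumerate all 2^4 labelings and collect those no affine function realizes.
not_realizable = [y for y in itertools.product([-1, 1], repeat=4)
                  if not affinely_separable(points, np.array(y, dtype=float))]
print(not_realizable)   # the two "diagonal" labelings: (-1, 1, -1, 1) and (1, -1, 1, -1)
</syntaxhighlight>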
 
=== Sample-complexity bounds ===