Given a sample <math>S_n = \{(x_1,y_1),\ldots,(x_n,y_n)\}</math> drawn i.i.d. from a distribution <math>\rho</math> on some input space <math>\mathcal X \times \mathcal Y</math>, a supervised learning algorithm chooses a function <math>f:\mathcal X \to \mathcal Y</math> from some hypothesis class <math>\mathcal H</math>. A desirable property of the algorithm is that it chooses functions with small expected prediction error with respect to <math>\rho</math> and some loss function <math>V:\mathcal Y \times \mathcal Y \to \mathbb R_+</math>. Specifically, it is desirable to have a consistent algorithm, that is, an algorithm that generates functions whose expected loss or [[empirical risk minimization|risk]]