Prior knowledge for pattern recognition: Difference between revisions

Revision as of 14:08, 28 April 2007 (SmackBot) compared with the latest revision as of 08:20, 17 May 2025 (Flod logic); 35 intermediate revisions by 24 users not shown.
[[Pattern recognition]] is a very active field of research intimately bound to [[machine learning]]. Also known as classification or [[statistical classification]], pattern recognition aims at building a [[classifier (mathematics)|classifier]] that can determine the class of an input pattern. This procedure, known as training, corresponds to learning an unknown decision function based only on a set of input-output pairs <math>(\boldsymbol{x}_i,y_i)</math> that form the training data (or training set). Nonetheless, in real world applications such as [[character recognition]], a certain amount of information on the problem is usually known beforehand. The incorporation of this prior knowledge into the training is the key element that will allow an increase of performance in many applications.

== Prior knowledge ==
Prior knowledge<ref>B. Scholkopf and A. Smola, "[https://books.google.com/books?id=y8ORL3DWt4sC&q=%22prior+knowledge%22 Learning with Kernels]", MIT Press 2002.</ref> refers to all information about the problem available in addition to the training data.
However, in this most general form, determining a [[Model (abstract)|model]] from a finite set of samples without prior knowledge is an [[ill-posed]] problem, in the sense that a unique model may not exist. Many classifiers incorporate the general smoothness assumption that a test pattern similar to one of the training samples tends to be assigned to the same class.

The importance of prior knowledge in machine learning is suggested by its role in search and optimization. Loosely, the [[No free lunch in search and optimization|no free lunch theorem]] states that all search algorithms have the same average performance over all problems, and thus implies that to gain in performance on a certain application one must use a specialized algorithm that includes some prior knowledge about the problem. <!-- This sentence is still not right. Read the "no free lunch" article to see why. David Wolpert actually published NFL-like results for machine learning before moving to optimization with Bill Macready. Check his web site at NASA for a list of his publications.-->

The different types of prior knowledge encountered in pattern recognition are grouped here under two main categories: class-invariance and knowledge of the data.

== Class-invariance ==
A very common type of prior knowledge in pattern recognition is the invariance of the class (or the output of the classifier) to a [[Transformation (geometry)|transformation]] of the input pattern. This type of knowledge is referred to as '''transformation-invariance'''. The transformations most commonly used in image recognition are:
* [[Translation (geometry)|translation]];
* [[Rotation (mathematics)|rotation]];
* [[Shear mapping|skewing]];
* [[Scaling (geometry)|scaling]].
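As a rough illustration of transformation-invariance, the following sketch checks whether a classifier gives the same output for an image and for translated copies of it. Everything in it is an assumption made for the example and not part of any cited method: the cyclic horizontal shift stands in for translation, and <code>translate</code>, <code>invariant_classifier</code>, <code>position_classifier</code> and <code>is_invariant</code> are hypothetical toy functions.

```python
def translate(image, shift):
    """Cyclic horizontal translation of a 2-D image (list of rows) by `shift` pixels.

    Assumption for this sketch: pixels pushed past the right edge wrap around.
    """
    width = len(image[0])
    return [[row[(j - shift) % width] for j in range(width)] for row in image]


def invariant_classifier(image):
    """Toy classifier: thresholds the total intensity, which a cyclic shift preserves."""
    total = sum(sum(row) for row in image)
    return int(total > 4.5)


def position_classifier(image):
    """Toy classifier: compares left and right halves, so its output depends on position."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[half:]) for row in image)
    return int(left > right)


def is_invariant(classifier, image, shifts):
    """Check that classifier(image) equals classifier(translated image) for every shift."""
    reference = classifier(image)
    return all(classifier(translate(image, s)) == reference for s in shifts)


# A 4x8 image with a bright 2x3 blob on the left-hand side.
image = [[0.0] * 8 for _ in range(4)]
for i in (1, 2):
    for j in (0, 1, 2):
        image[i][j] = 1.0

print(is_invariant(invariant_classifier, image, range(8)))  # True
print(is_invariant(position_classifier, image, range(8)))   # False
```

The first classifier depends only on a quantity that the shift leaves unchanged, so its output is the same for every translated copy. The second changes its answer once the blob crosses the middle of the image, which is the behaviour that prior knowledge of translation-invariance is meant to rule out.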
Incorporating the invariance to a transformation <math>T_{\theta}: \boldsymbol{x} \mapsto T_{\theta}\boldsymbol{x}</math> parametrized in <math>\theta</math> into a classifier of output <math>f(\boldsymbol{x})</math> for an input pattern <math>\boldsymbol{x}</math> corresponds to enforcing the equality

:<math> f(\boldsymbol{x}) = f(T_{\theta}\boldsymbol{x}), \quad \forall \boldsymbol{x}, \theta .</math>

Local invariance can also be considered for a transformation centered at <math>\theta=0</math>, so that <math>T_0\boldsymbol{x} = \boldsymbol{x}</math>, by using the constraint

:<math> \left.\frac{\partial}{\partial \theta}\right|_{\theta=0} f(T_{\theta} \boldsymbol{x}) = 0 . </math>

The function <math>f</math> in these equations can be either the decision function of the classifier or its real-valued output.

Another approach is to consider class-invariance with respect to a "domain of the input space" instead of a transformation. In this case, the problem becomes finding <math>f</math> so that

:<math> f(\boldsymbol{x}) = y_{\mathcal{P}},\ \forall \boldsymbol{x}\in \mathcal{P} , </math>

where <math>y_{\mathcal{P}}</math> is the membership class of the region <math>\mathcal{P}</math> of the input space.

A different type of class-invariance found in pattern recognition is '''permutation-invariance''', i.e. invariance of the class to a permutation of elements in a structured input. A typical application of this type of prior knowledge is a classifier invariant to permutations of the rows of matrix inputs.

== Knowledge of the data ==
Forms of prior knowledge other than class-invariance concern the data more specifically and are thus of particular interest for real-world applications.
The three particular cases that most often occur when gathering data are:
* '''Unlabeled samples''' are available with supposed class-memberships;
* '''Imbalance''' of the training set due to a high proportion of samples of a class;
* '''Quality of the data''' may vary from one sample to another.

Prior knowledge of these can enhance the quality of the recognition if included in the learning. Moreover, not taking into account the poor quality of some data or a large imbalance between the classes can mislead the decision of a classifier.

== Notes ==
<references/>

== References ==
* E. Krupka and N. Tishby, "[https://proceedings.mlr.press/v2/krupka07a.html Incorporating Prior Knowledge on Features into Learning]", Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 07)
* B. Scholkopf and A. Smola, "Learning with Kernels", MIT Press 2002.

[[Category:Machine learning]]
[[Category:Statistical classification]]