Indicator function: Difference between revisions

Revision as of 09:45, 4 April 2022 by D.Lazard (→top: clearer). Latest revision as of 13:47, 8 May 2025 by TTWIDEE (m: Fixed punctuation).
(30 intermediate revisions by 20 users not shown)
{{Short description|Mathematical function characterizing set membership}}
{{About|the 0–1 indicator function|the 0–infinity indicator function|characteristic function (convex analysis)}}
{{More footnotes|date=December 2009}}
{{Use American English|date = March 2019}}
[[Image:Indicator function illustration.png|right|thumb|A three-dimensional plot of an indicator function, shown over a square two-dimensional domain (set {{mvar|X}}): the "raised" portion overlays those two-dimensional points which are members of the "indicated" subset ({{mvar|A}}).]]
In [[mathematics]], an '''indicator function''' or a '''characteristic function''' of a [[subset]] of a [[Set (mathematics)|set]] is a [[Function (mathematics)|function]] that maps elements of the subset to one, and all other elements to zero. That is, if {{mvar|A}} is a subset of some set {{mvar|X}}, then the indicator function of {{mvar|A}} is the function <math>\mathbf{1}_A</math> defined by <math>\mathbf{1}_{A}\!(x) = 1</math> if <math>x \in A,</math> and <math>\mathbf{1}_{A}\!(x) = 0</math> otherwise. Other common notations are {{math|𝟙{{sub|''A''}}}} and <math>\chi_A.</math>{{efn|name=χαρακτήρ}}

The indicator function of {{mvar|A}} is the [[Iverson bracket]] of the property of belonging to {{mvar|A}}; that is,
<math display="block">\mathbf{1}_{A}(x) = \left[\ x\in A\ \right].</math>

For example, the [[Dirichlet function]] is the indicator function of the [[rational number]]s as a subset of the [[real number]]s.
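As an informal illustration of the definition above, the following is a minimal Python sketch, not a canonical implementation. The helper name <code>indicator</code> and the example sets <code>X</code> and <code>A</code> are assumptions chosen for the example; they do not come from any library.

```python
from typing import Callable, Hashable, Iterable


def indicator(subset: Iterable[Hashable]) -> Callable[[Hashable], int]:
    """Return the indicator function of `subset` (hypothetical helper)."""
    members = frozenset(subset)

    def one_A(x: Hashable) -> int:
        # 1 if x belongs to the subset, 0 otherwise
        return 1 if x in members else 0

    return one_A


# Assumed example: X = {0, ..., 9} and A = the primes below 10.
X = range(10)
A = {2, 3, 5, 7}
one_A = indicator(A)

# Values of the indicator function over X, in increasing order of x.
values = [one_A(x) for x in X]
```

On a finite set, summing the values of the indicator function over <code>X</code> counts the elements of <code>A</code>.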
==Definition==
Given an arbitrary set {{mvar|X}}, the indicator function of a subset {{mvar|A}} of {{mvar|X}} is the function
<math display=block>\mathbf{1}_A \colon X \to \{ 0, 1 \} </math>
defined by
<math display="block" qid="Q371983">\mathbf{1}_A\!( x ) := \begin{cases} 1 ~&\text{ if }~ x \in A~, \\ 0 ~&\text{ if }~ x \notin A~. \end{cases} </math>

The [[Iverson bracket]] provides the equivalent notation <math>\left[\ x\in A\ \right]</math> or {{nobr|{{math|⟦ ''x'' ∈ ''A'' ⟧}},}} that can be used instead of <math>\mathbf{1}_{A}\!(x)\,.</math>

The function <math>\mathbf{1}_A</math> is sometimes denoted {{math|𝟙{{sub|''A''}}}}, {{mvar|I<sub>A</sub>}}, {{mvar|χ<sub>A</sub>}}{{efn|name=χαρακτήρ| The [[Greek alphabet|Greek letter]] {{mvar|χ}} appears because it is the initial letter of the Greek word {{lang|grc|{{math|χαρακτήρ}}}}, which is the ultimate origin of the word ''characteristic''. }} or even just {{mvar|A}}.{{efn| The set of all indicator functions on {{mvar|X}} can be identified with <math>\mathcal{P}(X),</math> the [[power set]] of {{mvar|X}}. Consequently, both sets are conventionally denoted, by [[abuse of notation]], as <math>2^X,</math> in analogy to the relation between the number of elements of the power set and that of the original set. This is a special case <math>\left(Y = \{0,\, 1\}\right)</math> of the notation <math>Y^X</math> for the set of all functions <math>f \colon X \to Y \,.</math> }}

==Notation and terminology==
A related concept in [[statistics]] is that of a [[dummy variable (statistics)|dummy variable]].
(This must not be confused with a "dummy variable" as that term is usually used in mathematics, also called a [[free variables and bound variables|bound variable]].)

The term "[[characteristic function (probability theory)|characteristic function]]" has an unrelated meaning in [[probability theory|classic probability theory]]. For this reason, [[List of probabilists|traditional probabilists]] use the term '''indicator function''' for the function defined here almost exclusively, while mathematicians in other fields are more likely to use the term ''characteristic function'' to describe the function that indicates membership in a set.

In [[fuzzy logic]] and [[Many-valued logic|modern many-valued logic]], predicates are the [[characteristic function (probability theory)|characteristic functions]] of a [[probability distribution]]. That is, the strict true/false valuation of the predicate is replaced by a quantity interpreted as the degree of truth.

==Basic properties==
The ''indicator'' or ''characteristic'' [[function (mathematics)|function]] of a subset {{mvar|A}} of some set {{mvar|X}} [[Map (mathematics)|maps]] elements of {{mvar|X}} to the [[codomain]] <math>\{0,\, 1\}.</math>

This mapping is [[surjective]] only when {{mvar|A}} is a non-empty [[proper subset]] of {{mvar|X}}. If <math>A = X,</math> then <math>\mathbf{1}_A \equiv 1.</math> By a similar argument, if <math>A = \emptyset</math> then <math>\mathbf{1}_A \equiv 0.</math>

In the following, the dot represents multiplication, <math>1\cdot1 = 1,</math> <math>1\cdot0 = 0,</math> etc. "+" and "−" represent addition and subtraction. "<math>\cap </math>" and "<math>\cup </math>" denote intersection and union, respectively.
If <math>A</math> and <math>B</math> are two subsets of <math>X,</math> then
<math display=block>\begin{align}
\mathbf{1}_{A\cap B}(x) ~&=~ \min\bigl\{\mathbf{1}_A(x),\ \mathbf{1}_B(x)\bigr\} ~=~ \mathbf{1}_A(x) \cdot\mathbf{1}_B(x), \\
\mathbf{1}_{A\cup B}(x) ~&=~ \max\bigl\{\mathbf{1}_A(x),\ \mathbf{1}_B(x)\bigr\} ~=~ \mathbf{1}_A(x) + \mathbf{1}_B(x) - \mathbf{1}_A(x) \cdot \mathbf{1}_B(x)\,,
\end{align}</math>
and the indicator function of the [[Complement (set theory)|complement]] of <math>A,</math> i.e. <math>A^\complement,</math> is:
<math display=block>\mathbf{1}_{A^\complement} = 1 - \mathbf{1}_A.</math>

More generally, suppose <math>A_1, \dotsc, A_n</math> is a collection of subsets of {{mvar|X}}, indexed by <math>I = \{1, 2, \dotsc, n\}.</math> For any <math>x \in X:</math>
<math display=block> \prod_{k \in I} \left(\ 1 - \mathbf{1}_{A_k}\!\left( x \right)\ \right)</math>
is a product of {{math|0}}s and {{math|1}}s. This product has the value {{math|1}} at precisely those <math>x \in X</math> that belong to none of the sets <math>A_k</math> and is {{math|0}} otherwise. That is
<math display=block> \prod_{k \in I} ( 1 - \mathbf{1}_{A_k}) = \mathbf{1}_{X - \bigcup_{k} A_k} = 1 - \mathbf{1}_{\bigcup_{k} A_k}.</math>

Expanding the product on the left hand side,
<math display=block> \mathbf{1}_{\bigcup_{k} A_k}= 1 - \sum_{F \subseteq \{1, 2, \dotsc, n\}} (-1)^{|F|} \mathbf{1}_{\bigcap_{k \in F} A_k} = \sum_{\emptyset \neq F \subseteq \{1, 2, \dotsc, n\}} (-1)^{|F|+1} \mathbf{1}_{\bigcap_{k \in F} A_k} </math>
where <math>|F|</math> is the [[cardinality]] of {{mvar|F}} and the intersection over the empty index set is taken to be {{mvar|X}}. This is one form of the principle of [[inclusion-exclusion]].

As suggested by the previous example, the indicator function is a useful notational device in [[combinatorics]].
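The identities above can be checked by brute force on a small finite set. The following Python sketch does so under stated assumptions: the sets <code>X</code>, <code>A</code>, <code>B</code>, <code>C</code> and the helper names <code>ind</code>, <code>check_pairwise</code> and <code>union_indicator_by_inclusion_exclusion</code> are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Assumed example sets: X = {0, ..., 11} with three overlapping subsets.
X = set(range(12))
A = {x for x in X if x % 2 == 0}
B = {x for x in X if x % 3 == 0}
C = {x for x in X if x > 7}


def ind(S):
    """Indicator function of the set S (hypothetical helper)."""
    return lambda x: 1 if x in S else 0


def check_pairwise(A, B, X):
    """Check the intersection, union and complement identities pointwise."""
    a, b = ind(A), ind(B)
    for x in X:
        assert ind(A & B)(x) == min(a(x), b(x)) == a(x) * b(x)
        assert ind(A | B)(x) == max(a(x), b(x)) == a(x) + b(x) - a(x) * b(x)
        assert ind(X - A)(x) == 1 - a(x)
    return True


def union_indicator_by_inclusion_exclusion(sets, x):
    """Evaluate the sum over non-empty F of (-1)^(|F|+1) * 1_{intersection over F}(x)."""
    total = 0
    n = len(sets)
    for r in range(1, n + 1):
        for F in combinations(range(n), r):
            in_all = all(x in sets[k] for k in F)
            total += (-1) ** (r + 1) * (1 if in_all else 0)
    return total


pairwise_ok = check_pairwise(A, B, X)
sets = [A, B, C]
inclusion_exclusion_ok = all(
    union_indicator_by_inclusion_exclusion(sets, x) == ind(A | B | C)(x)
    for x in X
)
```

The inclusion-exclusion sum ranges over all non-empty index subsets, so its cost grows as <math>2^n</math>; the sketch is meant as a check of the identity, not as an efficient way to compute a union.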
The notation is used in other places as well, for instance in [[probability theory]]: if {{mvar|X}} is a [[probability space]] with probability measure <math>\mathbb{P}</math> and {{mvar|A}} is a [[Measure (mathematics)|measurable set]], then <math>\mathbf{1}_A</math> becomes a [[random variable]] whose [[expected value]] is equal to the probability of {{mvar|A}}:
<math display=block>\operatorname{\mathbb{E}}_X\left\{\ \mathbf{1}_A(x)\ \right\}\ =\ \int_{X} \mathbf{1}_A( x )\ \mathrm{d}\mathbb{P}(x) = \int_{A} \mathrm{d}\mathbb{P}(x) = \mathbb{P}(A).</math>

This identity is used in a simple proof of [[Markov's inequality]].

Given a [[probability space]] <math>\textstyle (\Omega, \mathcal F, \mathbb{P})</math> with <math>A \in \mathcal F,</math> the indicator random variable <math>\mathbf{1}_A \colon \Omega \rightarrow \mathbb{R}</math> is defined by <math>\mathbf{1}_A (\omega) = 1 </math> if <math> \omega \in A,</math> otherwise <math>\mathbf{1}_A (\omega) = 0.</math>
;[[Mean]]: <math>\ \operatorname{\mathbb{E}}(\mathbf{1}_A (\omega)) = \mathbb{P}(A)\ </math> (also called "Fundamental Bridge").
;[[Variance]]: <math>\ \operatorname{Var}(\mathbf{1}_A (\omega)) = \mathbb{P}(A)(1 - \mathbb{P}(A)) .</math>
;[[Covariance]]: <math>\ \operatorname{Cov}(\mathbf{1}_A (\omega), \mathbf{1}_B (\omega)) = \mathbb{P}(A \cap B) - \mathbb{P}(A) \mathbb{P}(B) .</math>

==Characteristic function in recursion theory, Gödel's and Kleene's representing function==
[[Kurt Gödel]] described the ''representing function'' in his 1934 paper "On undecidable propositions of formal mathematical systems" (the symbol "{{math|¬}}" indicates logical inversion, i.e. "NOT"):<ref name=Martin-1965>{{cite book |pages=41–74 |editor-link=Martin Davis (mathematician) |editor-first=Martin |editor-last=Davis |year=1965 |title=The Undecidable |publisher=Raven Press Books |place=New York, NY}}</ref>{{rp|page=42}}
{{blockquote|1=There shall correspond to each class or relation {{mvar|R}} a representing function <math>\phi(x_1, \ldots x_n) = 0</math> if <math>R(x_1,\ldots x_n)</math> and <math>\phi(x_1,\ldots x_n) = 1</math> if <math>\neg R(x_1,\ldots x_n).</math>}}

[[Stephen Kleene|Kleene]] offers the same definition in the context of the [[primitive recursive function]]s: a function {{mvar|φ}} of a predicate {{mvar|P}} takes the value {{math|0}} if the predicate is true and {{math|1}} if the predicate is false.<ref name=Kleene1952>{{cite book |last=Kleene |first=Stephen |author-link=Stephen Kleene |year=1971 |orig-year=1952 |title=Introduction to Metamathematics |page=227 |publisher=Wolters-Noordhoff Publishing and North Holland Publishing Company |location=Netherlands |edition=Sixth reprint, with corrections}}</ref>

For example, because the product of characteristic functions <math>\phi_1 * \phi_2 * \cdots * \phi_n = 0</math> whenever any one of the functions equals {{math|0}}, it plays the role of logical OR: IF <math>\phi_1 = 0\ </math> OR <math>\ \phi_2 = 0</math> OR ... OR <math>\phi_n = 0</math> THEN their product is {{math|0}}. What appears to the modern reader as the representing function's logical inversion, i.e.
the representing function is {{math|0}} when the function {{mvar|R}} is "true" or "satisfied", plays a useful role in Kleene's definition of the logical functions OR, AND, and IMPLY,<ref name=Kleene1952 />{{rp|228}} the bounded-<ref name=Kleene1952 />{{rp|228}} and unbounded-<ref name=Kleene1952 />{{rp|279 ff}} [[mu operator]]s and the CASE function.<ref name=Kleene1952 />{{rp|229}}

==Characteristic function in fuzzy set theory==
In classical mathematics, characteristic functions of sets only take values {{math|1}} (members) or {{math|0}} (non-members). In ''[[fuzzy set theory]]'', characteristic functions are generalized to take values in the real unit interval {{closed-closed|0, 1}}, or more generally, in some [[universal algebra|algebra]] or [[structure (mathematical logic)|structure]] (usually required to be at least a [[partially ordered set|poset]] or [[lattice (order)|lattice]]). Such generalized characteristic functions are more usually called [[membership function (mathematics)|membership function]]s, and the corresponding "sets" are called ''fuzzy'' sets. Fuzzy sets model the gradual change in the membership [[degree of truth|degree]] seen in many real-world [[predicate (mathematics)|predicate]]s like "tall", "warm", etc.

==Smoothness==
{{See also|Laplacian of the indicator}}
In general, the indicator function of a set is not smooth; it is continuous if and only if its [[support (math)|support]] is a [[connected component (topology)|connected component]].
In the [[algebraic geometry]] of [[finite fields]], however, every [[affine variety]] admits a ([[Zariski topology|Zariski]]) continuous indicator function.<ref>{{Cite book|title=Course in Arithmetic|last=Serre|pages=5}}</ref> Given a [[finite set]] of functions <math>f_\alpha \in \mathbb{F}_q\left[\ x_1, \ldots, x_n\right]</math> let <math>V = \bigl\{\ x \in \mathbb{F}_q^n : f_\alpha(x) = 0 \text{ for all } \alpha\ \bigr\}</math> be their vanishing locus. Then, the function <math display="inline">\mathbb{P}(x) = \prod_\alpha\left(\ 1 - f_\alpha(x)^{q-1}\right)</math> acts as an indicator function for <math>V.</math> If <math>x \in V</math> then <math>\mathbb{P}(x) = 1;</math> otherwise, for some <math>f_\alpha,</math> we have <math>f_\alpha(x) \neq 0,</math> which implies that <math>f_\alpha(x)^{q-1} = 1,</math> hence <math>\mathbb{P}(x) = 0.</math>

Although indicator functions are not smooth, they admit [[weak derivative]]s. For example, consider the [[Heaviside step function]]
<math display="block">H(x) \equiv \operatorname\mathbb{I}\!\bigl(x > 0\bigr)</math>
The [[distributional derivative]] of the Heaviside step function is equal to the [[Dirac delta function]], i.e.
<math display=block>\frac{\mathrm{d}H(x)}{\mathrm{d}x}= \delta(x)</math>
and similarly the distributional derivative of
<math display="block">G(x) := \operatorname\mathbb{I}\!\bigl(x < 0\bigr)</math>
is
<math display=block>\frac{\mathrm{d}G(x)}{\mathrm{d}x} = -\delta(x).</math>

Thus the derivative of the Heaviside step function can be seen as the ''inward normal derivative'' at the ''boundary'' of the domain given by the positive half-line. In higher dimensions, the derivative naturally generalises to the inward normal derivative, while the Heaviside step function naturally generalises to the indicator function of some domain {{mvar|D}}. The surface of {{mvar|D}} will be denoted by {{mvar|S}}.
Proceeding, it can be derived that the inward [[normal derivative]] of the indicator gives rise to a ''[[surface delta function]]'', which can be indicated by <math>\delta_S(\mathbf{x})</math>:
<math display=block>\delta_S(\mathbf{x}) = -\mathbf{n}_x \cdot \nabla_x \operatorname\mathbb{I}\!\bigl(\ \mathbf{x}\in D\ \bigr)\ </math>
where {{mvar|n}} is the outward [[Normal (geometry)|normal]] of the surface {{mvar|S}}. This 'surface delta function' has the following property:<ref>{{cite journal |last=Lange |first=Rutger-Jan |year=2012 |title=Potential theory, path integrals and the Laplacian of the indicator |journal=Journal of High Energy Physics |volume=2012 |issue=11 |pages=29–30 |arxiv=1302.0864 |bibcode=2012JHEP...11..032L |doi=10.1007/JHEP11(2012)032|s2cid=56188533 }}</ref>
<math display=block>-\int_{\R^n}f(\mathbf{x})\,\mathbf{n}_x\cdot\nabla_x \operatorname\mathbb{I}\!\bigl(\ \mathbf{x}\in D\ \bigr) \; \operatorname{d}^{n}\mathbf{x} = \oint_{S}\,f(\mathbf{\beta}) \; \operatorname{d}^{n-1}\mathbf{\beta}.</math>

By setting the function {{mvar|f}} equal to one, it follows that the [[Laplacian of the indicator#Dirac surface delta function|inward normal derivative of the indicator]] integrates to the numerical value of the [[surface area]] {{mvar|S}}.
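Returning to the finite-field construction earlier in this section, the claim that the product of the factors <math>1 - f_\alpha(x)^{q-1}</math> is an indicator function for the vanishing locus can be checked by exhaustive search when {{mvar|q}} is small. The Python sketch below assumes that {{mvar|q}} is prime, so that integers modulo {{mvar|q}} form the field; the two polynomials and the names <code>polys</code>, <code>P</code> and <code>V</code> are hypothetical examples, not taken from the cited source.

```python
from itertools import product

q = 5  # assumption: q is prime, so the integers mod q form the field F_q

# Hypothetical example polynomials in two variables over F_5.
polys = [
    lambda x, y: (x + y) % q,      # f_1(x, y) = x + y
    lambda x, y: (x * y - 1) % q,  # f_2(x, y) = x*y - 1
]


def P(point):
    """Product over all f of (1 - f(point)^(q-1)), computed in F_q."""
    result = 1
    for f in polys:
        # pow(a, q - 1, q) is 0 when a == 0 and 1 otherwise (Fermat's little theorem).
        result = result * (1 - pow(f(*point), q - 1, q)) % q
    return result


points = list(product(range(q), repeat=2))

# Vanishing locus computed directly from its definition.
V = {pt for pt in points if all(f(*pt) == 0 for f in polys)}

# P should equal 1 exactly on V and 0 elsewhere.
indicator_matches = all(P(pt) == (1 if pt in V else 0) for pt in points)
```

For a prime power {{mvar|q}} that is not prime, arithmetic modulo {{mvar|q}} is not field arithmetic, so this sketch would need a proper implementation of the field.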
==See also==
{{Div col|colwidth=15em}}
* [[Dirac measure]]
* [[Laplacian of the indicator]]
* [[Free variables and bound variables]]
* [[Heaviside step function]]
* [[Identity function]]
* [[Iverson bracket]]
* [[Kronecker delta]], a function that can be viewed as an indicator for the [[Equality (mathematics)|identity relation]]
* [[Statistical classification]]
* [[Zero-one loss function]]
* [[Subobject classifier]], a related concept from [[Topos theory|topos theory]].
{{div col end}}

==Notes==
{{notelist|1}}

==References==

==Sources==
{{refbegin|25em}}
* {{cite book |last=Folland |first=G.B. |title=Real Analysis: Modern Techniques and Their Applications |publisher=John Wiley & Sons, Inc. |year=1999 |isbn=978-0-471-31716-6 |edition=Second}}
* {{cite book |last1=Cormen |first1=Thomas H. |title=Introduction to Algorithms |title-link=Introduction to Algorithms |last2=Leiserson |first2=Charles E. |last3=Rivest |first3=Ronald L. |last4=Stein |first4=Clifford |publisher=MIT Press and McGraw-Hill |year=2001 |isbn=978-0-262-03293-3 |edition=Second |pages=[https://archive.org/details/introductiontoal00corm_691/page/n116 94]–99 |chapter=Section 5.2: Indicator random variables |author-link=Thomas H. Cormen |author-link2=Charles E. Leiserson |author-link3=Ronald L. Rivest |author-link4=Clifford Stein}}