{{Short description|Set-to-real map with diminishing returns property}}
{{Use American English|date = January 2019}}
In mathematics, a '''submodular set function''' (also known as a '''submodular function''') is a [[set function]] that, informally, describes the relationship between a set of inputs and an output, where adding more of one input has a decreasing additional benefit ([[diminishing returns]]). This natural diminishing-returns property makes submodular functions suitable for many applications, including [[approximation algorithms]], [[game theory]] (as functions modeling user preferences) and [[electrical network]]s. Recently, submodular functions have also found utility in several real-world problems in [[machine learning]] and [[artificial intelligence]], including [[automatic summarization]], [[multi-document summarization]], [[feature selection]], [[Active learning (machine learning)|active learning]], sensor placement, image collection summarization and many other domains.<ref name="LB" /><ref name="TIWB" /><ref name="KG1" /><ref name="KG" />
 
== Definition ==
If <math>\Omega</math> is a finite [[Set (mathematics)|set]], a set function <math>f:2^{\Omega}\rightarrow \mathbb{R}</math> (where <math>2^{\Omega}</math> denotes the [[power set]] of <math>\Omega</math>) is ''submodular'' if it satisfies one of the following equivalent conditions:
# For every <math>X,Y\subseteq \Omega</math> with <math>X\subseteq Y</math> and every <math>x\in \Omega\setminus Y</math>, we have <math>f(X\cup \{x\})-f(X)\geq f(Y\cup \{x\})-f(Y)</math>.
# For every <math>S,T\subseteq \Omega</math>, we have <math>f(S)+f(T)\geq f(S\cup T)+f(S\cap T)</math>.

If <math>\Omega</math> is not assumed finite, these conditions are no longer equivalent. For example, the function <math>f</math> defined by <math>f(S)=1</math> if <math>S</math> is finite and <math>f(S)=0</math> if <math>S</math> is infinite satisfies the first condition above, but the second condition fails when <math>S</math> and <math>T</math> are infinite sets with finite intersection.
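For small ground sets, the union–intersection condition can be checked exhaustively. A minimal Python sketch (illustrative only; the coverage instance and helper names are ad hoc, and the double loop over all subset pairs is exponential):

<syntaxhighlight lang="python">
# Brute-force check of f(S) + f(T) >= f(S | T) + f(S & T) on a small ground set.
from itertools import chain, combinations

def subsets(ground):
    """All subsets of `ground`, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(ground, r) for r in range(len(ground) + 1))]

def is_submodular(f, ground):
    """Exhaustively test the second (union/intersection) condition."""
    return all(f(S) + f(T) >= f(S | T) + f(S & T)
               for S in subsets(ground) for T in subsets(ground))

# Example: a coverage function f(S) = |union of the sets indexed by S|.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = lambda S: len(set().union(*(covers[i] for i in S))) if S else 0
print(is_submodular(f, covers.keys()))  # True
</syntaxhighlight>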
 
== Types and examples of submodular functions ==
 
=== Monotone ===
A set function <math>f</math> is ''monotone'' if for every <math>T\subseteq S</math> we have that <math>f(T)\leq f(S)</math>. Examples of monotone submodular functions include:
; Linear (Modular) functions : Any function of the form <math>f(S)=\sum_{i\in S}w_i</math> is called a linear function. Additionally, if <math>w_i\geq 0</math> for all <math>i</math>, then <math>f</math> is monotone.
; [[Budget-additive valuation|Budget-additive functions]] : Any function of the form <math>f(S)=\min\left\{B,~\sum_{i\in S}w_i\right\}</math> for each <math>w_i\geq 0</math> and <math>B\geq 0</math> is called budget additive.<ref name="BF" />
; Coverage functions : Let <math>\Omega=\{E_1,E_2,\ldots,E_n\}</math> be a collection of subsets of some [[matroid|ground set]] <math>\Omega'</math>. The function <math>f(S)=\left|\bigcup_{E_i\in S}E_i\right|</math> for <math>S\subseteq \Omega</math> is called a coverage function (see the sketch after this list). This can be generalized by adding non-negative weights to the elements.
; [[Entropy (information theory)|Entropy]] : Let <math>\Omega=\{X_1,X_2,\ldots,X_n\}</math> be a set of [[random variables]]. Then for any <math>S\subseteq \Omega</math> we have that <math>H(S)</math> is a submodular function, where <math>H(S)</math> is the entropy of the set of random variables <math>S</math>, a fact known as [[Entropic vector#Shannon-type inequalities and Γn|Shannon's inequality]].<ref>{{Cite web|url = https://www.cs.cmu.edu/~aarti/Class/10704_Spring15/lecs/lec3.pdf|title = Information Processing and Learning|publisher = CMU}}</ref> Further inequalities for the entropy function are known to hold, see [[entropic vector]].
; [[Matroid]] [[matroid rank|rank functions]] : Let <math>\Omega=\{e_1,e_2,\dots,e_n\}</math> be the ground set on which a matroid is defined. Then the rank function of the matroid is a submodular function.<ref name=F22>Fujishige (2005) p.22</ref>
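For instance, the coverage function above is monotone, and its marginal gains shrink as the base set grows, as in the following short sketch (illustrative; the sets are ad hoc examples):

<syntaxhighlight lang="python">
# Diminishing returns of a small coverage function.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}

def coverage(S):
    return len(set().union(*(covers[i] for i in S))) if S else 0

def marginal(i, S):
    """Gain from adding element i to the set S."""
    return coverage(S | {i}) - coverage(S)

print(marginal(2, set()))    # 2: "b" and "c" are both new
print(marginal(2, {1}))      # 1: "b" is already covered by set 1
print(marginal(2, {1, 3}))   # 0: "b" and "c" are both already covered
</syntaxhighlight>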
 
=== Non-monotone ===
A submodular function that is not monotone is called ''non-monotone''. In particular, a function is non-monotone if adding more elements to a set can decrease the value of the function: formally, <math>f</math> is non-monotone if there are sets <math>S,T</math> in its ___domain such that <math>S\subset T</math> and <math>f(S)>f(T)</math>.
 
==== Symmetric ====
A non-monotone submodular function <math>f</math> is called ''symmetric'' if for every <math>S\subseteq \Omega</math> we have that <math>f(S)=f(\Omega-S)</math>.
==== Asymmetric ====
; Directed cuts : Let <math>\Omega=\{v_1,v_2,\dots,v_n\}</math> be the vertices of a [[directed graph]]. For any set of vertices <math>S\subseteq \Omega</math> let <math>f(S)</math> denote the number of edges <math>e=(u,v)</math> such that <math>u\in S</math> and <math>v\in \Omega-S</math>. This can be generalized by adding non-negative weights to the directed edges.
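A short sketch of a directed cut function (illustrative; the graph is an ad hoc example) shows the non-monotone behaviour, since adding vertices to <math>S</math> can remove outgoing edges:

<syntaxhighlight lang="python">
# A directed cut function: number of edges leaving S.
edges = [("u", "v"), ("u", "w"), ("v", "w")]

def cut(S):
    return sum(1 for (a, b) in edges if a in S and b not in S)

print(cut({"u"}))            # 2: u->v and u->w leave S
print(cut({"u", "v"}))       # 2: u->w and v->w leave S
print(cut({"u", "v", "w"}))  # 0: adding the last vertex decreases the value
</syntaxhighlight>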
 
== Continuous extensions of submodular set functions ==
Often, given a submodular set function that describes the values of various sets, we need to compute the values of ''fractional'' sets. For example, we may know that the value of receiving house A together with house B is ''V'', and we want to know the value of receiving 40% of house A and 60% of house B. To this end, we need a ''continuous extension'' of the submodular set function.
 
Formally, a set function <math>f:2^{\Omega}\rightarrow \mathbb{R}</math> with <math>|\Omega|=n</math> can be represented as a function on <math>\{0, 1\}^{n}</math>, by associating each <math>S\subseteq \Omega</math> with a binary vector <math>x^{S}\in \{0, 1\}^{n}</math> such that <math>x_{i}^{S}=1</math> when <math>i\in S</math>, and <math>x_{i}^{S}=0</math> otherwise. A ''continuous [[Restriction (mathematics)#Extension of a function|extension]]'' of <math>f</math> is a continuous function <math>F:[0, 1]^{n}\rightarrow \mathbb{R}</math>, that matches the value of <math>f</math> on <math>x\in \{0, 1\}^{n}</math>, i.e. <math>F(x^{S})=f(S)</math>.
 
Several kinds of continuous extensions of submodular functions are commonly used, which are described below.
 
=== Lovász extension ===
This extension is named after mathematician [[László Lovász]].<ref name="L" /> Consider any vector <math>\mathbf{x}=\{x_1,x_2,\dots,x_n\}</math> such that each <math>0\leq x_i\leq 1</math>. Then the Lovász extension is defined as

<math>f^L(\mathbf{x})=\mathbb{E}(f(\{i|x_i\geq \lambda\}))</math>

where the expectation is over <math>\lambda</math> chosen from the [[uniform distribution (continuous)|uniform distribution]] on the interval <math>[0,1]</math>. The Lovász extension is a convex function if and only if <math>f</math> is a submodular function.
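Since the level set <math>\{i\mid x_i\geq \lambda\}</math> changes only as <math>\lambda</math> crosses a coordinate of <math>\mathbf{x}</math>, the expectation can be evaluated exactly by sorting the coordinates. A minimal sketch (illustrative; the function names and the example <math>f</math> are ad hoc):

<syntaxhighlight lang="python">
# Exact Lovász extension: sort coordinates descending and weight each prefix
# set by the length of the λ-interval on which it is the level set.
def lovasz_extension(f, x):
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])   # indices, descending x_i
    total = (1.0 - x[order[0]]) * f(frozenset())    # λ above the largest x_i
    S = set()
    for k, i in enumerate(order):
        S.add(i)
        nxt = x[order[k + 1]] if k + 1 < n else 0.0
        total += (x[i] - nxt) * f(frozenset(S))     # λ in (nxt, x_i]
    return total

f = lambda S: min(len(S), 2)   # concave of cardinality, hence submodular
print(lovasz_extension(f, [0.5, 0.25, 1.0]))  # 1.5
</syntaxhighlight>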
 
=== Multilinear extension ===
Consider any vector <math>\mathbf{x}=\{x_1,x_2,\ldots,x_n\}</math> such that each <math>0\leq x_i\leq 1</math>. Then the multilinear extension is defined as<ref>{{Cite book |last=Vondrak |first=Jan |title=Proceedings of the fortieth annual ACM symposium on Theory of computing |chapter=Optimal approximation for the submodular welfare problem in the value oracle model |date=2008-05-17 |chapter-url=https://doi.org/10.1145/1374376.1374389 |series=STOC '08 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=67–74 |doi=10.1145/1374376.1374389 |isbn=978-1-60558-047-0|s2cid=170510 }}</ref><ref>{{Cite journal |last1=Calinescu |first1=Gruia |last2=Chekuri |first2=Chandra |last3=Pál |first3=Martin |last4=Vondrák |first4=Jan |date=January 2011 |title=Maximizing a Monotone Submodular Function Subject to a Matroid Constraint |url=http://epubs.siam.org/doi/10.1137/080733991 |journal=SIAM Journal on Computing |language=en |volume=40 |issue=6 |pages=1740–1766 |doi=10.1137/080733991 |issn=0097-5397|url-access=subscription }}</ref>

<math>F(\mathbf{x})=\sum_{S\subseteq \Omega} f(S) \prod_{i\in S} x_i \prod_{i\notin S} (1-x_i)</math>.
 
Intuitively, ''x<sub>i</sub>'' represents the probability that item ''i'' is chosen for the set. For every set ''S'', the product of the two inner terms is the probability that the chosen set is exactly ''S''. Therefore, the sum is the expected value of ''f'' for the set formed by choosing each item ''i'' independently at random with probability ''x<sub>i</sub>''.
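The following sketch (illustrative; names are ad hoc) computes <math>F(\mathbf{x})</math> both exactly, by enumerating all subsets, and approximately, by sampling random sets as in the probabilistic reading above:

<syntaxhighlight lang="python">
# Multilinear extension: exact enumeration (exponential) vs. Monte Carlo.
import itertools, random

def multilinear_exact(f, x):
    n, total = len(x), 0.0
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), r) for r in range(n + 1)):
        p = 1.0
        for i in range(n):
            p *= x[i] if i in S else 1 - x[i]
        total += p * f(frozenset(S))
    return total

def multilinear_mc(f, x, samples=100_000, seed=0):
    rng = random.Random(seed)
    return sum(f(frozenset(i for i, xi in enumerate(x) if rng.random() < xi))
               for _ in range(samples)) / samples

f = lambda S: min(len(S), 2)
x = [0.5, 0.25, 1.0]
print(multilinear_exact(f, x))  # 1.625
print(multilinear_mc(f, x))     # close to 1.625
</syntaxhighlight>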
 
=== Convex closure ===
Consider any vector <math>\mathbf{x}=\{x_1,x_2,\dots,x_n\}</math> such that each <math>0\leq x_i\leq 1</math>. Then the convex closure is defined as <math>f^-(\mathbf{x})=\min\left(\sum_S \alpha_S f(S):\sum_S \alpha_S 1_S=\mathbf{x},\sum_S \alpha_S=1,\alpha_S\geq 0\right)</math>.

The convex closure of any set function is convex over <math>[0,1]^n</math>. It can be shown that <math>f^L(\mathbf{x})=f^-(\mathbf{x})</math> for submodular functions.
 
=== Concave closure ===
Consider any vector <math>\mathbf{x}=\{x_1,x_2,\dots,x_n\}</math> such that each <math>0\leq x_i\leq 1</math>. Then the concave closure is defined as <math>f^+(\mathbf{x})=\max\left(\sum_S \alpha_S f(S):\sum_S \alpha_S 1_S=\mathbf{x},\sum_S \alpha_S=1,\alpha_S\geq 0\right)</math>.
 
=== Relations between continuous extensions ===
For the extensions discussed above, it can be shown that <math>f^{+}(\mathbf{x}) \geq F(\mathbf{x}) \geq f^{-}(\mathbf{x})=f^L(\mathbf{x})</math> when <math>f</math> is submodular.<ref name="JV2" />
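On small ground sets these relations can be verified numerically: both closures are [[linear programming|linear programs]] with one variable <math>\alpha_S</math> per subset. The sketch below (illustrative; it assumes [[SciPy]]'s <code>linprog</code> and an ad hoc example function) computes <math>f^+(\mathbf{x})</math> and <math>f^-(\mathbf{x})</math>; for submodular <math>f</math>, the latter coincides with the Lovász extension:

<syntaxhighlight lang="python">
# Convex/concave closures as linear programs over distributions on subsets.
import itertools
import numpy as np
from scipy.optimize import linprog

def all_subsets(n):
    return list(itertools.chain.from_iterable(
        itertools.combinations(range(n), r) for r in range(n + 1)))

def closure(f, x, kind="convex"):
    n, subs = len(x), all_subsets(len(x))
    c = np.array([f(frozenset(S)) for S in subs], dtype=float)
    A_eq = np.zeros((n + 1, len(subs)))
    for j, S in enumerate(subs):
        for i in S:
            A_eq[i, j] = 1.0        # sum_S alpha_S * 1_S = x
        A_eq[n, j] = 1.0            # sum_S alpha_S = 1
    b_eq = np.append(np.asarray(x, dtype=float), 1.0)
    sign = 1.0 if kind == "convex" else -1.0   # maximize for the concave closure
    res = linprog(sign * c, A_eq=A_eq, b_eq=b_eq)  # alpha_S >= 0 by default
    return sign * res.fun

f = lambda S: min(len(S), 2)
x = [0.5, 0.25, 1.0]
print(closure(f, x, "concave"))   # f+(x) = 1.75, here >= F(x) = 1.625
print(closure(f, x, "convex"))    # f-(x) = Lovász extension = 1.5
</syntaxhighlight>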
 
== Properties ==
# Consider a random process where a set <math>T</math> is chosen with each element in <math>\Omega</math> being included in <math>T</math> independently with probability <math>p</math>. Then the following inequality is true: <math>\mathbb{E}[f(T)]\geq p f(\Omega)+(1-p) f(\varnothing)</math>, where <math>\varnothing</math> is the empty set. More generally, consider the following random process where a set <math>S</math> is constructed as follows. For each of <math>1\leq i\leq l, A_i\subseteq \Omega</math>, construct <math>S_i</math> by including each element in <math>A_i</math> independently into <math>S_i</math> with probability <math>p_i</math>. Furthermore, let <math>S=\bigcup_{i=1}^l S_i</math>. Then the following inequality is true: <math>\mathbb{E}[f(S)]\geq \sum_{R\subseteq [l]} \prod_{i\in R}p_i \prod_{i\notin R}(1-p_i)f\left(\bigcup_{i\in R}A_i\right)</math>.{{Citation needed|date=November 2013}}
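The first inequality above can be checked empirically, as in the following sketch (illustrative; the function and the parameters are ad hoc choices):

<syntaxhighlight lang="python">
# Empirical check of E[f(T)] >= p*f(Omega) + (1-p)*f(empty set).
import random

ground = ["a", "b", "c", "d"]
f = lambda S: min(len(S), 2)          # a submodular example
p, trials, rng = 0.3, 200_000, random.Random(1)

mean = sum(f({e for e in ground if rng.random() < p})
           for _ in range(trials)) / trials
bound = p * f(set(ground)) + (1 - p) * f(set())
print(mean, ">=", bound)              # roughly 1.11 >= 0.6
</syntaxhighlight>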
 
== Optimization problems{{Anchor|optimization}} ==
Submodular functions have properties which are very similar to [[convex function|convex]] and [[concave function]]s. For this reason, an [[optimization problem]] which concerns optimizing a convex or concave function can also be described as the problem of maximizing or minimizing a submodular function subject to some constraints.
 
=== Submodular set function minimization ===
The hardness of minimizing a submodular set function depends on constraints imposed on the problem.
 
# The unconstrained problem of minimizing a submodular function is computable in [[polynomial time]],<ref name="GLS" /><ref name="Cunningham" /> and even in [[Strongly polynomial|strongly-polynomial]] time.<ref name="IFF" /><ref name="Schrijver" /> Computing the [[minimum cut]] in a graph is a special case of this minimization problem.
# The problem of minimizing a submodular function with a cardinality lower bound is [[NP-hard]], with polynomial factor lower bounds on the approximation factor.<ref name="SF" /><ref name="IJB" />
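For intuition only, the unconstrained problem (item 1 above) can be written as an exhaustive search; the sketch below (illustrative; exponential in <math>|\Omega|</math>, unlike the polynomial-time algorithms cited above) minimizes a small cut function:

<syntaxhighlight lang="python">
# Brute-force unconstrained minimization of a set function (exponential time).
from itertools import chain, combinations

def minimize_brute_force(f, ground):
    best_set, best_val = None, float("inf")
    for S in chain.from_iterable(combinations(ground, r)
                                 for r in range(len(ground) + 1)):
        v = f(frozenset(S))
        if v < best_val:
            best_set, best_val = frozenset(S), v
    return best_set, best_val

# An (undirected) cut function is submodular; its unconstrained minimum is 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cut = lambda S: sum(1 for (a, b) in edges if (a in S) != (b in S))
print(minimize_brute_force(cut, range(4)))  # (frozenset(), 0)
</syntaxhighlight>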
 
=== Submodular set function maximization ===
Unlike the case of minimization, maximizing a generic submodular function is [[NP-hard]] even in the unconstrained setting. Consequently, most work in this field concerns polynomial-time [[approximation algorithm]]s, such as [[greedy algorithm]]s and [[local search (optimization)|local search algorithm]]s.
 
# The problem of maximizing a non-negative submodular function admits a 1/2 approximation algorithm.<ref name="FMV" /><ref name="BFNS" /> Computing the [[maximum cut]] of a graph is a special case of this problem.
# The problem of maximizing a monotone submodular function subject to a cardinality constraint admits a <math>1 - 1/e</math> approximation algorithm.<ref name="NVF" /><ref>{{Cite web|last=Williamson|first=David P.|title=Bridging Continuous and Discrete Optimization: Lecture 23|url=https://people.orie.cornell.edu/dpw/orie6334/lecture23.pdf}}</ref> The [[maximum coverage problem]] is a special case of this problem.
# The problem of maximizing a monotone submodular function subject to a [[matroid]] constraint (which subsumes the case above) also admits a <math>1 - 1/e</math> approximation algorithm.<ref name="CCPV" /><ref name="FNS" /><ref name="FW" />
 
Many of these algorithms can be unified within a semi-differential based framework of algorithms.<ref name="IJB" />
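For the cardinality-constrained monotone case, the <math>1-1/e</math> guarantee above is achieved by the natural greedy algorithm that repeatedly adds the element with the largest marginal gain.<ref name="NVF" /> A minimal sketch (illustrative; the maximum-coverage instance is an ad hoc example):

<syntaxhighlight lang="python">
# Greedy maximization of a monotone submodular function under |S| <= k.
def greedy_max(f, ground, k):
    S = set()
    for _ in range(k):
        best = max((e for e in ground if e not in S),
                   key=lambda e: f(S | {e}) - f(S))  # largest marginal gain
        S.add(best)
    return S

# Maximum coverage as a special case.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"e"}}
coverage = lambda S: len(set().union(*(covers[i] for i in S))) if S else 0
print(greedy_max(coverage, covers.keys(), k=2))  # {1, 3}: covers all 5 elements
</syntaxhighlight>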
 
===Related optimization problems===
Apart from submodular minimization and maximization, there are several other natural optimization problems related to submodular functions.
 
# Minimizing the difference between two submodular functions<ref name="NB" /> is not only NP-hard, but also inapproximable.<ref name="IBUAI" />
# Minimization/maximization of a submodular function subject to a submodular level set constraint (also known as submodular optimization subject to submodular cover or submodular knapsack constraint) admits bounded approximation guarantees.<ref name="IB" />
# Partitioning data based on a submodular function to maximize the average welfare is known as the submodular welfare problem, which also admits bounded approximation guarantees (see [[welfare maximization]]).
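For the welfare problem, a simple greedy rule that assigns each arriving item to the player with the largest marginal gain is known to give a 1/2-approximation when utilities are monotone submodular. A sketch (illustrative; the players and their coverage-style utilities are ad hoc):

<syntaxhighlight lang="python">
# Greedy item-by-item assignment for the submodular welfare problem.
player_sets = {
    "p1": {1: {"a", "b"}, 2: {"b"}, 3: {"c"}},
    "p2": {1: {"x"}, 2: {"x", "y"}, 3: {"y", "z"}},
}

def utility(player, items):
    cover = player_sets[player]
    return len(set().union(*(cover[i] for i in items))) if items else 0

allocation = {p: set() for p in player_sets}
for item in [1, 2, 3]:
    best = max(player_sets, key=lambda p:
               utility(p, allocation[p] | {item}) - utility(p, allocation[p]))
    allocation[best].add(item)

print(allocation)  # {'p1': {1, 3}, 'p2': {2}} with total welfare 3 + 2 = 5
</syntaxhighlight>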
 
== Applications ==
Submodular functions naturally occur in several real world applications, in [[economics]], [[game theory]], [[machine learning]] and [[computer vision]].<ref name="KG" /><ref name="JB" /> Owing to the diminishing returns property, submodular functions naturally model costs of items, since there is often a larger discount with an increase in the items one buys. Submodular functions model notions of complexity, similarity and cooperation when they appear in minimization problems. In maximization problems, on the other hand, they model notions of diversity, information and coverage. For more information on applications of submodularity, particularly in machine learning, see the tutorials by Krause and Guestrin<ref name="KG" /> and Bilmes,<ref name="JB" /> and the material at submodularity.org.<ref name="ST" />
 
== See also ==
== Citations ==
{{reflist|refs=
<ref name="Cunningham">{{cite journal |first=W. H. |last=Cunningham |title=On submodular function minimization |journal=Combinatorica |volume=5 |issue=3 |year=1985 |pages=185–192 |doi=10.1007/BF02579361 |s2cid=33192360 }}</ref>
<ref name="IFF">{{cite journal |first1=S. |last1=Iwata |first2=L. |last2=Fleischer |first3=S. |last3=Fujishige |title=A combinatorial strongly polynomial algorithm for minimizing submodular functions |journal=J. ACM |volume=48 |year=2001 |issue=4 |pages=761–777 |doi=10.1145/502090.502096 |s2cid=888513 }}</ref>
<ref name="Schrijver">{{cite journal |author-link=Alexander Schrijver |first=A. |last=Schrijver |title=A combinatorial algorithm minimizing submodular functions in strongly polynomial time |journal=J. Combin. Theory Ser. B |volume=80 |year=2000 |issue=2 |pages=346–355 |doi=10.1006/jctb.2000.1989 |url=https://ir.cwi.nl/pub/2108 |doi-access=free }}</ref>
<ref name="IJB">R. Iyer, [[Stefanie Jegelka|S. Jegelka]] and J. Bilmes, Fast Semidifferential based submodular function optimization, Proc. ICML (2013).</ref>
<ref name="IB">R. Iyer and J. Bilmes, Submodular Optimization Subject to Submodular Cover and Submodular Knapsack Constraints, In Advances of NIPS (2013).</ref>
<ref name="IBUAI">R. Iyer and J. Bilmes, Algorithms for Approximate Minimization of the Difference between Submodular Functions, In Proc. UAI (2012).</ref>
<ref name="NB">M. Narasimhan and J. Bilmes, A submodular-supermodular procedure with applications to discriminative structure learning, In Proc. UAI (2005).</ref>
<ref name="FMV">[[Uriel Feige|U. Feige]], V. Mirrokni and J. Vondrák, Maximizing non-monotone submodular functions, Proc. of 48th FOCS (2007), pp. 461–471.</ref>
<ref name="NVF">[[{{cite journal|first1=George |last1=Nemhauser|G. L.author-link1=George Nemhauser]], |first2=L. A. |last2=Wolsey and |first3=M. L. |last3=Fisher, |title=An analysis of approximations for maximizing submodular set functions I, |journal=Mathematical Programming |issue=14 (|year=1978),|volume=14 |pages=265–294|doi=10.1007/BF01588971 |s2cid=206800425 }}</ref>
<ref name="CCPV">G. Calinescu, C. Chekuri, M. Pál and J. Vondrák, Maximizing a submodular set function subject to a matroid constraint, SIAM J. Comp. 40:6 (2011), 1740-1766.</ref>
<ref name="BFNS">N. Buchbinder, M. Feldman, J. Naor and R. Schwartz, A tight linear time (1/2)-approximation for unconstrained submodular maximization, Proc. of 53rd FOCS (2012), pp. 649-658.</ref>
<ref name="FW">Y. Filmus, J. Ward, A tight combinatorial algorithm for submodular maximization subject to a matroid constraint, Proc. of 53rd FOCS (2012), pp. 659-668.</ref>
<ref name="SF">Z. Svitkina and L. Fleischer, Submodular approximation: Sampling-based algorithms and lower bounds, SIAM Journal on Computing (2011).</ref>
 
<ref name="JV">J. Vondrák, Optimal approximation for the submodular welfare problem in the value oracle model, Proc. of STOC (2008), pp. 461–471.</ref>
<ref name="ST">http://submodularity.org/.</ref>
<ref name="KG">A. Krause and C. Guestrin, Beyond Convexity: Submodularity in Machine Learning, Tutorial at ICML-2008</ref>
<ref name="JB">J. Bilmes, Submodularity in Machine Learning Applications, Tutorial at AAAI-2015.</ref>
<ref name="KG1">A. Krause and C. Guestrin, Near-optimal nonmyopic value of information in graphical models, UAI-2005.</ref>
<ref name="FNS">M. Feldman, J. Naor and R. Schwartz, A unified continuous greedy algorithm for submodular maximization, Proc. of 52nd FOCS (2011).</ref>
<ref name="L">{{cite book |author-link1=László Lovász |last1=Lovász |first1=L. |title=Mathematical Programming the State of the Art |chapter=Submodular functions and convexity |date=1983 |chapter-url= |pages=235–257 |doi=10.1007/978-3-642-68874-4_10 |isbn=978-3-642-68876-8 |s2cid=117358746 }}</ref>
<ref name="BF">{{cite encyclopedia |last1=Buchbinder |first1=Niv |last2=Feldman |first2=Moran |title=Submodular Functions Maximization Problems |encyclopedia= Handbook of Approximation Algorithms and Metaheuristics, Second Edition: Methodologies and Traditional Applications |year=2018 |editor1-last=Gonzalez |editor1-first=Teofilo F. |publisher=Chapman and Hall/CRC |doi=10.1201/9781351236423 |isbn=9781351236423 |url=https://www.taylorfrancis.com/chapters/edit/10.1201/9781351236423-42/submodular-functions-maximization-problems-niv-buchbinder-moran-feldman|url-access=subscription }}</ref>
 
<ref name="JV2">{{Cite web|last=Vondrák|first=Jan|title=Polyhedral techniques in combinatorial optimization: Lecture 17|url=https://theory.stanford.edu/~jvondrak/CS369P/lec17.pdf}}</ref>
}}
 
== References ==
*{{Citation|last=Lee|first=Jon|author-link=Jon Lee (mathematician)|year= 2004 |title=A First Course in Combinatorial Optimization |publisher=[[Cambridge University Press]]|isbn= 0-521-01012-8}}
*{{Citation|last=Fujishige|first=Satoru|year=2005|title=Submodular Functions and Optimization|publisher=[[Elsevier]]|isbn=0-444-52086-4}}
*{{Citation|last=Narayanan|first=H.|year= 1997 |title=Submodular Functions and Electrical Networks|publisher=Elsevier |isbn= 0-444-82523-1}}
*{{citation | last=Oxley | first=James G. | title=Matroid theory | series=Oxford Science Publications | ___location=Oxford | publisher=[[Oxford University Press]] | year=1992 | isbn=0-19-853563-5 | zbl=0784.05002 }}
 
== External links ==
* http://www.cs.berkeley.edu/~stefje/references.html has a longer bibliography
* http://submodularity.org/ includes further material on the subject
 
<!--- Categories --->