Proximal gradient methods for learning: Difference between revisions

Content deleted Content added
Mgfbinae (talk | contribs)
Mgfbinae (talk | contribs)
No edit summary
Line 22:
The proximity operator is important because <math> x^* </math> is a minimizer of the problem <math> \min_{x\in\mathcal{H}} F(x)+R(x)</math> if and only if
:<math>x^* = \operatorname{prox}_{\gamma R}\left(x^*-\gamma\nabla F(x^*)\right),</math> where <math>\gamma>0</math> is any positive real number.<ref name=combettes />
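
The fixed-point characterization can be checked numerically on a small case. The sketch below is illustrative only and assumes a one-dimensional problem with <math>F(x)=\tfrac{1}{2}(x-b)^2</math> and <math>R(x)=\lambda|x|</math>, for which the minimizer is known in closed form; the function names and the values of <math>b</math> and <math>\lambda</math> are hypothetical choices, not taken from the article.

```python
def soft_threshold(v, t):
    """Proximity operator of t*|.| evaluated at the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fixed_point_residual(x, b, lam, gamma):
    """Return |x - prox_{gamma R}(x - gamma F'(x))|.

    Assumes F(x) = (x - b)^2 / 2 and R(x) = lam * |x| (illustrative choice).
    """
    grad = x - b  # derivative of F at x
    return abs(x - soft_threshold(x - gamma * grad, gamma * lam))


b, lam = 3.0, 1.0                # hypothetical problem data
x_star = soft_threshold(b, lam)  # closed-form minimizer of F + R

# The residual vanishes at the minimizer for every step size gamma > 0,
# and is nonzero at a point that is not a minimizer.
residuals_at_minimizer = [fixed_point_residual(x_star, b, lam, g) for g in (0.1, 0.5, 2.0)]
residual_elsewhere = fixed_point_residual(1.0, b, lam, 0.5)
```

Under these assumptions the residual is zero (up to rounding) at <math>x^*</math> for each tested <math>\gamma</math>, which is what the equivalence above predicts.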
 
===Moreau decomposition===
 
One important technique related to proximal gradient methods is the '''Moreau decomposition,''' which decomposes the identity operator as the sum of two proximity operators.<ref name=combettes /> Namely, let <math>\varphi:\mathcal{X}\to\mathbb{R}</math> be a [[Semi-continuity|lower semicontinuous]], convex function on a vector space <math>\mathcal{X}</math>. We define its [[Convex_conjugate|Fenchel conjugate]] <math>\varphi^*:\mathcal{X}\to\mathbb{R}</math> to be the function
:<math>\varphi^*(u) := \sup_{x\in\mathcal{X}} \left( \langle x,u\rangle - \varphi(x) \right).</math>
The general form of Moreau's decomposition states that for any <math>x\in\mathcal{X}</math> and any <math>\gamma>0</math>,
:<math>x = \operatorname{prox}_{\gamma \varphi}(x) + \gamma\operatorname{prox}_{\varphi^*/\gamma}(x/\gamma),</math>
which for <math>\gamma=1</math> implies that <math>x = \operatorname{prox}_{\varphi}(x)+\operatorname{prox}_{\varphi^*}(x)</math>.<ref name=combettes /><ref name=moreau>{{cite journal|last=Moreau|first=J.-J.|title=Fonctions convexes duales et points proximaux dans un espace hilbertien|journal=C. R. Acad. Sci. Paris Ser. A Math.|year=1962|volume=255|pages=2897–2899}}</ref> The Moreau decomposition can be seen as a generalization of the usual orthogonal decomposition of a vector space, analogous to the fact that proximity operators are generalizations of projections.<ref name=combettes />
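
As a concrete check of the decomposition, the following sketch assumes <math>\varphi=\|\cdot\|_1</math> on <math>\mathbb{R}^n</math>. In that case <math>\operatorname{prox}_{\gamma\varphi}</math> is componentwise soft thresholding, and <math>\varphi^*</math> is the indicator function of the unit <math>\ell_\infty</math> ball, so <math>\operatorname{prox}_{\varphi^*/\gamma}</math> is the projection onto that ball for every <math>\gamma>0</math>. The function names and test vector are hypothetical and serve only as an illustration.

```python
def prox_l1(x, gamma):
    """Componentwise soft thresholding: proximity operator of gamma * ||.||_1."""
    return [max(abs(v) - gamma, 0.0) * (1.0 if v >= 0 else -1.0) for v in x]


def project_linf_ball(x):
    """Projection onto the unit l-infinity ball.

    This is the proximity operator of the conjugate of ||.||_1 (an indicator
    function), and it is unaffected by scaling the indicator by 1/gamma.
    """
    return [min(max(v, -1.0), 1.0) for v in x]


def moreau_reconstruction(x, gamma):
    """Right-hand side of the decomposition: prox_{gamma phi}(x) + gamma * prox_{phi*/gamma}(x / gamma)."""
    p = prox_l1(x, gamma)
    q = project_linf_ball([v / gamma for v in x])
    return [pi + gamma * qi for pi, qi in zip(p, q)]


x = [3.0, -0.5, 0.0, -4.0, 1.25]  # hypothetical test vector
max_errors = [
    max(abs(a - b) for a, b in zip(x, moreau_reconstruction(x, g)))
    for g in (0.5, 1.0, 2.0)
]
```

For each tested <math>\gamma</math> the two terms should sum back to <math>x</math> up to rounding error, so every entry of <code>max_errors</code> is expected to be negligible.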
 
In certain situations it may be easier to compute the proximity operator of the conjugate <math>\varphi^*</math> than that of the function <math>\varphi</math> itself, in which case the Moreau decomposition can be used to recover the latter. This is the case for [[#Exploiting_group_structure|group lasso]].
 
 
==Lasso regularization==
Line 108 ⟶ 119:
where <math>w_g</math> is the <math>g</math>th group.
 
In contrast to lasso, the derivation of the proximity operator for group lasso relies on applying a technique known as ''Moreau decomposition,'' which decomposes the identity operator as the sum of two proximity operators.<ref name=combettes /> Namely, let <math>\varphi:\mathcal{X}\to\mathbb{R}</math> be a [[Semi-continuity|lower semicontinuous]], convex function on a vector space <math>\mathcal{X}</math>. We define its [[Convex_conjugate|Fenchel conjugate]] <math>\varphi^*:\mathcal{X}\to\mathbb{R}</math> to be the function
 
:<math>\varphi^*(u) := \sup_{x\in\mathcal{X}} \left( \langle x,u\rangle - \varphi(x) \right).</math>
The general form of Moreau's decomposition states that for any <math>x\in\mathcal{X}</math> and any <math>\gamma>0</math>,
 
:<math>x = \operatorname{prox}_{\gamma \varphi}(x) + \gamma\operatorname{prox}_{\varphi^*/\gamma}(x/\gamma),</math>
which for <math>\gamma=1</math> implies that <math>x = \operatorname{prox}_{\varphi}(x)+\operatorname{prox}_{\varphi^*}(x)</math>.<ref name=combettes /><ref name=moreau>{{cite journal|last=Moreau|first=J.-J.|title=Fonctions convexes duales et points proximaux dans un espace hilbertien|journal=C. R. Acad. Sci. Paris Ser. A Math.|year=1962|volume=255|pages=2897–2899}}</ref> The Moreau decomposition can be seen as a generalization of the usual orthogonal decomposition of a vector space, analogous to the fact that proximity operators are generalizations of projections.<ref name=combettes />
 
In certain situations it may be easier to compute the proximity operator of the conjugate <math>\varphi^*</math> than that of the function <math>\varphi</math> itself, in which case the Moreau decomposition can be used to recover the latter. This is precisely the situation for group lasso, where the proximity operator of the conjugate becomes a projection onto the ball of a dual norm.<ref name=structSparse />
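
The role of the projection can be sketched in code. The example below is a minimal illustration, assuming non-overlapping groups, unit group weights, and the penalty <math>\gamma\sum_g \|w_g\|_2</math>; the function names and the index-list representation of groups are hypothetical. Since the conjugate of <math>\|\cdot\|_2</math> is the indicator of the unit <math>\ell_2</math> ball, the Moreau decomposition gives <math>\operatorname{prox}_{\gamma\|\cdot\|_2}(x) = x - P_{\gamma}(x)</math>, where <math>P_\gamma</math> denotes projection onto the <math>\ell_2</math> ball of radius <math>\gamma</math>.

```python
import math


def project_l2_ball(x, radius):
    """Euclidean projection of x onto the l2 ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]


def prox_l2_norm(x, gamma):
    """Proximity operator of gamma * ||.||_2 via the Moreau decomposition:
    x minus its projection onto the l2 ball of radius gamma."""
    proj = project_l2_ball(x, gamma)
    return [v - p for v, p in zip(x, proj)]


def prox_group_lasso(w, groups, gamma):
    """Apply prox_l2_norm to each block of w.

    groups is a list of index lists, assumed to be non-overlapping;
    coordinates not covered by any group are left unchanged.
    """
    out = list(w)
    for idx in groups:
        block = prox_l2_norm([w[i] for i in idx], gamma)
        for i, v in zip(idx, block):
            out[i] = v
    return out


w = [3.0, 4.0, 0.3, 0.4]                         # hypothetical weight vector
shrunk = prox_group_lasso(w, [[0, 1], [2, 3]], 1.0)
```

In this sketch a group whose norm is at most <math>\gamma</math> is set to zero as a whole, while larger groups are shrunk toward the origin without changing direction, which is the block soft-thresholding behaviour associated with group lasso.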
 
===Other group structures===