Multi-task learning: Difference between revisions

WikiCleanerBot (talk | contribs)
m v2.04b - Bot T21 - Fix errors for CW project (Missing whitespace before a link - Reference before punctuation)
Line 90:
* Letting <math display="inline">A^\dagger = \gamma I_T + ( \gamma - \lambda)\frac {1} T \mathbf{1}\mathbf{1}^\top </math> (where <math>I_T </math> is the <math>T \times T</math> identity matrix, and <math display="inline">\mathbf{1}\mathbf{1}^\top </math> is the <math>T \times T</math> matrix of ones) is equivalent to letting <math>\gamma</math> control the variance <math display="inline">\sum_t \| f_t - \bar f\|_{\mathcal H_k}^2 </math> of the tasks about their mean <math display="inline">\bar f = \frac 1 T \sum_t f_t </math>. For example, blood levels of some biomarker may be taken on {{mvar|T}} patients at <math>n_t</math> time points during the course of a day, and interest may lie in regularizing the variance of the predictions across patients.
* Letting <math> A^\dagger = \alpha I_T +(\alpha - \lambda )M </math>, where <math> M_{t,s} = \frac 1 {|G_r|} \mathbb I(t,s\in G_r) </math>, is equivalent to letting <math> \alpha </math> control the variance measured with respect to a group mean: <math> \sum _{r} \sum _{t \in G_r } \| f_t - \frac 1 {|G_r|} \sum _{s\in G_r} f_s \|_{\mathcal H_k}^2 </math>. (Here <math> |G_r| </math> is the cardinality of group ''r'', and <math> \mathbb I </math> is the indicator function.) For example, people in different political parties (groups) might be regularized together with respect to predicting the favorability rating of a politician. Note that this penalty reduces to the first when all tasks are in the same group.
* Letting <math> A^\dagger = \delta I_T + (\delta -\lambda)L </math>, where <math> L=D-M</math> is the [[Laplacian matrix|Laplacian]] for the graph with [[adjacency matrix]] ''M'' giving pairwise similarities of tasks (and ''D'' is the corresponding diagonal degree matrix), is equivalent to giving a larger penalty to the distance separating tasks ''t'' and ''s'' when they are more similar (according to the weight <math> M_{t,s} </math>), i.e. <math>\delta </math> regularizes <math> \sum _{t,s}\|f_t - f_s \|_{\mathcal H _k }^2 M_{t,s} </math>.
* All of the above choices of ''A'' also induce the additional regularization term <math display="inline">\lambda \sum_t \|f_t\| _{\mathcal H_k} ^2 </math>, which penalizes complexity in each <math>f_t</math> more broadly.
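
The three choices of <math>A^\dagger</math> listed above can be assembled numerically as a rough illustration. The following is a minimal sketch, not a reference implementation: the function names (<code>mean_structure</code>, <code>group_structure</code>, <code>graph_structure</code>) and the example values are hypothetical, and the final check assumes each task function <math>f_t</math> is represented by a finite-dimensional vector (a row of <code>F</code>), which is a simplification of the RKHS setting.

```python
import numpy as np

def mean_structure(T, gamma, lam):
    """A^dagger = gamma*I_T + (gamma - lam) * (1/T) * 1 1^T (first bullet)."""
    return gamma * np.eye(T) + (gamma - lam) * np.ones((T, T)) / T

def group_structure(groups, alpha, lam):
    """A^dagger = alpha*I_T + (alpha - lam) * M (second bullet).

    groups[t] is the group label of task t; M[t, s] = 1/|G_r| when
    tasks t and s both belong to group r, and 0 otherwise.
    """
    groups = np.asarray(groups)
    same = groups[:, None] == groups[None, :]             # indicator I(t, s in G_r)
    sizes = np.array([np.sum(groups == g) for g in groups])  # |G_r| for each task's group
    M = same / sizes[:, None]
    return alpha * np.eye(len(groups)) + (alpha - lam) * M

def graph_structure(M, delta, lam):
    """A^dagger = delta*I_T + (delta - lam) * L with L = D - M (third bullet).

    M is a symmetric matrix of pairwise task similarities and D is the
    diagonal matrix of its row sums. Returns (A^dagger, L).
    """
    M = np.asarray(M, dtype=float)
    L = np.diag(M.sum(axis=1)) - M
    return delta * np.eye(M.shape[0]) + (delta - lam) * L, L

# With a single group, the group structure coincides with the mean structure.
A_mean = mean_structure(4, gamma=2.0, lam=0.5)
A_group = group_structure([0, 0, 0, 0], alpha=2.0, lam=0.5)

# Hypothetical similarity graph on three tasks.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
A_graph, L = graph_structure(W, delta=1.0, lam=0.2)

# Assumption: each f_t is a vector in R^2 (row t of F). For symmetric W,
# sum_{t,s} W[t,s] * ||f_t - f_s||^2 equals 2 * trace(F^T L F).
rng = np.random.default_rng(0)
F = rng.normal(size=(3, 2))
pairwise = sum(W[t, s] * np.sum((F[t] - F[s]) ** 2)
               for t in range(3) for s in range(3))
laplacian_form = 2.0 * np.trace(F.T @ L @ F)
```

Comparing <code>A_mean</code> with <code>A_group</code> illustrates the remark that the group penalty reduces to the mean penalty when all tasks share one group, and comparing <code>pairwise</code> with <code>laplacian_form</code> shows how the weighted pairwise penalty can be written through the Laplacian in this finite-dimensional setting.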