=== Multi-task optimization ===
[[Multitask optimization]]: In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models.<ref>{{Cite journal |last1=Standley |first1=Trevor |last2=Zamir |first2=Amir R. |last3=Chen |first3=Dawn |last4=Guibas |first4=Leonidas |last5=Malik |first5=Jitendra |last6=Savarese |first6=Silvio |date=2020-07-13 |title=Which Tasks Should Be Learned Together in Multi-task Learning? |url=https://proceedings.mlr.press/v119/standley20a.html |journal=International Conference on Machine Learning (ICML) |arxiv=1905.07553}}</ref> Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained using a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the different tasks seek conflicting representations, i.e., if the gradients of different tasks point in opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed. Commonly, the per-task gradients are combined into a joint update direction through various aggregation algorithms or heuristics. These methods include subtracting the projection of one task's gradient onto any conflicting task's gradient before aggregation, as sketched below.
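The following is a minimal sketch of this projection-based aggregation, in the style of the "gradient surgery" (PCGrad) method: whenever two task gradients conflict (negative inner product), the projection of one onto the other is subtracted before the gradients are summed. The function name <code>combine_gradients</code> and the use of plain NumPy vectors are illustrative assumptions, not a fixed API.

<syntaxhighlight lang="python">
import numpy as np

def combine_gradients(grads):
    """Combine per-task gradients into one update direction,
    removing conflicting components (PCGrad-style sketch)."""
    adjusted = []
    for i, g_i in enumerate(grads):
        g = g_i.copy()
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = np.dot(g, g_j)
            if dot < 0:  # gradients conflict: they point in opposing directions
                # subtract the projection of g onto the conflicting gradient
                g = g - (dot / np.dot(g_j, g_j)) * g_j
        adjusted.append(g)
    # joint update direction: sum of the adjusted per-task gradients
    return np.sum(adjusted, axis=0)

# two conflicting task gradients (illustrative values)
g1 = np.array([1.0, 0.5])
g2 = np.array([-1.0, 0.5])
print(combine_gradients([g1, g2]))  # [0.0, 1.6]
</syntaxhighlight>

In this example the opposing first components of the two gradients cancel after projection, so the joint update points only along the direction the tasks agree on.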
== Mathematics ==