Multi-task learning: Difference between revisions

=== Multi-task optimization ===
 
[[Multitask optimization]]: In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models.<ref>{{Cite journal |last1=Standley |first1=Trevor |last2=Zamir |first2=Amir R. |last3=Chen |first3=Dawn |last4=Guibas |first4=Leonidas |last5=Malik |first5=Jitendra |last6=Savarese |first6=Silvio |date=2020-07-13 |title=Which Tasks Should Be Learned Together in Multi-task Learning? |url=https://proceedings.mlr.press/v119/standley20a.html |journal=International Conference on Machine Learning (ICML) |arxiv=1905.07553 }}</ref> Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained from a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the different tasks seek conflicting representations, i.e., if the gradients of the different tasks point in opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed, which typically combine the per-task gradients into a joint update direction through aggregation algorithms or heuristics. These methods include subtracting the projection of conflicting gradients,<ref>{{Cite journal |last1=Yu |first1=Tianhe |last2=Kumar |first2=Saurabh |last3=Gupta |first3=Abhishek |last4=Levine |first4=Sergey |last5=Hausman |first5=Karol |last6=Finn |first6=Chelsea |date=2020 |title=Gradient Surgery for Multi-Task Learning |url=https://proceedings.neurips.cc/paper/2020/file/3fe78a8acf5fda99de95303940a2420c-Paper.pdf |journal=Advances in Neural Information Processing Systems |arxiv=2001.06782 }}</ref> applying techniques from game theory,<ref>{{Cite journal |last1=Navon |first1=Aviv |last2=Shamsian |first2=Aviv |last3=Achituve |first3=Idan |last4=Maron |first4=Haggai |last5=Kawaguchi |first5=Kenji |last6=Chechik |first6=Gal |last7=Fetaya |first7=Ethan |date=2022 |title=Multi-Task Learning as a Bargaining Game |url=https://proceedings.mlr.press/v162/navon22a.html |journal=International Conference on Machine Learning |arxiv=2202.01017 }}</ref> and using Bayesian modeling to obtain a distribution over gradients.<ref>{{Cite journal |last1=Achituve |first1=Idan |last2=Maron |first2=Haggai |last3=Kawaguchi |first3=Kenji |last4=Chechik |first4=Gal |last5=Fetaya |first5=Ethan |date=2024 |title=Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning |url=https://arxiv.org/abs/2402.04005 |arxiv=2402.04005 }}</ref>
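
As an illustration, the gradient surgery approach cited above treats two tasks as conflicting when their gradients <math>g_i</math> and <math>g_j</math> satisfy <math>g_i \cdot g_j < 0</math>; the conflict is then removed by replacing <math>g_i</math> with its component orthogonal to <math>g_j</math>,

<math display="block">g_i \leftarrow g_i - \frac{g_i \cdot g_j}{\lVert g_j \rVert^{2}}\, g_j ,</math>

so that the subtracted term is precisely the projection of <math>g_i</math> onto the conflicting gradient <math>g_j</math>.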
 
== Mathematics ==