=== Group online adaptive learning ===
Traditionally, multi-task learning and transfer of knowledge are applied to stationary learning settings. Their extension to non-stationary environments is termed Group online adaptive learning (GOAL).<ref>Zweig, A. & Chechik, G. Group online adaptive learning. Machine Learning, DOI 10.1007/s10994-017-5661-5, August 2017. http://rdcu.be/uFSv</ref> Sharing information can be particularly useful when learners operate in continuously changing environments, because a learner can benefit from the previous experience of another learner to adapt quickly to its new environment. Such group-adaptive learning has numerous applications, from predicting [[Financial modeling|financial time-series]], through content recommendation systems, to visual understanding for adaptive autonomous agents.
 
=== Multi-task optimization ===
 
In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models.<ref>{{Cite journal |last1=Standley |first1=Trevor |last2=Zamir |first2=Amir R. |last3=Chen |first3=Dawn |last4=Guibas |first4=Leonidas |last5=Malik |first5=Jitendra |last6=Savarese |first6=Silvio |date=2020-07-13 |title=Which Tasks Should Be Learned Together in Multi-Task Learning? |url=https://proceedings.mlr.press/v119/standley20a.html |journal=International Conference on Machine Learning (ICML)|arxiv=1905.07553 }}</ref> Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained from a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance when the tasks seek conflicting representations, i.e., when the gradients of different tasks point in opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed. Commonly, the per-task gradients are combined into a joint update direction through various aggregation algorithms or heuristics.
These methods include subtracting the projection of conflicting gradients<ref>{{Cite journal |last1=Yu |first1=Tianhe |last2=Kumar |first2=Saurabh |last3=Gupta |first3=Abhishek |last4=Levine |first4=Sergey |last5=Hausman |first5=Karol |last6=Finn |first6=Chelsea |date=2020 |title=Gradient Surgery for Multi-Task Learning |url=https://proceedings.neurips.cc/paper/2020/file/3fe78a8acf5fda99de95303940a2420c-Paper.pdf |journal=Advances in Neural Information Processing Systems |arxiv=2001.06782 }}</ref> and applying techniques from game theory.<ref>{{Cite journal |last1=Navon |first1=Aviv |last2=Shamsian |first2=Aviv |last3=Achituve |first3=Idan |last4=Maron |first4=Haggai |last5=Kawaguchi |first5=Kenji |last6=Chechik |first6=Gal |last7=Fetaya |first7=Ethan |date=2022 |title=Multi-Task Learning as a Bargaining Game |url=https://proceedings.mlr.press/v162/navon22a.html |journal=International Conference on Machine Learning |arxiv=2202.01017 }}</ref>
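The projection-based aggregation idea can be sketched as follows. This is an illustrative simplification of the "gradient surgery" (PCGrad) scheme of Yu et al. (2020), not the authors' reference implementation: whenever two task gradients conflict (negative inner product), the component of one along the other is subtracted before the gradients are averaged into a joint update.

```python
import numpy as np

def pcgrad(grads):
    """Combine per-task gradients into one update direction by projecting
    out mutually conflicting components (PCGrad-style sketch)."""
    grads = [np.asarray(g, dtype=float) for g in grads]
    adjusted = []
    for i, g in enumerate(grads):
        g = g.copy()
        for j, h in enumerate(grads):
            if j == i:
                continue
            dot = g @ h
            if dot < 0:  # gradients conflict (angle > 90 degrees)
                g -= (dot / (h @ h)) * h  # remove the conflicting projection
        adjusted.append(g)
    # The joint update direction is the mean of the adjusted gradients.
    return np.mean(adjusted, axis=0)

# Two conflicting task gradients: their average alone would nearly cancel.
update = pcgrad([np.array([1.0, 0.0]), np.array([-1.0, 1.0])])
```

After projection, each adjusted gradient no longer has a negative inner product with the other task's original gradient, so the averaged update does not directly oppose either task. (The published algorithm additionally randomizes the order in which the other tasks' gradients are processed.)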
 
== Mathematics ==