=== Multi-task optimization ===
[[Multitask optimization]]: In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models.<ref>{{Cite journal |last1=Standley |first1=Trevor |last2=Zamir |first2=Amir R. |last3=Chen |first3=Dawn |last4=Guibas |first4=Leonidas |last5=Malik |first5=Jitendra |last6=Savarese |first6=Silvio |date=2020-07-13 |title=Which Tasks Should Be Learned Together in Multi-task Learning? |url=https://proceedings.mlr.press/v119/standley20a.html |journal=International Conference on Machine Learning (ICML) |pages=9120–9132 |arxiv=1905.07553 }}</ref> MTL models commonly employ task-specific modules on top of a joint feature representation obtained from a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the tasks seek conflicting representations, i.e., if the gradients of different tasks point in opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed; typically, the per-task gradients are combined into a joint update direction through an aggregation algorithm or heuristic.
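One well-known aggregation rule of this kind is gradient projection (PCGrad, Yu et al. 2020): whenever two task gradients conflict (negative inner product), the conflicting component is projected out before the gradients are averaged. A minimal NumPy sketch of this idea follows; the function name and the example gradients are illustrative and not taken from any particular library:

```python
import numpy as np

def pcgrad_combine(grads, seed=0):
    """Combine per-task gradients into a joint update direction by
    projecting away conflicting components (PCGrad-style "gradient
    surgery").  `grads` is a list of 1-D gradient vectors, one per task."""
    rng = np.random.default_rng(seed)
    projected = []
    for i, g in enumerate(grads):
        g = np.asarray(g, dtype=float).copy()
        # Visit the other tasks' gradients in random order.
        for j in rng.permutation([j for j in range(len(grads)) if j != i]):
            gj = np.asarray(grads[j], dtype=float)
            dot = g @ gj
            if dot < 0:  # conflict: remove the component of g along gj
                g -= dot / (gj @ gj) * gj
        projected.append(g)
    # Average the de-conflicted gradients into a single update direction.
    return np.mean(projected, axis=0)
```

For example, the conflicting gradients (1, 0) and (−1, 1) combine to (0.25, 0.75), whereas a naive average would yield (0, 0.5), partially cancelling the first task's signal.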
== Mathematics ==