Multi-task learning: Difference between revisions

{{short description|Solving multiple machine learning tasks at the same time}}
'''Multi-task learning''' (MTL) is a subfield of [[machine learning]] in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately.<ref>Baxter, J. (2000). "A model of inductive bias learning". ''Journal of Artificial Intelligence Research'' 12:149--198, [http://www-2.cs.cmu.edu/afs/cs/project/jair/pub/volume12/baxter00a.pdf On-line paper]</ref><ref>[[Sebastian Thrun|Thrun, S.]] (1996). Is learning the n-th thing any easier than learning the first?. In Advances in Neural Information Processing Systems 8, pp. 640--646. MIT Press. [http://citeseer.ist.psu.edu/thrun96is.html Paper at Citeseer]</ref><ref name=":2">{{Cite journal|url = https://www.cs.cornell.edu/~caruana/mlj97.pdf|title = Multi-task learning|last = Caruana|first = R.|date = 1997|journal = Machine Learning|doi = 10.1023/A:1007379606734|volume=28|pages=41–75|doi-access = free}}</ref>
Multi-task learning is inherently a [[multi-objective optimization]] problem with [[trade-off]]s between different tasks.<ref>Multi-Task Learning as Multi-Objective Optimization. Part of Advances in Neural Information Processing Systems 31 (NeurIPS 2018), https://proceedings.neurips.cc/paper/2018/hash/432aca3a1e345e339f35a30c8f65edce-Abstract.html</ref>
Early versions of MTL were called "hints".<ref>Suddarth, S., Kergosien, Y. (1990). Rule-injection hints as a means of improving network performance and learning time. EURASIP Workshop. Neural Networks pp. 120-129. Lecture Notes in Computer Science. Springer.</ref><ref>{{cite journal | last1 = Abu-Mostafa | first1 = Y. S. | year = 1990 | title = Learning from hints in neural networks | journal = Journal of Complexity | volume = 6 | issue = 2| pages = 192–198 | doi=10.1016/0885-064x(90)90006-y| doi-access = free }}</ref>
 
In a widely cited 1997 paper, Rich Caruana gave the following characterization:<blockquote>Multitask Learning is an approach to [[inductive transfer]] that improves [[Generalization error|generalization]] by using the domain information contained in the training signals of related tasks as an [[inductive bias]]. It does this by learning tasks in parallel while using a shared [[Representation learning|representation]]; what is learned for each task can help other tasks be learned better.<ref name=":2"/></blockquote>
 
==Methods==
The key challenge in multi-task learning is how to combine learning signals from multiple tasks into a single model. This may strongly depend on how well the different tasks agree with, or contradict, each other. There are several ways to address this challenge:
 
===Task grouping and overlap===
Within the MTL paradigm, information can be shared across some or all of the tasks. Depending on the structure of task relatedness, one may want to share information selectively across the tasks. For example, tasks may be grouped or exist in a hierarchy, or be related according to some general metric. Suppose, as developed more formally below, that the parameter vector modeling each task is a [[linear combination]] of some underlying basis. Similarity in terms of this basis can indicate the relatedness of the tasks. For example, with [[Sparse array|sparsity]], overlap of nonzero coefficients across tasks indicates commonality. A task grouping then corresponds to those tasks lying in a subspace generated by some subset of basis elements, where tasks in different groups may be disjoint or overlap arbitrarily in terms of their bases.<ref>Kumar, A., & Daume III, H., (2012) Learning Task Grouping and Overlap in Multi-Task Learning. http://icml.cc/2012/papers/690.pdf</ref> Task relatedness can be imposed a priori or learned from the data.<ref name=":1"/><ref>Jawanpuria, P., & Saketha Nath, J., (2012) A Convex Feature Learning Formulation for Latent Task Structure Discovery. http://icml.cc/2012/papers/90.pdf</ref> Hierarchical task relatedness can also be exploited implicitly without assuming a priori knowledge or learning relations explicitly.<ref name=":bmdl">Hajiramezanali, E. & Dadaneh, S. Z. & Karbalayghareh, A. & Zhou, Z. & Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada. {{ArXiv|1810.09433}}</ref><ref>Zweig, A. & Weinshall, D. Hierarchical Regularization Cascade for Joint Learning. Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta GA, June 2013. http://www.cs.huji.ac.il/~daphna/papers/Zweig_ICML2013.pdf</ref> For example, the explicit learning of sample relevance across tasks can be done to guarantee the effectiveness of joint learning across multiple domains.<ref name=":bmdl"/>
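
The basis view of task relatedness can be illustrated with a small sketch. It assumes a hypothetical setting in which each task's parameter vector is a sparse linear combination of four shared basis vectors; the names <code>basis</code>, <code>S</code>, <code>support</code>, <code>overlap</code> and <code>combine</code>, and the use of a Jaccard score to measure overlap, are illustrative choices rather than part of any cited method.

```python
# Hypothetical illustration: each task's parameter vector is a linear
# combination of shared basis vectors, with sparse coefficients.

def support(coeffs, tol=1e-12):
    """Indices of the basis elements a task actually uses (nonzero coefficients)."""
    return {k for k, c in enumerate(coeffs) if abs(c) > tol}

def overlap(coeffs_a, coeffs_b):
    """Jaccard overlap between the supports of two coefficient vectors."""
    sa, sb = support(coeffs_a), support(coeffs_b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def combine(basis, coeffs):
    """Task parameter vector as a linear combination of the basis vectors."""
    dim = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim)]

# Four basis vectors in R^3 (assumed values, for illustration only).
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

# Sparse coefficients per task: task_a and task_b use the same basis elements,
# task_c uses a disjoint one.
S = {
    "task_a": [0.5, 0.0, 0.0, 2.0],
    "task_b": [1.5, 0.0, 0.0, -1.0],
    "task_c": [0.0, 0.0, 3.0, 0.0],
}

print(overlap(S["task_a"], S["task_b"]))  # same support, so the tasks group together
print(overlap(S["task_a"], S["task_c"]))  # disjoint support, so different groups
print(combine(basis, S["task_a"]))        # parameter vector of task_a
```

In this toy setting, tasks whose supports coincide lie in the same subspace and would form one group, while a task with a disjoint support falls into a separate group.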
 
===Exploiting unrelated tasks===
 
=== Multi-task optimization ===
'''Multi-task optimization''' focuses on solving multiple optimization tasks jointly rather than optimizing each in isolation.<ref name=TO>{{cite journal | doi=10.1109/TETCI.2017.2769104 | title=Insights on Transfer Optimization: Because Experience is the Best Teacher | year=2018 | last1=Gupta | first1=Abhishek | last2=Ong | first2=Yew-Soon | last3=Feng | first3=Liang | journal=IEEE Transactions on Emerging Topics in Computational Intelligence | volume=2 | pages=51–64 | hdl=10356/147980 | s2cid=11510470 | hdl-access=free }}</ref><ref name=mfo>{{cite journal | doi=10.1109/TEVC.2015.2458037 | title=Multifactorial Evolution: Toward Evolutionary Multitasking | year=2016 | last1=Gupta | first1=Abhishek | last2=Ong | first2=Yew-Soon | last3=Feng | first3=Liang | journal=IEEE Transactions on Evolutionary Computation | volume=20 | issue=3 | pages=343–357 | hdl=10356/148174 | s2cid=13767012 | hdl-access=free }}</ref> The paradigm has been inspired by the well-established concepts of [[transfer learning]]<ref>{{cite journal | doi=10.1109/TKDE.2009.191 | title=A Survey on Transfer Learning | year=2010 | last1=Pan | first1=Sinno Jialin | last2=Yang | first2=Qiang | journal=IEEE Transactions on Knowledge and Data Engineering | volume=22 | issue=10 | pages=1345–1359 | s2cid=740063 }}</ref> and multi-task learning in [[predictive analytics]].<ref>Caruana, R., "Multitask Learning", pp. 95-134 in Sebastian Thrun, Lorien Pratt (eds.) ''Learning to Learn'', (1998) Springer {{ISBN|9780792380474}}</ref>
 
The key motivation behind multi-task optimization is that if optimization tasks are related to each other in terms of their optimal solutions or the general characteristics of their function landscapes,<ref>{{cite journal | doi=10.1016/j.engappai.2017.05.008 | title=Coevolutionary multitasking for concurrent global optimization: With case studies in complex engineering design | year=2017 | last1=Cheng | first1=Mei-Ying | last2=Gupta | first2=Abhishek | last3=Ong | first3=Yew-Soon | last4=Ni | first4=Zhi-Wei | journal=Engineering Applications of Artificial Intelligence | volume=64 | pages=13–24 | s2cid=13767210 | doi-access=free }}</ref> the search progress made on one task can be transferred to substantially accelerate the search on the others.
 
The success of the paradigm is not necessarily limited to one-way knowledge transfers from simpler to more complex tasks. In practice, one may deliberately attempt to solve a more difficult task that unintentionally solves several smaller problems along the way.<ref name="DeFreitas">{{cite arXiv | eprint=1707.03300 | last1=Cabi | first1=Serkan | author2=Sergio Gómez Colmenarejo | last3=Hoffman | first3=Matthew W. | last4=Denil | first4=Misha | last5=Wang | first5=Ziyu | author6=Nando de Freitas | title=The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously | year=2017 | class=cs.AI }}</ref>
 
There is a direct relationship between multitask optimization and [[multi-objective optimization]].<ref>J. -Y. Li, Z. -H. Zhan, Y. Li and J. Zhang, "Multiple Tasks for Multiple Objectives: A New Multiobjective Optimization Method via Multitask Optimization," in IEEE Transactions on Evolutionary Computation, {{doi|10.1109/TEVC.2023.3294307}}</ref>
 
In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models.<ref>{{Cite journal |last1=Standley |first1=Trevor |last2=Zamir |first2=Amir R. |last3=Chen |first3=Dawn |last4=Guibas |first4=Leonidas |last5=Malik |first5=Jitendra |last6=Savarese |first6=Silvio |date=2020-07-13 |title=Which Tasks Should Be Learned Together in Multi-task Learning? |url=https://proceedings.mlr.press/v119/standley20a.html |journal=International Conference on Machine Learning (ICML) |pages=9120–9132 |arxiv=1905.07553 }}</ref> Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained using a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the different tasks seek conflicting representations, i.e., the gradients of different tasks point in opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed. Commonly, the per-task gradients are combined into a joint update direction through various aggregation algorithms or heuristics. 
These methods include subtracting the projection of conflicted gradients,<ref>{{Cite journal |last1=Yu |first1=Tianhe |last2=Kumar |first2=Saurabh |last3=Gupta |first3=Abhishek |last4=Levine |first4=Sergey |last5=Hausman |first5=Karol |last6=Finn |first6=Chelsea |date=2020 |title=Gradient Surgery for Multi-Task Learning |url=https://proceedings.neurips.cc/paper/2020/file/3fe78a8acf5fda99de95303940a2420c-Paper.pdf |journal=Advances in Neural Information Processing Systems |arxiv=2001.06782 }}</ref> applying techniques from game theory,<ref>{{Cite journal |last1=Navon |first1=Aviv |last2=Shamsian |first2=Aviv |last3=Achituve |first3=Idan |last4=Maron |first4=Haggai |last5=Kawaguchi |first5=Kenji |last6=Chechik |first6=Gal |last7=Fetaya |first7=Ethan |date=2022 |title=Multi-Task Learning as a Bargaining Game |url=https://proceedings.mlr.press/v162/navon22a.html |journal=International Conference on Machine Learning |arxiv=2202.01017 }}</ref> and using Bayesian modeling to get a distribution over gradients.<ref>{{Cite arxiv |last1=Achituve |first1=Idan |last2=Diamant |first2=Idit |last3=Netzer |first3=Arnon |last4=Chechik |first4=Gal |last5=Fetaya |first5=Ethan |date=2024 |title=Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning |arxiv=2402.04005 }}</ref>
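
As an illustration of the first of these ideas, the following is a minimal sketch of projection-based gradient aggregation in the spirit of "gradient surgery". It assumes gradients are given as plain Python lists; the function names <code>dot</code> and <code>pcgrad</code> are illustrative. Visiting the other tasks in a fixed order unless a random generator is supplied is a simplification made here for reproducibility, and it should not be read as the reference implementation.

```python
import random

def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))

def pcgrad(grads, rng=None):
    """Combine per-task gradients, removing pairwise conflicting components.

    For each task gradient g_i and each other task gradient g_j, if their
    inner product is negative, the projection of g_i onto g_j is subtracted
    from g_i. The adjusted gradients are then summed.
    """
    projected = []
    for i, g in enumerate(grads):
        g_pc = list(g)
        others = [j for j in range(len(grads)) if j != i]
        if rng is not None:
            rng.shuffle(others)  # optional random visiting order
        for j in others:
            g_j = grads[j]
            inner = dot(g_pc, g_j)
            sq_norm = dot(g_j, g_j)
            if inner < 0 and sq_norm > 0:
                scale = inner / sq_norm
                g_pc = [a - scale * b for a, b in zip(g_pc, g_j)]
        projected.append(g_pc)
    dim = len(grads[0])
    return [sum(p[k] for p in projected) for k in range(dim)]

# Two conflicting task gradients (their inner product is negative).
g1, g2 = [1.0, 0.0], [-1.0, 1.0]
update = pcgrad([g1, g2])
print(update, dot(update, g1), dot(update, g2))

# A seeded generator can be passed to randomize the visiting order.
print(pcgrad([g1, g2], rng=random.Random(0)))
```

In this two-task example the aggregated direction has a non-negative inner product with both task gradients, whereas the plain sum <code>[0.0, 1.0]</code> makes no first-order progress on the first task.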
 
There are several common approaches to multi-task optimization: [[Bayesian optimization]], [[evolutionary computation]], and approaches based on [[game theory]].<ref name=TO/>
 
==== Multi-task Bayesian optimization ====
'''Multi-task Bayesian optimization''' is a modern model-based approach that leverages the concept of knowledge transfer to speed up the automatic [[hyperparameter optimization]] process of machine learning algorithms.<ref name=mtbo>Swersky, K., Snoek, J., & Adams, R. P. (2013). [http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf Multi-task bayesian optimization]. Advances in neural information processing systems (pp. 2004-2012).</ref> The method builds a multi-task [[Gaussian process]] model on the data originating from different searches progressing in tandem.<ref>Bonilla, E. V., Chai, K. M., & Williams, C. (2008). [http://papers.nips.cc/paper/3189-multi-task-gaussian-process-prediction.pdf Multi-task Gaussian process prediction]. Advances in neural information processing systems (pp. 153-160).</ref> The captured inter-task dependencies are thereafter utilized to better inform the subsequent sampling of candidate solutions in respective search spaces.
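
The role of the multi-task Gaussian process can be sketched as follows. The example assumes a product kernel in which the covariance between two observations is an inter-task covariance multiplied by an input kernel; the squared-exponential input kernel, the matrix <code>task_cov</code>, the noise level and all function names are illustrative assumptions. Only the posterior mean is computed, and the acquisition step of Bayesian optimization is omitted.

```python
import math

def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)

def mt_kernel(p, q, task_cov):
    """Multi-task kernel: inter-task covariance times input kernel."""
    (x, t), (y, s) = p, q
    return task_cov[t][s] * rbf(x, y)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def posterior_mean(train, targets, query, task_cov, noise=1e-6):
    """Zero-mean GP posterior mean at `query` = (input, task index)."""
    K = [[mt_kernel(p, q, task_cov) + (noise if i == j else 0.0)
          for j, q in enumerate(train)] for i, p in enumerate(train)]
    alpha = solve(K, targets)
    return sum(mt_kernel(query, p, task_cov) * a for p, a in zip(train, alpha))

# Observations exist only for task 0; the query is for task 1 at input 1.0.
train = [(0.0, 0), (1.0, 0), (2.0, 0)]
targets = [1.0, 2.0, 1.5]
correlated = posterior_mean(train, targets, (1.0, 1), [[1.0, 0.9], [0.9, 1.0]])
independent = posterior_mean(train, targets, (1.0, 1), [[1.0, 0.0], [0.0, 1.0]])
print(correlated, independent)
```

With a strong assumed inter-task covariance, the prediction for the unobserved task is pulled toward the values seen on the observed task; with zero inter-task covariance it stays at the prior mean of zero, so nothing is transferred.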
 
==== Evolutionary multi-tasking ====
'''Evolutionary multi-tasking''' has been explored as a means of exploiting the [[implicit parallelism]] of population-based search algorithms to simultaneously progress multiple distinct optimization tasks. By mapping all tasks to a unified search space, the evolving population of candidate solutions can harness the hidden relationships between them through continuous genetic transfer. This is induced when solutions associated with different tasks cross over.<ref name=mfo/><ref name=cognitive>Ong, Y. S., & Gupta, A. (2016). [http://www.cil.ntu.edu.sg/mfo/downloads/MultitaskOptimization_manuscript.pdf Evolutionary multitasking: a computer science view of cognitive multitasking]. Cognitive Computation, 8(2), 125-142.</ref> Recently, modes of knowledge transfer that are different from direct solution [[Crossover (genetic algorithm)|crossover]] have been explored.<ref>{{cite journal | doi=10.1109/TCYB.2018.2845361 | title=Evolutionary Multitasking via Explicit Autoencoding | year=2019 | last1=Feng | first1=Liang | last2=Zhou | first2=Lei | last3=Zhong | first3=Jinghui | last4=Gupta | first4=Abhishek | last5=Ong | first5=Yew-Soon | last6=Tan | first6=Kay-Chen | last7=Qin | first7=A. K. | journal=IEEE Transactions on Cybernetics | volume=49 | issue=9 | pages=3457–3470 | pmid=29994415 | s2cid=51613697 }}</ref><ref>{{Cite journal |last1=Jiang |first1=Yi |last2=Zhan |first2=Zhi-Hui |last3=Tan |first3=Kay Chen |last4=Zhang |first4=Jun |date=January 2024 |title=Block-Level Knowledge Transfer for Evolutionary Multitask Optimization |journal=IEEE Transactions on Cybernetics |volume=54 |issue=1 |pages=558–571 |doi=10.1109/TCYB.2023.3273625 |pmid=37216256 |issn=2168-2267}}</ref>
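
The idea of a single population serving several tasks can be sketched with a deliberately simplified toy algorithm. This is not the published multifactorial evolutionary algorithm: it assumes that all tasks share the unit hypercube as their unified search space without any decoding step, and the function <code>evolve</code>, the parameter <code>rmp</code> (a cross-task mating probability), the mutation scale and the two test functions are illustrative assumptions.

```python
import random

def evolve(tasks, dim, pop_size=20, generations=30, rmp=0.3, seed=0):
    """Toy evolutionary multitasking over a shared search space [0, 1]^dim.

    Each individual is (genome, task index, fitness on that task).
    Returns the best individual found for each task (lower fitness is better).
    """
    rng = random.Random(seed)
    n_tasks = len(tasks)
    pop = []
    for i in range(pop_size):
        genome = [rng.random() for _ in range(dim)]
        task = i % n_tasks
        pop.append((genome, task, tasks[task](genome)))
    per_task = pop_size // n_tasks
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            if a[1] == b[1] or rng.random() < rmp:
                # Uniform crossover; across tasks this is the genetic transfer.
                genome = [x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0])]
                task = rng.choice((a[1], b[1]))
            else:
                # Otherwise mutate a copy of one parent within its own task.
                genome = [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in a[0]]
                task = a[1]
            children.append((genome, task, tasks[task](genome)))
        # Elitist selection: keep the best `per_task` individuals of each task.
        merged = pop + children
        pop = []
        for t in range(n_tasks):
            ranked = sorted((ind for ind in merged if ind[1] == t), key=lambda ind: ind[2])
            pop.extend(ranked[:per_task])
    return {t: min((ind for ind in pop if ind[1] == t), key=lambda ind: ind[2])
            for t in range(n_tasks)}

# Two related minimization tasks whose optima lie close together.
tasks = [
    lambda x: sum((v - 0.5) ** 2 for v in x),
    lambda x: sum((v - 0.6) ** 2 for v in x),
]
initial = evolve(tasks, dim=5, generations=0, seed=1)
final = evolve(tasks, dim=5, generations=30, seed=1)
for t in range(len(tasks)):
    print(t, initial[t][2], final[t][2])
```

Because selection is elitist within each task, the best fitness per task cannot get worse over the generations. How much the cross-task crossover helps depends on how closely the tasks are related, which is the premise stated above.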
 
==== Game-theoretic optimization ====
Game-theoretic approaches to multi-task optimization propose to view the optimization problem as a game, where each task is a player. All players compete through the reward matrix of the game, and try to reach a solution that satisfies all players (all tasks). This view provides insight into how to build efficient algorithms based on [[gradient descent]] optimization (GD), which is particularly important for training [[deep neural networks]].<ref>{{Cite book |last1=Goodfellow |first1=Ian |title=Deep Learning |last2=Bengio |first2=Yoshua |last3=Courville |first3=Aaron |publisher=MIT Press |year=2016 |isbn=978-0-262-03561-3}}</ref> In GD for MTL, the problem is that each task provides its own loss, and it is not clear how to combine all losses and create a single unified gradient, leading to several different aggregation strategies.<ref>{{Cite web |last1=Liu |first1=L. |last2=Li |first2=Y. |last3=Kuang |first3=Z. |last4=Xue |first4=J. |last5=Chen |first5=Y. |last6=Yang |first6=W. |last7=Liao |first7=Q. |last8=Zhang |first8=W. |date=2021-05-04 |title=Towards Impartial Multi-task Learning |url=https://iclr.cc/ |access-date=2022-11-20 |website=In: Proceedings of the International Conference on Learning Representations (ICLR 2021). ICLR: Virtual event. (2021)}}</ref><ref>{{Cite journal |last1=Yu |first1=Tianhe |last2=Kumar |first2=Saurabh |last3=Gupta |first3=Abhishek |last4=Levine |first4=Sergey |last5=Hausman |first5=Karol |last6=Finn |first6=Chelsea |date=2020 |title=Gradient Surgery for Multi-Task Learning |url=https://proceedings.neurips.cc/paper/2020/hash/3fe78a8acf5fda99de95303940a2420c-Abstract.html |journal=Advances in Neural Information Processing Systems |language=en |volume=33|arxiv=2001.06782 }}</ref><ref>{{Cite arXiv |last1=Liu |first1=Bo |last2=Liu |first2=Xingchao |last3=Jin |first3=Xiaojie |last4=Stone |first4=Peter |last5=Liu |first5=Qiang |date=2021-10-26 |title=Conflict-Averse Gradient Descent for Multi-task Learning |class=cs.LG |eprint=2110.14048}}</ref> This aggregation problem can be solved by defining a game matrix where the reward of each player is the agreement of its own gradient with the common gradient, and then setting the common gradient to be the Nash [[Cooperative bargaining|bargaining solution]]<ref>Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya, (2022). [https://proceedings.mlr.press/v162/navon22a.html Multi-Task Learning as a Bargaining Game]. International conference on machine learning.</ref> of that system.
 
== Applications ==
Algorithms for multi-task optimization span a wide array of real-world applications. Recent studies highlight the potential for speed-ups in the optimization of engineering design parameters by conducting related designs jointly in a multi-task manner.<ref name=cognitive/> In [[machine learning]], the transfer of optimized features across related data sets can enhance the efficiency of the training process as well as improve the generalization capability of learned models.<ref>Chandra, R., Gupta, A., Ong, Y. S., & Goh, C. K. (2016, October). [http://www.cil.ntu.edu.sg/mfo/downloads/cvmultask.pdf Evolutionary multi-task learning for modular training of feedforward neural networks]. In International Conference on Neural Information Processing (pp. 37-46). Springer, Cham.</ref><ref>Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). [http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-n%E2%80%A6 How transferable are features in deep neural networks?] In Advances in neural information processing systems (pp. 3320-3328).</ref> In addition, the concept of multi-tasking has led to advances in automatic [[hyperparameter optimization]] of machine learning models and [[ensemble learning]].<ref>{{cite book | doi=10.1109/CEC.2016.7748363 | chapter=Learning ensemble of decision trees through multifactorial genetic programming | title=2016 IEEE Congress on Evolutionary Computation (CEC) | year=2016 | last1=Wen | first1=Yu-Wei | last2=Ting | first2=Chuan-Kang | pages=5293–5300 | isbn=978-1-5090-0623-6 | s2cid=2617811 }}</ref><ref>{{cite book | doi=10.1145/3205455.3205638 | chapter=Evolutionary feature subspaces generation for ensemble classification | title=Proceedings of the Genetic and Evolutionary Computation Conference | year=2018 | last1=Zhang | first1=Boyu | last2=Qin | first2=A. K. | last3=Sellis | first3=Timos | pages=577–584 | isbn=978-1-4503-5618-3 | s2cid=49564862 }}</ref>
 
Applications have also been reported in cloud computing,<ref>{{cite book | doi=10.1007/978-3-319-94472-2_10 | chapter=An Evolutionary Multitasking Algorithm for Cloud Computing Service Composition | title=Services – SERVICES 2018 | series=Lecture Notes in Computer Science | year=2018 | last1=Bao | first1=Liang | last2=Qi | first2=Yutao | last3=Shen | first3=Mengqing | last4=Bu | first4=Xiaoxuan | last5=Yu | first5=Jusheng | last6=Li | first6=Qian | last7=Chen | first7=Ping | volume=10975 | pages=130–144 | isbn=978-3-319-94471-5 }}</ref> with future developments geared towards cloud-based on-demand optimization services that can cater to multiple customers simultaneously.<ref name=mfo/><ref>Tang, J., Chen, Y., Deng, Z., Xiang, Y., & Joy, C. P. (2018). [https://www.ijcai.org/proceedings/2018/0538.pdf A Group-based Approach to Improve Multifactorial Evolutionary Algorithm]. In IJCAI (pp. 3870-3876).</ref> Recent work has additionally shown applications in chemistry.<ref>{{citation |mode=cs1 |doi=10.26434/chemrxiv.13250216.v2 |title=Multi-task Bayesian Optimization of Chemical Reactions |work=chemRxiv |date=2021 |last1=Felton |first1=Kobi |last2=Wigh |first2=Daniel |last3=Lapkin |first3=Alexei|doi-access=free }}</ref> In addition, some recent works have applied multi-task optimization algorithms in industrial manufacturing.<ref>{{Cite journal |last1=Jiang |first1=Yi |last2=Zhan |first2=Zhi-Hui |last3=Tan |first3=Kay Chen |last4=Zhang |first4=Jun |date=October 2023 |title=A Bi-Objective Knowledge Transfer Framework for Evolutionary Many-Task Optimization |journal=IEEE Transactions on Evolutionary Computation |volume=27 |issue=5 |pages=1514–1528 |doi=10.1109/TEVC.2022.3210783 |issn=1089-778X|doi-access=free }}</ref><ref>{{Cite journal |last1=Jiang |first1=Yi |last2=Zhan |first2=Zhi-Hui |last3=Tan |first3=Kay Chen |last4=Kwong |first4=Sam |last5=Zhang |first5=Jun |date=2024 |title=Knowledge Structure Preserving-Based Evolutionary Many-Task Optimization |journal=IEEE Transactions on Evolutionary Computation |volume=29 |issue=2 |pages=287–301 |doi=10.1109/TEVC.2024.3355781 |issn=1089-778X|doi-access=free }}</ref>
 
== Mathematics ==
 
==Software package==
A Matlab package called Multi-Task Learning via StructurAl Regularization (MALSAR) <ref>Zhou, J., Chen, J. and Ye, J. MALSAR: Multi-tAsk Learning via StructurAl Regularization. Arizona State University, 2012. http://www.public.asu.edu/~jye02/Software/MALSAR. [http://www.public.asu.edu/~jye02/Software/MALSAR/Manual.pdf On-line manual]</ref> implements the following multi-task learning algorithms: Mean-Regularized Multi-Task Learning,<ref>Evgeniou, T., & Pontil, M. (2004). [https://web.archive.org/web/20171212193041/https://pdfs.semanticscholar.org/1ea1/91c70559d21be93a4d128f95943e80e1b4ff.pdf Regularized multi–task learning]. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 109–117).</ref><ref>{{cite journal | last1 = Evgeniou | first1 = T. | last2 = Micchelli | first2 = C. | last3 = Pontil | first3 = M. | year = 2005 | title = Learning multiple tasks with kernel methods | url = http://jmlr.org/papers/volume6/evgeniou05a/evgeniou05a.pdf | journal = Journal of Machine Learning Research | volume = 6 | page = 615 }}</ref> Multi-Task Learning with Joint Feature Selection,<ref>{{cite journal | last1 = Argyriou | first1 = A. | last2 = Evgeniou | first2 = T. | last3 = Pontil | first3 = M. | year = 2008a | title = Convex multi-task feature learning | journal = Machine Learning | volume = 73 | issue = 3| pages = 243–272 | doi=10.1007/s10994-007-5040-8| doi-access = free }}</ref> Robust Multi-Task Feature Learning,<ref>Chen, J., Zhou, J., & Ye, J. (2011). [https://www.academia.edu/download/44101186/Integrating_low-rank_and_group-sparse_st20160325-15067-1mftmbg.pdf Integrating low-rank and group-sparse structures for robust multi-task learning]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.</ref> Trace-Norm Regularized Multi-Task Learning,<ref>Ji, S., & Ye, J. (2009). 
[http://www.machinelearning.org/archive/icml2009/papers/151.pdf An accelerated gradient method for trace norm minimization]. Proceedings of the 26th Annual International Conference on Machine Learning (pp. 457–464).</ref> Alternating Structural Optimization,<ref>{{cite journal | last1 = Ando | first1 = R. | last2 = Zhang | first2 = T. | year = 2005 | title = A framework for learning predictive structures from multiple tasks and unlabeled data | url = http://www.jmlr.org/papers/volume6/ando05a/ando05a.pdf | journal = The Journal of Machine Learning Research | volume = 6 | pages = 1817–1853 }}</ref><ref>Chen, J., Tang, L., Liu, J., & Ye, J. (2009). [http://leitang.net/papers/ICML09_CASO.pdf A convex formulation for learning shared structures from multiple tasks]. Proceedings of the 26th Annual International Conference on Machine Learning (pp. 137–144).</ref> Incoherent Low-Rank and Sparse Learning,<ref>Chen, J., Liu, J., & Ye, J. (2010). [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783291/ Learning incoherent sparse and low-rank patterns from multiple tasks]. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1179–1188).</ref> Robust Low-Rank Multi-Task Learning, Clustered Multi-Task Learning,<ref>Jacob, L., Bach, F., & Vert, J. (2008). [https://hal-ensmp.archives-ouvertes.fr/docs/00/32/05/73/PDF/cmultitask.pdf Clustered multi-task learning: A convex formulation]. Advances in Neural Information Processing Systems, 2008</ref><ref>Zhou, J., Chen, J., & Ye, J. (2011). [http://papers.nips.cc/paper/4292-clustered-multi-task-learning-via-alternating-structure-optimization.pdf Clustered multi-task learning via alternating structure optimization]. Advances in Neural Information Processing Systems.</ref> Multi-Task Learning with Graph Structures.
 
== Literature ==
* Waegeman, W., Dembczynski, K., & Huellermeier, E. Multi-Target Prediction: A Unifying View on Problems and Methods. https://arxiv.org/abs/1809.02352v1
 
==See also==
* [[Automated machine learning]] (AutoML)
* [[Evolutionary computation]]
* [[Foundation model]]
* [[General game playing]]
* [[Human-based genetic algorithm]]
* [[Kernel methods for vector output]]
* [[Multiple-criteria decision analysis]]
* [[Multi-objective optimization]]
* [[Multicriteria classification]]
* [[Robot learning]]
* [[Transfer learning]]
* [[James–Stein estimator]]
{{div col end}}
 
==External links==
* [https://web.archive.org/web/20041118134329/http://big.cs.uiuc.edu/webpage/cumulativeLearning/cumulativeLearning.html The Biosignals Intelligence Group at UIUC]
* [http://www.cse.wustl.edu/~kilian/research/multitasklearning/multitasklearning.html Washington University in St. Louis Department of Computer Science]
 
===Software===
* [http://www.public.asu.edu/~jye02/Software/MALSAR/index.html The Multi-Task Learning via Structural Regularization Package]
* [https://web.archive.org/web/20131224113826/http://klcl.pku.edu.cn/member/sunxu/code.htm Online Multi-Task Learning Toolkit (OMT)] A general-purpose online multi-task learning toolkit based on [[conditional random field]] models and [[stochastic gradient descent]] training ([[C Sharp (programming language)|C#]], [[.NET Framework|.NET]])
 
{{Optimization algorithms}}
 
[[Category:Machine learning]]