Meta-learning (computer science)

 
====Meta Networks====
Meta Networks (MetaNet) learns meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization.<ref name="paper3">{{cite journal|first1=Tsendsuren|last1=Munkhdalai|first2=Hong|last2=Yu|year=2017|title=Meta Networks|journal=Proceedings of Machine Learning Research |volume=70 |pages=2554–2563 |pmid=31106300 |pmc=6519722 |arxiv=1703.00837|language=en}}</ref>
 
===Metric-Based===
Some approaches which have been viewed as instances of meta-learning:
 
* [[Recurrent neural networks]] (RNNs) are universal computers. In 1993, [[Jürgen Schmidhuber]] showed how "self-referential" RNNs can in principle learn by [[backpropagation]] to run their own weight change algorithm, which may be quite different from backpropagation.<ref name="sch1993">{{cite journal | last1 = Schmidhuber | first1 = Jürgen | year = 1993| title = A self-referential weight matrix | journal = Proceedings of ICANN'93, Amsterdam | pages = 446–451 | language = en}}</ref> In 2001, [[Sepp Hochreiter]], A. S. Younger, and P. R. Conwell built a successful supervised meta-learner based on [[Long short-term memory]] RNNs. Through backpropagation, it learned a learning algorithm for quadratic functions that is much faster than backpropagation itself.<ref name="hoch2001">{{cite journal | last1 = Hochreiter | first1 = Sepp | last2 = Younger | first2 = A. S. | last3 = Conwell | first3 = P. R. | year = 2001| title = Learning to Learn Using Gradient Descent | journal = Proceedings of ICANN'01| pages = 87–94| language = en}}</ref><ref name="scholarpedia" /> Researchers at [[DeepMind]] (Marcin Andrychowicz et al.) extended this approach to optimization in 2017.<ref name="marcin2017">{{cite journal | last1 = Andrychowicz | first1 = Marcin | last2 = Denil | first2 = Misha | last3 = Gomez | first3 = Sergio | last4 = Hoffmann | first4 = Matthew | last5 = Pfau | first5 = David | last6 = Schaul | first6 = Tom | last7 = Shillingford | first7 = Brendan | last8 = de Freitas | first8 = Nando | year = 2017| title = Learning to learn by gradient descent by gradient descent | journal = Proceedings of ICML'17, Sydney, Australia| arxiv = 1606.04474 }}</ref>
* In the 1990s, Meta [[Reinforcement Learning]] or Meta RL was achieved in Schmidhuber's research group through self-modifying policies written in a universal programming language that contains special instructions for changing the policy itself. There is a single lifelong trial. The goal of the RL agent is to maximize reward. It learns to accelerate reward intake by continually improving its own learning algorithm which is part of the "self-referential" policy.<ref name="sch1994">{{cite journal | last1 = Schmidhuber | first1 = Jürgen | year = 1994| title = On learning how to learn learning strategies | journal = Technical Report FKI-198-94, Tech. Univ. Munich | language = en | url = http://people.idsia.ch/~juergen/FKI-198-94ocr.pdf}}</ref><ref name="sch1997">{{cite journal | last1 = Schmidhuber | first1 = Jürgen | last2 = Zhao | first2 = J. | last3 = Wiering | first3 = M. | year = 1997| title = Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement | journal = Machine Learning | volume = 28 | pages = 105–130 | doi=10.1023/a:1007383707642| doi-access = free| language = en }}</ref>
* An extreme type of Meta [[Reinforcement Learning]] is embodied by the [[Gödel machine]], a theoretical construct which can inspect and modify any part of its own software which also contains a general [[Automated theorem proving|theorem prover]]. It can achieve [[recursive self-improvement]] in a provably optimal way.<ref name="goedelmachine">{{cite journal | last1 = Schmidhuber | first1 = Jürgen | year = 2006| title = Gödel machines: Fully Self-Referential Optimal Universal Self-Improvers | url=https://archive.org/details/arxiv-cs0309048| journal = In B. Goertzel & C. Pennachin, Eds.: Artificial General Intelligence | pages = 199–226 | language=en}}</ref><ref name="scholarpedia" />
* ''Model-Agnostic Meta-Learning'' (MAML) was introduced in 2017 by [[Chelsea Finn]] et al.<ref name="maml" /> Given a sequence of tasks, the parameters of a given model are trained such that few iterations of gradient descent with few training data from a new task will lead to good generalization performance on that task. MAML "trains the model to be easy to fine-tune."<ref name="maml" /> MAML was successfully applied to few-shot image classification benchmarks and to policy-gradient-based reinforcement learning.<ref name="maml">{{cite arXiv | last1 = Finn | first1 = Chelsea | last2 = Abbeel | first2 = Pieter | last3 = Levine | first3 = Sergey |year = 2017| title = Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks | eprint=1703.03400|class=cs.LG|language=en }}</ref>
* ''Variational Bayes-Adaptive Deep RL'' (VariBAD) was introduced in 2019.<ref>{{Cite journal |last1=Zintgraf |first1=Luisa |last2=Schulze |first2=Sebastian |last3=Lu |first3=Cong |last4=Feng |first4=Leo |last5=Igl |first5=Maximilian |last6=Shiarlis |first6=Kyriacos |last7=Gal |first7=Yarin |last8=Hofmann |first8=Katja |last9=Whiteson |first9=Shimon |date=2021 |title=VariBAD: Variational Bayes-Adaptive Deep RL via Meta-Learning |url=http://jmlr.org/papers/v22/21-0657.html |journal=Journal of Machine Learning Research |volume=22 |issue=289 |pages=1–39 |issn=1533-7928}}</ref> While MAML is optimization-based, VariBAD is a model-based method for meta reinforcement learning, and leverages a [[variational autoencoder]] to capture the task information in an internal memory, thus conditioning its decision making on the task.
* When addressing a set of tasks, most meta-learning approaches optimize the average score across all tasks. Hence, certain tasks may be sacrificed in favor of the average score, which is often unacceptable in real-world applications. By contrast, ''Robust Meta Reinforcement Learning'' (RoML) focuses on improving low-score tasks, increasing robustness to the choice of task.<ref>{{Cite journal |last1=Greenberg |first1=Ido |last2=Mannor |first2=Shie |last3=Chechik |first3=Gal |last4=Meirom |first4=Eli |date=2023-12-15 |title=Train Hard, Fight Easy: Robust Meta Reinforcement Learning |url=https://proceedings.neurips.cc/paper_files/paper/2023/hash/d74e6bfe9ce029526e69db14d2c281ec-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=36 |pages=68276–68299}}</ref> RoML works as a meta-algorithm, as it can be applied on top of other meta-learning algorithms (such as MAML and VariBAD) to increase their robustness. It is applicable to both supervised meta-learning and meta [[reinforcement learning]].
* ''Discovering [[meta-knowledge]]'' works by inducing knowledge (e.g., rules) that expresses how each learning method will perform on different learning problems. The metadata is formed by characteristics of the data (general, statistical, information-theoretic, etc.) in the learning problem, and by characteristics of the learning algorithm (type, parameter settings, performance measures, etc.). Another learning algorithm then learns how the data characteristics relate to the algorithm characteristics. Given a new learning problem, the data characteristics are measured, and the performance of different learning algorithms is predicted. Hence, one can predict the algorithms best suited for the new problem.
* ''Stacked generalisation'' works by combining multiple (different) learning algorithms. The metadata is formed by the predictions of those different algorithms. Another learning algorithm learns from this metadata to predict which combinations of algorithms give generally good results. Given a new learning problem, the predictions of the selected set of algorithms are combined (e.g. by (weighted) voting) to provide the final prediction. Since each algorithm is deemed to work well only on a subset of problems, a combination is hoped to be more flexible and still able to make good predictions.
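The "learning to learn by gradient descent" line of work above can be illustrated with a minimal meta-gradient sketch: rather than a learned LSTM optimizer, a single scalar learning rate is meta-learned by differentiating the loss obtained after one inner gradient step. All names and constants here are illustrative, and the meta-gradient for f(x) = (x − c)² is taken analytically; for this task family the one-step-optimal rate is 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
lr = 0.05        # inner learning rate: the quantity being meta-learned
meta_lr = 0.01   # outer step size (illustrative value)

for _ in range(3000):
    # sample a task: minimise f(x) = (x - c)^2 from a random start x0
    c = rng.uniform(-1, 1)
    x0 = rng.uniform(-1, 1)
    # one inner step x1 = x0 - lr * f'(x0) gives the post-step loss
    # (x1 - c)^2 = (1 - 2*lr)^2 * (x0 - c)^2; its derivative in lr
    # (the meta-gradient) is computed analytically here:
    meta_grad = -4 * (1 - 2 * lr) * (x0 - c) ** 2
    lr -= meta_lr * meta_grad
```

In a learned optimizer such as the LSTM of Andrychowicz et al., this hand-derived scalar meta-gradient is replaced by backpropagation through a parametric update rule.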
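A toy sketch of the MAML idea described above, assuming a scalar linear model and synthetic slope-fitting tasks; this is the first-order variant (the second-order term of full MAML is dropped), and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, a, x):
    # squared error for task "y = a*x" under the scalar model "y_hat = w*x"
    err = w * x - a * x
    return np.mean(err ** 2), np.mean(2 * err * x)

alpha, beta = 0.5, 0.01   # inner and outer step sizes (illustrative values)
w = 2.0                   # meta-initialisation to be learned

for _ in range(2000):
    a = rng.uniform(-2, 2)               # sample a task: its slope a
    x_train = rng.uniform(-1, 1, 20)
    _, g = loss_and_grad(w, a, x_train)
    w_adapted = w - alpha * g            # one inner gradient step
    # first-order outer update: follow the post-adaptation gradient
    x_test = rng.uniform(-1, 1, 20)
    _, g_adapted = loss_and_grad(w_adapted, a, x_test)
    w -= beta * g_adapted
```

After meta-training, a few inner gradient steps from the learned initialisation w suffice to fit the slope of a freshly sampled task.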
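One way to make the mean-versus-robust contrast in the RoML entry concrete is a CVaR-style surrogate objective: instead of the mean score over tasks, average only over the worst fraction. This is a generic robust objective in the spirit of RoML, not RoML's actual algorithm.

```python
import numpy as np

def mean_objective(task_losses):
    # standard meta-learning objective: average loss across tasks
    return float(np.mean(task_losses))

def robust_objective(task_losses, alpha=0.3):
    # CVaR-style surrogate: average only over the worst alpha-fraction
    # of tasks, so that no task can be quietly sacrificed
    losses = np.sort(np.asarray(task_losses))[::-1]   # worst first
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(np.mean(losses[:k]))
```

For losses [0.1, 0.2, 0.1, 2.0] the mean objective is 0.6, while the robust objective with alpha = 0.3 averages the two worst tasks and reports 1.1, directing optimization pressure toward the failing task.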
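The meta-knowledge discovery scheme above can be sketched as follows: a meta-dataset maps data characteristics of past problems to the algorithm that performed best on each, and a meta-learner recommends an algorithm for a new problem. The meta-learner here is a 1-nearest-neighbour rule, and every number and algorithm name is invented for illustration.

```python
import numpy as np

# toy meta-dataset: each row holds data characteristics of one past
# problem, [log10(n_samples), n_features, class_balance]
meta_X = np.array([
    [2.0, 5.0, 0.9],
    [2.1, 4.0, 0.8],
    [5.0, 300.0, 0.5],
    [4.8, 250.0, 0.6],
])
# which algorithm performed best on that problem (illustrative labels)
meta_y = np.array(["naive_bayes", "naive_bayes", "linear_svm", "linear_svm"])

def recommend(characteristics):
    """1-nearest-neighbour meta-learner: predict the best-suited
    algorithm for a new problem from its data characteristics."""
    d = np.linalg.norm(meta_X - np.asarray(characteristics), axis=1)
    return meta_y[np.argmin(d)]
```

A real system would measure many more characteristics and use a stronger meta-learner, but the pipeline — characterise, look up, recommend — is the same.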
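The stacked-generalisation scheme above can be sketched with two hand-fixed base learners whose predictions form the metadata, and a least-squares blend as the meta-learner; all data and base models are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
y = 3 * x + 0.1 * rng.standard_normal(200)   # true relation: y ≈ 3x

# two deliberately imperfect base learners (hand-fixed for illustration)
def base_a(x):
    return 2.0 * x          # underestimates the slope

def base_b(x):
    return 4.0 * x + 0.5    # overestimates the slope, with a bias

# metadata = the base learners' predictions; the meta-learner fits
# least-squares blending weights over those predictions
meta_features = np.stack([base_a(x), base_b(x)], axis=1)
weights, *_ = np.linalg.lstsq(meta_features, y, rcond=None)

def stacked_predict(x_new):
    return weights @ np.array([base_a(x_new), base_b(x_new)])
```

Neither base learner fits the data on its own, but the learned blend recovers the true slope of 3 almost exactly; in practice the blend is trained on held-out predictions to avoid favouring overfit base models.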