Symbolic regression: Difference between revisions

m Difference from classical regression: The sentence was missing a word, and was not grammatically correct.
Citation bot (talk | contribs)
Altered pages. Add: article-number, arxiv, bibcode. Removed URL that duplicated identifier. Removed parameters. Formatted dashes. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 605/967
 
(120 intermediate revisions by 69 users not shown)
Line 1:
{{Short description|Type of regression analysis}}
{{Use American English|date = January 2019}}
[[File:Genetic Program Tree.png|thumb|[[Expression tree]] as it can be used in symbolic regression to represent a function.]]
 
'''Symbolic regression''' ('''SR''') is a type of [[regression analysis]] that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.
 
No particular model is provided as a starting point for symbolic regression. Instead, initial expressions are formed by randomly combining mathematical building blocks such as [[Operation (mathematics)|mathematical operators]], [[analytic function]]s, [[Constant (mathematics)|constants]], and [[state variable]]s. Usually, a subset of these primitives is specified by the person operating the algorithm, but that is not a requirement of the technique. The symbolic regression problem for mathematical functions has been tackled with a variety of methods, most commonly by recombining equations using [[genetic programming]],<ref name="schmidt2009distilling"/> as well as by more recent methods utilizing [[Bayesian statistics#Bayesian methods|Bayesian methods]]<ref name="bayesian"/> and [[Artificial neural network|neural networks]].<ref name="aifeynman"/> Another non-classical alternative to SR is the Universal Functions Originator (UFO), which has a different mechanism, search space, and building strategy.<ref name="ufo"/> Further methods such as Exact Learning attempt to transform the fitting problem into a [[Method of moments (statistics)|moments problem]] in a natural function space, usually built around generalizations of the [[Meijer G-function|Meijer-G function]].<ref name="exactlearning"/>
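
As a rough illustration of how initial expressions can be formed by randomly combining building blocks, the following Python sketch grows small expression trees from a hypothetical primitive set. The names <code>random_expression</code>, <code>evaluate</code>, and <code>to_string</code>, the choice of primitives, and the probabilities are assumptions made for this example and are not taken from any particular symbolic regression package.

```python
import math
import random

# Hypothetical primitive set (an assumption for this sketch); in practice
# the operator usually chooses which building blocks are allowed.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
UNARY_OPS = {"sin": math.sin, "cos": math.cos}
VARIABLES = ["x"]


def random_expression(depth, rng):
    """Build a random expression tree represented as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf node: a state variable or a random constant.
        if rng.random() < 0.5:
            return ("var", rng.choice(VARIABLES))
        return ("const", round(rng.uniform(-2.0, 2.0), 2))
    if rng.random() < 0.3:
        name = rng.choice(sorted(UNARY_OPS))
        return ("unary", name, random_expression(depth - 1, rng))
    name = rng.choice(sorted(BINARY_OPS))
    left = random_expression(depth - 1, rng)
    right = random_expression(depth - 1, rng)
    return ("binary", name, left, right)


def evaluate(tree, env):
    """Evaluate a tree given a mapping from variable names to values."""
    kind = tree[0]
    if kind == "var":
        return env[tree[1]]
    if kind == "const":
        return tree[1]
    if kind == "unary":
        return UNARY_OPS[tree[1]](evaluate(tree[2], env))
    return BINARY_OPS[tree[1]](evaluate(tree[2], env), evaluate(tree[3], env))


def to_string(tree):
    """Render a tree as a human-readable infix string."""
    kind = tree[0]
    if kind in ("var", "const"):
        return str(tree[1])
    if kind == "unary":
        return f"{tree[1]}({to_string(tree[2])})"
    return f"({to_string(tree[2])} {tree[1]} {to_string(tree[3])})"


# A small random initial population, printed with its value at x = 1.0.
rng = random.Random(42)
population = [random_expression(3, rng) for _ in range(5)]
for tree in population:
    print(to_string(tree), "->", evaluate(tree, {"x": 1.0}))
```

A genetic programming system would then score such trees against the data and recombine the better ones; that loop is omitted here.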
 
By not requiring ''a priori'' specification of a model, symbolic regression isn't affected by human bias, or unknown gaps in [[domain knowledge]]. It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective. The [[fitness function]] that drives the evolution of the models takes into account not only [[Residual (numerical analysis)|error metrics]] (to ensure the models accurately predict the data), but also special complexity measures,<ref name="complexity"/> thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. This facilitates reasoning and favors the odds of getting insights about the data-generating system, as well as improving generalizability and extrapolation behavior by preventing [[overfitting]]. Accuracy and simplicity may be left as two separate objectives of the regression, in which case the optimum solutions form a [[Pareto front]], or they may be combined into a single objective by means of a model selection principle such as [[minimum description length]].
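
When accuracy and simplicity are kept as separate objectives, the Pareto front consists of the models that no other model beats on both counts. The sketch below shows one simple way to extract it. The function name <code>pareto_front</code> and the candidate models with their error and complexity values are invented for illustration only.

```python
def pareto_front(models):
    """Return the models that are not dominated in (error, complexity).

    Each model is a (name, error, complexity) tuple; lower is better for
    both objectives. A model is dominated if some other model is at least
    as good on both objectives and strictly better on at least one.
    """
    front = []
    for name, err, cx in models:
        dominated = any(
            (e2 <= err and c2 <= cx) and (e2 < err or c2 < cx)
            for _, e2, c2 in models
        )
        if not dominated:
            front.append((name, err, cx))
    # Sort from simplest to most complex for easier reading.
    return sorted(front, key=lambda m: m[2])


# Hypothetical candidates: (expression, error, complexity). Values are made up.
candidates = [
    ("c", 4.0, 1),
    ("a*x", 1.5, 3),
    ("a*x + b", 0.9, 5),
    ("a*x + b*x", 1.5, 7),  # same error as "a*x" but more complex
    ("a*sin(x) + b*x", 0.2, 8),
]

front = pareto_front(candidates)
for name, err, cx in front:
    print(f"{name}: error={err}, complexity={cx}")
```

In this made-up example, <code>a*x + b*x</code> is excluded because <code>a*x</code> reaches the same error with lower complexity; the remaining models each represent a different trade-off that a user could choose from.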
 
It has been proven that symbolic regression is an [[NP-hardness|NP-hard]] problem.<ref>{{cite journal |last1=Virgolin |first1=Marco |last2=Pissis |first2=Solon P. |journal=Transactions on Machine Learning Research |date=2022 |title=Symbolic Regression is NP-hard |arxiv=2207.01018 |url=https://openreview.net/forum?id=LTiaPxqe2e }}</ref> Nevertheless, if the sought-for equation is not too complex it is possible to solve the symbolic regression problem exactly by generating every possible function (built from some predefined set of operators) and evaluating them on the dataset in question.<ref>{{cite journal |last1=Bartlett|first1=Deaglan|last2=Desmond|first2=Harry|last3=Ferreira|first3=Pedro|title=Exhaustive Symbolic Regression|journal=IEEE Transactions on Evolutionary Computation |year=2023 |volume=28 |issue=4 |page=1 |doi=10.1109/TEVC.2023.3280250 |arxiv=2211.11461|s2cid=253735380 }}</ref>
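
The idea of generating every possible function from a predefined operator set and evaluating each on the dataset can be sketched for a toy case as follows. This is only a minimal illustration under stated assumptions: the operator set, the depth limit, the tie-breaking by text length, and the names <code>enumerate_expressions</code> and <code>exhaustive_search</code> are choices made for this example and do not reproduce the cited Exhaustive Symbolic Regression implementation, which additionally handles parameter fitting and duplicate elimination.

```python
import itertools
import math

# Assumed operator set for this sketch.
UNARY = {"sin": math.sin}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def enumerate_expressions(max_depth):
    """Return (text, function) pairs for all expressions up to max_depth.

    Equivalent expressions are not pruned, so the list contains duplicates.
    """
    all_exprs = [("x", lambda x: x), ("1", lambda x: 1.0)]
    for _ in range(max_depth):
        new = []
        for name, op in UNARY.items():
            for text, f in all_exprs:
                new.append((f"{name}({text})", lambda x, op=op, f=f: op(f(x))))
        for name, op in BINARY.items():
            for (ta, fa), (tb, fb) in itertools.product(all_exprs, repeat=2):
                new.append((
                    f"({ta} {name} {tb})",
                    lambda x, op=op, fa=fa, fb=fb: op(fa(x), fb(x)),
                ))
        all_exprs.extend(new)
    return all_exprs


def exhaustive_search(xs, ys, max_depth=2):
    """Score every enumerated expression by mean squared error.

    Ties in error are broken in favor of the shorter expression text.
    Returns (text, error, function) for the best candidate.
    """
    best = None
    for text, f in enumerate_expressions(max_depth):
        err = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        key = (err, len(text))
        if best is None or key < best[0]:
            best = (key, text, f)
    return best[1], best[0][0], best[2]


# Toy data generated from x*x + sin(x).
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [x * x + math.sin(x) for x in xs]

best_text, best_err, best_f = exhaustive_search(xs, ys, max_depth=2)
print(best_text, best_err)
```

The number of candidates grows very quickly with depth and with the size of the operator set, which is why this strategy is only practical when the sought-for equation is not too complex.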
 
== Difference from classical regression ==
 
While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters.
 
This approach has the disadvantage of having a much larger space to search: not only is the search space in symbolic regression infinite, but there are also infinitely many models that will perfectly fit a finite data set (provided that the model complexity isn't artificially limited). This means that a symbolic regression algorithm may take longer than traditional regression techniques to find an appropriate model and parametrization. This can be attenuated by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data; but in the end, using symbolic regression is a decision that has to be balanced with how much is known about the underlying system.
 
Nevertheless, this characteristic of symbolic regression also has advantages: because the [[evolutionary algorithm]] requires diversity in order to effectively explore the search space, the result is likely to be a selection of high-scoring models (and their corresponding set of parameters). Examining this collection could provide better insight into the underlying process, and allows the user to identify an approximation that better fits their needs in terms of accuracy and simplicity.
 
== Benchmarking ==
 
=== SRBench ===
In 2021, [https://cavalab.org/srbench SRBench]<ref>{{cite journal |last1=La Cava |first1=William |last2=Orzechowski |first2=Patryk |last3=Burlacu |first3=Bogdan |last4=de Franca |first4=Fabricio |last5=Virgolin |first5=Marco |last6=Jin |first6=Ying |last7=Kommenda |first7=Michael |last8=Moore |first8=Jason |title=Contemporary Symbolic Regression Methods and their Relative Performance |journal=Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks |date=2021 |volume=1 |arxiv=2107.14351 |url=https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c0c7c76d30bd3dcaefc96f40275bdc0a-Abstract-round1.html}}</ref> was proposed as a large benchmark for symbolic regression.
At its inception, SRBench featured 14 symbolic regression methods, 7 other ML methods, and 252 datasets from [https://github.com/EpistasisLab/pmlb PMLB].
The benchmark intends to be a living project: it encourages the submission of improvements, new datasets, and new methods, to keep track of the state of the art in SR.
 
=== SRBench Competition 2022 ===
In 2022, SRBench announced the competition Interpretable Symbolic Regression for Data Science, which was held at the [[Genetic and Evolutionary Computation Conference|GECCO conference]] in Boston, MA. The competition pitted nine leading symbolic regression algorithms against each other on a novel set of data problems and considered different evaluation criteria. The competition was organized in two tracks, a synthetic track and a real-world data track.<ref name="srbench2022"/>
 
==== Synthetic Track ====
In the synthetic track, methods were compared according to five properties: re-discovery of exact expressions; feature selection; resistance to local optima; extrapolation; and sensitivity to noise. The ranking of the methods was:
# [[QLattice]]
# [https://github.com/MilesCranmer/PySR PySR (Python Symbolic Regression)]
# [https://github.com/brendenpetersen/deep-symbolic-optimization uDSR (Deep Symbolic Optimization)]
 
==== Real-world Track ====
In the real-world track, methods were trained to build interpretable predictive models for 14-day forecast counts of COVID-19 cases, hospitalizations, and deaths in New York State. These models were reviewed by a subject expert, assigned trust ratings, and evaluated for accuracy and simplicity. The ranking of the methods was:
 
# [https://github.com/brendenpetersen/deep-symbolic-optimization uDSR (Deep Symbolic Optimization)]
# [[QLattice]]
# [https://github.com/alcides/GeneticEngine/ geneticengine (Genetic Engine)]
 
== Non-standard methods ==
Most symbolic regression algorithms prevent [[combinatorial explosion]] by implementing evolutionary algorithms that iteratively improve the best-fit expression over many generations. Recently, researchers have proposed algorithms utilizing other tactics in [[Artificial intelligence|AI]].
 
Silviu-Marian Udrescu and [[Max Tegmark]] developed the "AI Feynman" algorithm,<ref>{{Cite journal |last1=Udrescu |first1=Silviu-Marian |last2=Tegmark |first2=Max |date=2020-04-17 |title=AI Feynman: A physics-inspired method for symbolic regression |journal=Science Advances |language=en |volume=6 |issue=16 |pages=eaay2631 |doi=10.1126/sciadv.aay2631 |issn=2375-2548 |pmc=7159912 |pmid=32426452|arxiv=1905.11481 |bibcode=2020SciA....6.2631U }}</ref><ref>{{cite arXiv |last1=Udrescu |first1=Silviu-Marian |last2=Tan |first2=Andrew |last3=Feng |first3=Jiahai |last4=Neto |first4=Orisvaldo |last5=Wu |first5=Tailin |last6=Tegmark |first6=Max |date=2020-12-16 |title=AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity |class=cs.LG |eprint=2006.10782 }}</ref> which attempts symbolic regression by training a neural network to represent the mystery function, then runs tests against the neural network to attempt to break up the problem into smaller parts. For example, if <math>f(x_1, ..., x_i, x_{i+1}, ..., x_n) = g(x_1,..., x_i) + h(x_{i+1},..., x_n)</math>, tests against the neural network can recognize the separation and proceed to solve for <math>g</math> and <math>h</math> separately and with different variables as inputs. This is an example of [[Divide-and-conquer algorithm|divide and conquer]], which reduces the size of the problem to be more manageable. AI Feynman also transforms the inputs and outputs of the mystery function in order to produce a new function which can be solved with other techniques, and performs [[dimensional analysis]] to reduce the number of independent variables involved. The algorithm was able to "discover" 100 equations from [[The Feynman Lectures on Physics]], while a leading software using evolutionary algorithms, [[Eureqa]], solved only 71. 
AI Feynman, in contrast to classic symbolic regression methods, requires a very large dataset in order to first train the neural network and is naturally biased towards equations that are common in elementary physics.
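
The additive separation described above can be checked numerically. If <math>f = g + h</math> with <math>g</math> and <math>h</math> depending on disjoint groups of variables, then swapping one group between two sample points leaves the sum of the two function values unchanged. The sketch below tests that identity on random points. It is an illustration of the principle under stated assumptions rather than the AI Feynman code: the function name <code>looks_additively_separable</code>, the sampling range, and the tolerance are invented for this example, and in the actual method the function being probed would be the trained neural network, so a looser tolerance would be needed.

```python
import math
import random


def looks_additively_separable(f, n_left, n_total, trials=200, tol=1e-9, seed=0):
    """Heuristically test whether f(x_1..x_n) = g(x_1..x_k) + h(x_{k+1}..x_n).

    Here k = n_left and n = n_total. For two random points a and b, an
    additively separable f satisfies
        f(a) + f(b) == f(a_left, b_right) + f(b_left, a_right).
    Passing all trials suggests, but does not prove, separability.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-1.0, 1.0) for _ in range(n_total)]
        b = [rng.uniform(-1.0, 1.0) for _ in range(n_total)]
        mixed_ab = a[:n_left] + b[n_left:]
        mixed_ba = b[:n_left] + a[n_left:]
        gap = f(*a) + f(*b) - f(*mixed_ab) - f(*mixed_ba)
        if abs(gap) > tol:
            return False
    return True


def separable(x1, x2, x3):
    # g(x1, x2) = x1 * x2 and h(x3) = sin(x3)
    return x1 * x2 + math.sin(x3)


def not_separable(x1, x2, x3):
    # x1 and x3 interact, so no split after the second variable exists
    return x1 * x3 + x2


print(looks_additively_separable(separable, n_left=2, n_total=3))
print(looks_additively_separable(not_separable, n_left=2, n_total=3))
```

Once such a split is detected, each part can be treated as its own, smaller regression problem with fewer input variables.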
 
Some researchers have pointed out that conventional symbolic regression techniques may struggle to generalize in systems with complex causal dependencies or non-explicit governing equations.<ref>{{cite journal |last1=Zenil |first1=Hector |last2=Kiani |first2=Narsis A. |last3=Zea |first3=Allan A. |last4=Tegnér |first4=Jesper |title=Causal deconvolution by algorithmic generative models |journal=Nature Machine Intelligence |volume=1 |issue=1 |year=2019 |pages=58–66 |doi=10.1038/s42256-018-0005-0 }}</ref> A more general approach is a conceptual framework for extracting generative rules from complex dynamical systems based on Algorithmic Information Theory (AIT).<ref>{{cite journal | last=Zenil | first=Hector | title=Algorithmic Information Dynamics | journal=Scholarpedia | date=25 July 2020 | volume=15 | issue=7 | doi=10.4249/scholarpedia.53143 | doi-access=free | bibcode=2020SchpJ..1553143Z | hdl=10754/666314 | hdl-access=free }}</ref> This framework, called Algorithmic Information Dynamics (AID), applies perturbation analysis to quantify the algorithmic complexity of system components and reconstruct phase spaces and causal mechanisms, including for discrete systems such as cellular automata. Unlike traditional symbolic regression, AID enables the inference of generative rules without requiring explicit kinetic equations, offering insights into the causal structure and reprogrammability of complex systems.<ref>{{cite book | last1=Zenil | first1=Hector | last2=Kiani | first2=Narsis A. | last3=Tegner | first3=Jesper | title=Algorithmic Information Dynamics: A Computational Approach to Causality with Applications to Living Systems | publisher=Cambridge University Press | year=2023 | doi=10.1017/9781108596619 | isbn=978-1-108-59661-9 | url=https://doi.org/10.1017/9781108596619}}</ref>
 
== Software ==
 
=== End-user software ===
* [[QLattice]] is a quantum-inspired simulation and machine learning technology that helps search through an infinite list of potential mathematical models to solve a problem.<ref>{{Cite web|url=https://docs.abzu.ai|title=Feyn is a Python module for running the QLattice|date=June 22, 2022}}</ref><ref name="srfeyn" />
* [https://github.com/hengzhe-zhang/EvolutionaryForest Evolutionary Forest] is a Genetic Programming-based automated feature construction algorithm for symbolic regression.<ref>{{Cite journal |last1=Zhang |first1=Hengzhe |last2=Zhou |first2=Aimin |last3=Zhang |first3=Hu |date=August 2022 |title=An Evolutionary Forest for Regression |journal=IEEE Transactions on Evolutionary Computation |volume=26 |issue=4 |pages=735–749 |doi=10.1109/TEVC.2021.3136667 |bibcode=2022ITEC...26..735Z |issn=1089-778X}}</ref><ref>{{Cite journal |last1=Zhang |first1=Hengzhe |last2=Zhou |first2=Aimin |last3=Chen |first3=Qi |last4=Xue |first4=Bing |last5=Zhang |first5=Mengjie |date=2023 |title=SR-Forest: A Genetic Programming based Heterogeneous Ensemble Learning Method |journal=IEEE Transactions on Evolutionary Computation |volume=28 |issue=5 |pages=1484–1498 |doi=10.1109/TEVC.2023.3243172 |issn=1089-778X}}</ref>
* [https://github.com/brendenpetersen/deep-symbolic-optimization uDSR] is a deep learning framework for symbolic optimization tasks.<ref>{{Cite web|url=https://github.com/brendenpetersen/deep-symbolic-optimization|title=Deep symbolic optimization|website=[[GitHub]] |date=June 22, 2022}}</ref>
* [https://github.com/darioizzo/dcgp/ dCGP], differentiable Cartesian Genetic Programming in Python (free, open source)<ref>{{Cite web|url=https://darioizzo.github.io/dcgp/|title=Differentiable Cartesian Genetic Programming, v1.6 Documentation|date=June 10, 2022}}</ref><ref>{{Cite journal|title=Differentiable genetic programming|first1=Dario|last1=Izzo|first2=Francesco|last2=Biscani|first3=Alessio|last3=Mereta|journal=Proceedings of the European Conference on Genetic Programming|year=2016 |arxiv=1611.04766 }}</ref>
* [[HeuristicLab]], a software environment for heuristic and evolutionary algorithms, including symbolic regression (free, open source)
* [[Gene expression programming#Software|GeneXProTools]], an implementation of the [[Gene expression programming]] technique for various problems including symbolic regression (commercial)
* [[Multi expression programming#MEPX|Multi Expression Programming X]], an implementation of [[Multi expression programming]] for symbolic regression and classification (free, open source)
* [[Eureqa]], evolutionary symbolic regression software (commercial), and [[software library]]
* [https://turingbotsoftware.com/ TuringBot], symbolic regression software based on simulated annealing (commercial)
* [https://github.com/MilesCranmer/PySR PySR],<ref>{{cite web |title=High-Performance Symbolic Regression in Python |website=[[GitHub]] |date=18 August 2022 |url=https://github.com/MilesCranmer/PySR}}</ref> symbolic regression environment written in [[Python (programming language)|Python]] and [[Julia (programming language)|Julia]], using regularized evolution, [[simulated annealing]], and [[gradient]]-free optimization (free, open source)<ref>{{Cite web|url=https://www.quantamagazine.org/machine-scientists-distill-the-laws-of-physics-from-raw-data-20220510/|title='Machine Scientists' Distill the Laws of Physics From Raw Data|date=May 10, 2022|website=[[Quanta Magazine]]}}</ref>
* [https://github.com/marcovirgolin/GP-GOMEA GP-GOMEA], fast ([[C++]] back-end) [[genetic programming|evolutionary]] symbolic regression with [[Python (programming language)|Python]] [[scikit-learn]]-compatible interface, achieved one of the best trade-offs between accuracy and simplicity of discovered models on [https://cavalab.org/srbench/ SRBench] in 2021 (free, open source)
 
== See also ==
* [[Closed-form expression#Conversion from numerical forms|Closed-form expression § Conversion from numerical forms]]
* [[Eureqa]], software that implements symbolic regression
* [[DataMelt]], software that implements symbolic regression in Java and Python
* [http://dev.heuristiclab.com/ HeuristicLab], software for comparing various optimization techniques, including symbolic regression
* [http://cran.r-project.org/web/packages/rgp/index.html RGP] package for symbolic regression in R
* [https://sites.google.com/site/gptips4matlab/ GPTIPS] software that implements symbolic regression
* [[Genetic programming]]
* [[Gene expression programming]]
* [[Kolmogorov complexity]]
* [[Linear genetic programming]]
* [[Mathematical optimization]]
* [[Multi expression programming]]
* [[Regression analysis]]
* [[Reverse mathematics]]
* [[Discovery system (AI research)]]<ref name="aifeynman"/>
 
== References ==
{{reflist|refs=<ref name="bayesian">{{cite arXiv
| title = Bayesian Symbolic Regression
| author1 = Ying Jin
| author2 = Weilin Fu
| author3 = Jian Kang
| author4 = Jiadong Guo
| author5 = Jian Guo
| year = 2019
| class = stat.ME
| eprint = 1910.08892
}}</ref><ref name="schmidt2009distilling">{{cite journal
| title = Distilling free-form natural laws from experimental data
| author1 = Michael Schmidt
| author2 = Hod Lipson
| journal = Science
| volume = 324
| number = 5923
| pages = 81–85
| year = 2009
| publisher = American Association for the Advancement of Science
| url = https://www.science.org/doi/10.1126/science.1165893
| doi=10.1126/science.1165893
| pmid = 19342586
| bibcode = 2009Sci...324...81S
| citeseerx = 10.1.1.308.2245
| s2cid = 7366016
}}</ref><ref name="aifeynman">{{cite journal
| title = AI Feynman: A physics-inspired method for symbolic regression
| author1 = Silviu-Marian Udrescu
| author2 = Max Tegmark
| journal = Science Advances
| volume = 6
| number = 16
| year = 2020
| pages = eaay2631
| publisher = American Association for the Advancement of Science
| doi = 10.1126/sciadv.aay2631
| pmid = 32426452
| pmc = 7159912
| arxiv = 1905.11481
| bibcode = 2020SciA....6.2631U
}}</ref><ref name="ufo">{{cite journal
| title = Universal Functions Originator
| author1 = Ali R. Al-Roomi
| author2 = Mohamed E. El-Hawary
| journal = Applied Soft Computing
| volume = 94
| year = 2020
| article-number = 106417
| issn = 1568-4946
| url = https://www.sciencedirect.com/science/article/pii/S1568494620303574
| publisher = Elsevier B.V.
| doi = 10.1016/j.asoc.2020.106417
| s2cid = 219743405
| url-access= subscription
}}</ref><ref name="complexity">{{cite journal
| title = Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming
| author1 = Ekaterina J. Vladislavleva
Line 35 ⟶ 142:
| pages = 333–349
| year = 2009
| publisher = [[Institute of Electrical and Electronics Engineers|IEEE]]
| url = http://symbolicregression.com/sites/SRDocuments/NonlinearityPreprint.pdf
| doi=10.1109/tevc.2008.926486
| bibcode = 2009ITEC...13..333V
| s2cid = 12072764
}}</ref><ref name="exactlearning">{{cite web
| title = A Natural Representation of Functions for Exact Learning
| type = Preprint
| author = Benedict W. J. Irwin
| year = 2021
| url = https://assets.researchsquare.com/files/rs-149856/v1_covered.pdf?c=1631872748
| doi = 10.21203/rs.3.rs-149856/v1
| s2cid = 234014141
}}</ref><ref name="srbench2022">{{cite web
|title = SRBench Competition 2022: Interpretable Symbolic Regression for Data Science
|author1 = Michael Kommenda
|author2 = William La Cava
|author3 = Maimuna Majumder
|author4 = Fabricio Olivetti de França
|author5 = Marco Virgolin
|url = https://cavalab.org/srbench/competition-2022/
}}</ref><ref name="srfeyn">{{cite arXiv
| author1 = Kevin René Broløs
| author2 = Meera Vieira Machado
| author3 = Chris Cave
| author4 = Jaan Kasak
| author5 = Valdemar Stentoft-Hansen
| author6 = Victor Galindo Batanero
| author7 = Tom Jelen
| author8 = Casper Wilstrup
| date=2021-04-12
| title = An Approach to Symbolic Regression Using Feyn
| class = cs.LG
| eprint = 2104.05417
}}</ref>
}}
Line 49 ⟶ 186:
| author4 = Gary A. Montague
| author5 = Peter Marenbach
| book-title = IEE Conference Publications
| number = 446
| pages = 314–319
| year = 1997
| publisher = [[Institution of Electrical Engineers|IEE]]
| url = http://www.cs.bham.ac.uk/~wbl/biblio/cache/cache/.hidden_13-jun_1525733794/http___www.staff.ncl.ac.uk_d.p.searson_docs_galesia97surveyofGP.pdf
}}
* {{cite thesis
|degree = M.Sc.
|author1 = Wouter Minnebo
|author2 = Sean Stijven
|year = 2011
|title = Empowering Knowledge Computing with Variable Selection
|chapter = Chapter 4: Symbolic Regression
|publisher = [[University of Antwerp]]
|chapter-url = https://community.alteryx.com/pvsmt99345/attachments/pvsmt99345/product-ideas/1300/1/ThesisWouterSean_v2.pdf
}}
* {{cite conference
|title = Performance improvement of machine learning via automatic discovery of facilitating functions as applied to a problem of symbolic system identification
|author1 = John R. Koza
|author2 = Martin A. Keane
|author3 = James P. Rice
|book-title = IEEE International Conference on Neural Networks
|pages = 191–198
|year = 1993
|location = San Francisco
|publisher = [[Institute of Electrical and Electronics Engineers|IEEE]]
|url = http://www.genetic-programming.com/jkpdf/icnn1993impulse.pdf
}}
 
== External links ==
 
* {{cite web |title = Symbolic regression — an overview |url = http://www.mafy.lut.fi/EcmiNL/older/ecmi35/node70.html |author = Ivan Zelinka |year = 2004 }}
* {{cite web |title = Simple Symbolic Regression Using Genetic Programming |url = http://alphard.ethz.ch/gerber/approx/default.html |author = Hansueli Gerber |year = 1998 }} (Java applet) — approximates a function by evolving combinations of simple arithmetic operators, using algorithms developed by [[John Koza]].
* {{cite web |title = Symbolic Regression: Function Discovery & More |url = http://www.symbolicregression.com |author = Katya Vladislavleva |archive-url = https://web.archive.org/web/20141218105301/http://symbolicregression.com/ |archive-date = 2014-12-18}}
 
 
 
[[Category:Regression analysis]]