{{short description|Type of artificial neural network that uses radial basis functions as activation functions}}
In the field of [[mathematical modeling]], a '''radial basis function network''' is an [[artificial neural network]] that uses [[radial basis function]]s as [[activation function]]s. The output of the network is a [[linear combination]] of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including [[function approximation]], [[time series prediction]], [[Statistical classification|classification]], and system [[Control theory|control]]. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the [[Royal Signals and Radar Establishment]].<ref>{{cite tech report
|last1 = Broomhead
|first1 = D. S.
|last2 = Lowe
|first2 = David
|year = 1988
|title = Radial basis functions, multi-variable functional interpolation and adaptive networks
|institution = Royal Signals and Radar Establishment
|number = 4148
|url = http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA196234
|archive-url = https://web.archive.org/web/20130409223044/http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA196234
|url-status = dead
|archive-date = April 9, 2013
}}</ref><ref>{{cite journal
|last1 = Broomhead
|first1 = D. S.
|last2 = Lowe
|first2 = David
|year = 1988
|title = Multivariable functional interpolation and adaptive networks
|journal = Complex Systems
|volume = 2
|pages = 321–355
|url = https://sci2s.ugr.es/keel/pdf/algorithm/articulo/1988-Broomhead-CS.pdf
|access-date = 2019-01-29
|archive-date = 2020-12-01
|archive-url = https://web.archive.org/web/20201201121028/https://sci2s.ugr.es/keel/pdf/algorithm/articulo/1988-Broomhead-CS.pdf
|url-status = live
}}</ref><ref name="schwenker"/>
==Network architecture==
[[Image:Radial funktion network.svg|thumb|250px|right|Figure 1: Architecture of a radial basis function network. An input vector <math>x</math> is used as input to all radial basis functions, each with different parameters. The output of the network is a linear combination of the outputs from radial basis functions.]]
Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer. The input can be modeled as a vector of real numbers <math>\mathbf{x} \in \mathbb{R}^n</math>. The output of the network is then a scalar function of the input vector, <math> \varphi : \mathbb{R}^n \to \mathbb{R} </math>, and is given by
:<math>\varphi(\mathbf{x}) = \sum_{i=1}^N a_i \rho(||\mathbf{x}-\mathbf{c}_i||)</math>
where <math>N</math> is the number of neurons in the hidden layer, <math>\mathbf c_i</math> is the center vector for neuron <math>i</math>, and <math>a_i</math> is the weight of neuron <math>i</math> in the linear output neuron. Functions that depend only on the distance from a center vector are radially symmetric about that vector, hence the name radial basis function. In the basic form, all inputs are connected to each hidden neuron. The [[Norm (mathematics)|norm]] is typically taken to be the [[Euclidean distance]] (although the [[Mahalanobis distance]] appears to perform better<ref>{{cite journal
|last1=Beheim|first1=Larbi
|last2=Zitouni|first2=Adel
|last3=Belloir|first3=Fabien
|date=January 2004
|title=New RBF neural network classifier with optimized hidden neurons number
|url=https://www.researchgate.net/publication/254467552
}}</ref><ref>{{cite conference
|conference=Proceedings of the Second Joint 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society
|conference-url=https://ieeexplore.ieee.org/xpl/conhome/8844528/proceeding
|___location=Houston, TX, USA
|last1=Ibrikci|first1=Turgay
|last2=Brandt|first2=M.E.
|last3=Wang|first3=Guanyu
|last4=Acikkar|first4=Mustafa
|date=23–26 October 2002
|publication-date=6 January 2003
|volume=3
|pages=2184–5
|doi=10.1109/IEMBS.2002.1053230
|title=Mahalanobis distance with radial basis function network on protein secondary structures
|isbn=0-7803-7612-9
|issn=1094-687X
}}</ref>{{Editorializing|date=May 2020}}<!-- Was previously marked with a missing-citation tag asking in what sense using Mahalanobis distance is better and why the Euclidean distance is still normally used, but I found sources to support the first part, so it's likely salvageable. -->) and the radial basis function is commonly taken to be [[Normal distribution|Gaussian]]
:<math> \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) = \exp \left[ -\beta_i \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert ^2 \right] </math>
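A minimal sketch of this forward pass, assuming [[NumPy]] (the array names and the shared-width setup are illustrative, not part of the original formulation):
<syntaxhighlight lang="python">
import numpy as np

def rbf_forward(x, centers, weights, betas):
    """Unnormalized Gaussian RBF network:
    phi(x) = sum_i a_i * exp(-beta_i * ||x - c_i||^2)."""
    sq_dists = np.sum((centers - x) ** 2, axis=1)  # ||x - c_i||^2 for each center
    activations = np.exp(-betas * sq_dists)        # Gaussian hidden-layer outputs
    return weights @ activations                   # linear output layer

# Illustrative use: N = 10 hidden neurons, n = 2 inputs.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(10, 2))  # c_i
weights = rng.normal(size=10)                  # a_i
betas = np.full(10, 5.0)                       # beta_i (here shared by all neurons)
print(rbf_forward(np.array([0.3, 0.7]), centers, weights, betas))
</syntaxhighlight>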
The Gaussian basis functions are local to the center vector in the sense that
:<math>\lim_{\left \Vert \mathbf{x} \right \Vert \to \infty} \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) = 0</math>
i.e. changing parameters of one neuron has only a small effect for input values that are far away from the center of that neuron.
Given certain mild conditions on the shape of the activation function, RBF networks are [[universal approximator]]s on a [[Compact space|compact]] subset of <math>\mathbb{R}^n</math>.<ref name="Park">{{cite journal|last=Park|first=J.|author2=I. W. Sandberg|s2cid=34868087|date=Summer 1991|title=Universal Approximation Using Radial-Basis-Function Networks|journal=Neural Computation|volume=3|issue=2|pages=246–257|doi=10.1162/neco.1991.3.2.246}}</ref> This means that an RBF network with enough hidden neurons can approximate any continuous function on a closed, bounded set with arbitrary precision.
The parameters <math> a_i </math>, <math> \mathbf{c}_i </math>, and <math> \beta_i </math> are determined in a manner that optimizes the fit between <math> \varphi </math> and the data.
[[Image:Unnormalized radial basis functions.svg|thumb|250px|right|Figure 2: Two unnormalized radial basis functions in one input dimension.]]
===Normalized===
{{multiple images
 | align = right
 | direction = vertical
 | width = 250
 | image1 = Normalized radial basis functions.svg
 | caption1 = Figure 3: Two normalized radial basis functions in one input dimension.
 | image2 = 3 Normalized radial basis functions.svg
 | caption2 = Figure 4: Three normalized radial basis functions in one input dimension.
 | image3 = 4 Normalized radial basis functions.svg
 | caption3 = Figure 5: Four normalized radial basis functions in one input dimension.
}}
====Normalized architecture====
In addition to the above ''unnormalized'' architecture, RBF networks can be ''normalized''. In this case the mapping is
:<math> \varphi ( \mathbf{x} ) \ \stackrel{\mathrm{def}}{=}\ \frac { \sum_{i=1}^N a_i \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } { \sum_{i=1}^N \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } = \sum_{i=1}^N a_i u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) </math>
where
:<math> u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \ \stackrel{\mathrm{def}}{=}\ \frac { \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } { \sum_{j=1}^N \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_j \right \Vert \big ) } </math>
is known as a ''normalized radial basis function''.
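In code, normalization amounts to dividing each Gaussian activation by the sum over all hidden neurons (a sketch under the same NumPy assumptions as above):
<syntaxhighlight lang="python">
import numpy as np

def normalized_rbf(x, centers, betas):
    """Normalized radial basis functions u_i(x) = rho_i(x) / sum_j rho_j(x)."""
    rho = np.exp(-betas * np.sum((centers - x) ** 2, axis=1))
    return rho / rho.sum()  # activations now sum to one across the hidden layer
</syntaxhighlight>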
====Theoretical motivation for normalization====
There is theoretical justification for this architecture in the case of stochastic data flow. Assume a [[kernel density estimation|stochastic kernel]] approximation for the joint probability density <math> P\left ( \mathbf{x} \land y \right ) </math>, built from kernels centered on exemplars from the data. The expected value of <math>y</math> given <math>\mathbf{x}</math> is then
:<math> \varphi \left ( \mathbf{x} \right ) \ \stackrel{\mathrm{def}}{=}\ E\left ( y \mid \mathbf{x} \right ) = \int y \, P\left ( y \mid \mathbf{x} \right ) \, dy </math>
where
:<math> P\left ( y \mid \mathbf{x} \right ) </math>
is the conditional probability of y given <math> \mathbf{x} </math>.
The conditional probability is related to the joint probability through [[Bayes' theorem]]
:<math> P\left ( y \mid \mathbf{x} \right ) = \frac {P \left ( \mathbf{x} \land y \right )} {P \left ( \mathbf{x} \right )} </math>
===Local linear models===
It is sometimes convenient to expand the architecture to include local [[linear model]]s. In that case the output becomes a linear combination of the extended basis functions
:<math> v_{ij}\big ( \mathbf{x} - \mathbf{c}_i \big ) \ \stackrel{\mathrm{def}}{=}\ \begin{cases} \delta_{ij} \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) , & \mbox{if } i \in [1,N] \\ \left ( x_{ij} - c_{ij} \right ) \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) , & \mbox{if }i \in [N+1,2N] \end{cases} </math>
in the unnormalized case and
:<math> v_{ij}\big ( \mathbf{x} - \mathbf{c}_i \big ) \ \stackrel{\mathrm{def}}{=}\ \begin{cases} \delta_{ij} u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) , & \mbox{if } i \in [1,N] \\ \left ( x_{ij} - c_{ij} \right ) u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) , & \mbox{if }i \in [N+1,2N] \end{cases} </math>
in the normalized case.
Here <math> \delta_{ij} </math> is a [[Kronecker delta function]] defined as
:<math> \delta_{ij} = \begin{cases} 1, & \mbox{if }i = j \\ 0, & \mbox{if }i \ne j \end{cases} </math>.
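A sketch of the unnormalized first-order expansion (NumPy assumed; the helper name is illustrative): each input is mapped to the <math>N</math> constant terms <math>\rho_i</math> plus the linear terms <math>(x_{j} - c_{ij})\rho_i</math>, which the output layer then combines linearly.
<syntaxhighlight lang="python">
import numpy as np

def local_linear_features(x, centers, betas):
    """First-order expansion: rho_i(x) terms plus (x_j - c_ij) * rho_i(x) terms."""
    diffs = x - centers                                # shape (N, n)
    rho = np.exp(-betas * np.sum(diffs ** 2, axis=1))  # shape (N,)
    linear_terms = (diffs * rho[:, None]).ravel()      # (x_j - c_ij) * rho_i(x)
    return np.concatenate([rho, linear_terms])         # stacked feature vector
</syntaxhighlight>
Because the output is still a weighted sum of these features, the extra linear terms can be fit with the same linear methods used for the basic architecture.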
==Training==
RBF networks are typically trained from pairs of input and target values <math>\mathbf{x}(t), y(t)</math>, <math>t = 1, \dots, T</math> by a two-step algorithm.
In the first step, the center vectors <math>\mathbf c_i</math> of the RBF functions in the hidden layer are chosen. This step can be performed in several ways; centers can be randomly sampled from some set of examples, or they can be determined using [[k-means clustering]]. Note that this step is [[unsupervised learning|unsupervised]].
In the second step, a linear model with coefficients <math>w_i</math> is fit to the hidden layer's outputs with respect to some objective function, such as the least squares function. A third, optional [[backpropagation]] step can be performed to fine-tune all of the RBF network's parameters.<ref name="schwenker">{{cite journal
|last1 = Schwenker
|first1 = Friedhelm
|last2 = Kestler
|first2 = Hans A.
|last3 = Palm
|first3 = Günther
|title = Three learning phases for radial-basis-function networks
|journal = Neural Networks
|volume = 14
|issue = 4–5
|pages = 439–458
|year = 2001
|citeseerx = 10.1.1.109.312
|doi=10.1016/s0893-6080(01)00027-2
|pmid = 11411631
}}</ref>
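A minimal sketch of the two-step procedure in NumPy, using random sampling of the training inputs for the first step and a least-squares fit for the second (function and parameter names are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def train_rbf(X, y, n_centers=20, beta=1.0, seed=0):
    """Two-step training: (1) choose centers, (2) fit output weights."""
    rng = np.random.default_rng(seed)
    # Step 1 (unsupervised): sample centers from the training inputs;
    # k-means clustering is a common alternative.
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    # Hidden-layer design matrix: G[t, i] = exp(-beta * ||x(t) - c_i||^2).
    sq_dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    G = np.exp(-beta * sq_dists)
    # Step 2 (supervised): least-squares fit of the linear output weights.
    weights, *_ = np.linalg.lstsq(G, y, rcond=None)
    return centers, weights
</syntaxhighlight>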
===Interpolation===
RBF networks can be used to interpolate a function <math>y: \mathbb{R}^n \to \mathbb{R}</math> when the values of that function are known on a finite number of points: <math>y(\mathbf x_i) = b_i</math>, <math>i=1, \ldots, N</math>. Taking the known points <math>\mathbf x_i</math> to be the centers of the radial basis functions and evaluating the values of the basis functions at the same points, <math>g_{ij} = \rho \big ( \left \Vert \mathbf x_j - \mathbf x_i \right \Vert \big )</math>, the weights can be solved from the equation
:<math>\left[ \begin{matrix}
g_{11} & g_{12} & \cdots & g_{1N} \\
g_{21} & g_{22} & \cdots & g_{2N} \\
\vdots & & \ddots & \vdots \\
g_{N1} & g_{N2} & \cdots & g_{NN}
\end{matrix} \right] \left[ \begin{matrix}
w_1 \\
w_2 \\
\vdots \\
w_N
\end{matrix} \right] = \left[ \begin{matrix}
b_1 \\
b_2 \\
\vdots \\
b_N
\end{matrix} \right]</math>
It can be shown that the interpolation matrix in the above equation is non-singular, if the points <math>\mathbf x_i</math> are distinct, and thus the weights <math>w</math> can be solved by simple [[linear algebra]]:
:<math>\mathbf{w} = \mathbf{G}^{-1} \mathbf{b}</math>
where <math>G = (g_{ij})</math>.
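A sketch of this solve in NumPy, using <code>numpy.linalg.solve</code> rather than forming <math>\mathbf{G}^{-1}</math> explicitly (the Gaussian basis and width value are assumed for illustration):
<syntaxhighlight lang="python">
import numpy as np

def interpolation_weights(points, values, beta=1.0):
    """Solve G w = b, where g_ij = rho(||x_j - x_i||) and the data are the centers."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    G = np.exp(-beta * sq_dists)       # interpolation matrix
    return np.linalg.solve(G, values)  # non-singular when the points are distinct
</syntaxhighlight>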
==Examples==
===Logistic map===
The basic properties of radial basis functions can be illustrated with a simple mathematical map, the [[logistic map]], which maps the [[unit interval]] onto itself. It can be used to generate a convenient prototype data stream. The logistic map can be used to explore [[function approximation]], [[time series prediction]], and [[control theory]]. The map originated from the field of [[population dynamics]] and became the prototype for [[chaos theory|chaotic]] time series. The map, in the fully chaotic regime, is given by
:<math> x(t+1)\ \stackrel{\mathrm{def}}{=}\ f\left [ x(t)\right ] = 4 x(t) \left [ 1-x(t) \right ] </math>
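Generating this prototype data stream is straightforward (a sketch; the starting value is arbitrary):
<syntaxhighlight lang="python">
def logistic_series(x0=0.3, T=100):
    """Iterate x(t+1) = 4 x(t) (1 - x(t)) to produce a chaotic time series."""
    xs = [x0]
    for _ in range(T - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs
</syntaxhighlight>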
===Control of a chaotic time series===
We assume the output of the logistic map can be manipulated through a control parameter <math> c[x(t),t] </math> such that
:<math> x(t+1) = 4 x(t) [1-x(t)] + c[x(t),t] </math>.
The goal is to choose the control parameter in such a way as to drive the time series to a desired output <math> d(t) </math>. This can be done if we choose the control parameter to be
:<math> c[x(t),t] \ \stackrel{\mathrm{def}}{=}\ -\varphi [x(t)] + d(t+1) </math>
where <math>\varphi [x(t)]</math> is an approximation to the underlying natural dynamics of the system.
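A sketch of one controlled step, assuming <code>phi</code> is a trained RBF approximation of the map (for example from the training sketch above) and <code>d</code> gives the desired trajectory:
<syntaxhighlight lang="python">
def controlled_step(x, t, phi, d):
    """x(t+1) = 4 x (1 - x) + c[x, t], with c = -phi(x) + d(t+1)."""
    c = -phi(x) + d(t + 1)          # cancel the approximated dynamics, inject target
    return 4.0 * x * (1.0 - x) + c  # equals d(t+1) exactly when phi matches the map
</syntaxhighlight>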
==See also==
* [[Radial basis function kernel]]
* [[Instance-based learning]]
* [[In Situ Adaptive Tabulation]]
* [[Predictive analytics]]
* [[Cerebellar model articulation controller]]
* [[Instantaneously trained neural networks]]
* [[Support vector machine]]
==References==
{{reflist}}

==Further reading==
* J. Moody and C. J. Darken, "Fast learning in networks of locally tuned processing units," Neural Computation, 1, 281-294 (1989). Also see [https://web.archive.org/web/20070302175857/http://www.ki.inf.tu-dresden.de/~fritzke/FuzzyPaper/node5.html Radial basis function networks according to Moody and Darken]
* T. Poggio and F. Girosi, "[http://courses.cs.tamu.edu/rgutier/cpsc636_s10/poggio1990rbf2.pdf Networks for approximation and learning]," Proc. IEEE 78(9), 1484-1487 (1990).
* {{cite book | author=Martin D. Buhmann | title=Radial Basis Functions: Theory and Implementations | publisher= Cambridge University| year=2003 | isbn=0-521-63338-9}}
* {{cite book |author1=Yee, Paul V. |author2=Haykin, Simon |title=Regularized Radial Basis Function Networks: Theory and Applications |publisher=John Wiley |year=2001}}
* {{cite book|first1=John R. |last1=Davies |first2=Stephen V. |last2=Coggeshall |first3=Roger D. |last3=Jones |first4=Daniel |last4=Schutzer |chapter=Intelligent Security Systems |editor-first=Roy S. |editor-last=Freedman |title=Artificial Intelligence in the Capital Markets |___location=Chicago |publisher=Irwin |year=1995 |isbn=1-55738-811-3}}
* {{cite book | author=Simon Haykin | title=Neural Networks: A Comprehensive Foundation | edition=2nd | ___location=Upper Saddle River, NJ | publisher=Prentice Hall| year=1999 | isbn=0-13-908385-5}}
* S. Chen, C. F. N. Cowan, and P. M. Grant, "[https://eprints.soton.ac.uk/251135/1/00080341.pdf Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks]", IEEE Transactions on Neural Networks, Vol 2, No 2 (Mar) 1991.
[[Category:Artificial neural networks]]
[[Category:Computational statistics]]
[[Category:Classification algorithms]]
[[Category:Machine learning algorithms]]
[[Category:Regression analysis]]
[[Category:1988 in artificial intelligence]]