Radial basis function network

A '''radial basis function network''' is an [[artificial neural network]] that uses [[radial basis function]]s as activation functions. A radial basis function is a real-valued function whose value depends only on the distance from the [[Origin (mathematics)|origin]] or, more generally, from some center point. The output of the network is a linear combination of radial basis functions of the inputs. Radial basis function networks are used in [[function approximation]], [[time series prediction]], and [[Control theory|control]].
 
==The approximation problem==
Radial basis function networks can be used to interpolate or approximate an unknown function. A set of known input-output pairs (the training data) is given:
:<math> \left \{ \left[ \mathbf{x}(t) , y(t) \right] : \left[ \mathbb{R}^n , \mathbb{R} \right] \right \} _{t=1}^{ K } </math>
where
:<math> \mathbf{x}(t) </math> is the input vector with index t (or the input at time t),
:<math> y(t) </math> is the output indexed with t,
:<math> n </math> is the dimension of the input space, and
:<math> K </math> is the number of points (<math>K</math> can be infinite).

In the [[deterministic]] case the data are drawn from the set
:<math> \left \{ \left[ \mathbf{x}(t) , y(t) = f \big( \mathbf{x}(t) \big) \right] \right \} _{t=1}^{ K } </math>.
The data can be noisy, in which case they are drawn from the set
:<math> \left \{ \left[ \mathbf{x}(t) , y(t) = f \big( \mathbf{x}(t) \big) + \epsilon(t) \right] \right \} _{t=1}^{ K } </math>
where <math> \epsilon(t) </math> is a partially known random process. In the general [[stochastic]] case, the data are drawn from the joint probability distribution
:<math> P \left( \mathbf{x} \land y \right ) </math>.

==Network architecture==
[[Image:060804 architecture.png|thumb|350px|right|Figure 1: Architecture of a radial basis function network. An input vector '''x''' is used as input to all radial basis functions, each with different parameters. The output of the network is a linear combination of the outputs from the radial basis functions.]]

Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The output, <math> \varphi : \mathbb{R}^n \to \mathbb{R} </math>, of the network is thus

:<math>\varphi(\mathbf{x}) = \sum_{i=1}^N a_i \rho(||\mathbf{x}-\mathbf{c}_i||)</math>

where ''N'' is the number of neurons in the hidden layer, <math>\mathbf{c}_i</math> is the center vector for neuron ''i'', and <math>a_i</math> are the weights of the linear output neuron. In the basic form all inputs are connected to each hidden neuron. The norm is typically taken to be the [[Euclidean distance]] and the basis function is taken to be [[Normal distribution|Gaussian]]
 
:<math> \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \propto \exp \left[ -\beta \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert ^2 \right] </math>.
The Gaussian basis functions are local in the sense that <math>\lim_{||\mathbf{x}|| \to \infty}\rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) = 0</math>, i.e. changing the parameters of one neuron has only a small effect for input values that are far away from the center of that neuron. RBF networks are universal approximators on a compact subset of <math>\mathbb{R}^n</math>: an RBF network with enough hidden neurons can approximate any continuous function with arbitrary precision.

Because the output depends linearly on the weights <math>a_i</math>, RBF networks do not get locked into the local minima that complicate the training of [[Artificial_neural_network#Multi-layer_perceptron|multi-layer perceptron (MLP) networks]] when the centers are held fixed. RBF architectures come in two forms, unnormalized and normalized, both of which can be expanded into a superposition of local linear models.
=== RBF types ===
The most popular choice for the non-linearity is the Gaussian; other forms, such as the multiquadric, are also used:
* Gaussian: <math>\rho(r) = \exp(-\beta r^2)</math> for some <math>\beta > 0</math>
* Multiquadric: <math>\rho(r) = \sqrt{r^2 + \beta^2}</math> for some <math>\beta > 0</math>
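The following short sketch, in Python with NumPy, illustrates how an unnormalized Gaussian RBF network of this form can be evaluated; the function name, the example weights and the value of <math>\beta</math> are illustrative choices, not prescribed by the text.

<syntaxhighlight lang="python">
import numpy as np

def rbf_network(x, centers, weights, beta):
    """Evaluate an unnormalized Gaussian RBF network at the input vector x."""
    # rho_i = exp(-beta * ||x - c_i||^2) for each center c_i
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    # output is the linear combination sum_i a_i * rho_i
    return weights @ rho

# Example: two basis functions in one input dimension, centered as in Figure 2
centers = np.array([[0.75], [3.25]])
weights = np.array([1.0, 1.5])          # a_1, a_2 (illustrative values)
print(rbf_network(np.array([1.0]), centers, weights, beta=1.0))
</syntaxhighlight>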
 
[[Image:060803 unnormalized radial basis functions.png|thumb|350px|right|Figure 2: Two unnormalized radial basis functions in one input dimension. The basis function centers are located at <math> c_1=0.75 </math> and <math> c_2=3.25 </math>.
]]
===Unnormalized===
The basic architecture described above, with output

:<math> \varphi ( \mathbf{x} ) \ \stackrel{\mathrm{def}}{=}\ \sum_{i=1}^N a_i \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ), </math>

is called the ''unnormalized'' radial basis function architecture. Here <math> \varphi </math> is the approximation to the data, each radial basis function <math> \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) </math> is a local function of the distance <math> \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert </math> between the input vector <math> \mathbf{x} </math> and the basis function center <math> \mathbf{c}_i </math> <math> (i=1,\ldots,N) </math>, and the <math> a_i </math> <math> (i=1,\ldots,N) </math> are weights to be determined from the data. The weights <math> a_i </math>, the centers <math> \mathbf{c}_i </math>, and the width parameter <math> \beta </math> are determined in a manner that optimizes the fit between <math> \varphi </math> and the data.
 
[[Image:060803 normalized radial basis functions.png|thumb|350px|right|Figure 3: Two normalized radial basis functions in one input dimension. The basis function centers are located at <math> c_1=0.75 </math> and <math> c_2=3.25 </math>.]]
===Normalized===
In addition to the above ''unnormalized'' architecture, RBF networks can be ''normalized''. In this case the mapping is
 
:<math> \varphi ( \mathbf{x} ) \ \stackrel{\mathrm{def}}{=}\ \frac { \sum_{i=1}^N a_i \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } { \sum_{i=1}^N \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } = \sum_{i=1}^N a_i u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) </math>
where
:<math> u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \ \stackrel{\mathrm{def}}{=}\ \frac { \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } { \sum_{i=1}^N \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } </math>
 
is known as a "normalized radial basis function".
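A minimal sketch of the normalized variant, under the same illustrative Gaussian assumptions as the earlier example:

<syntaxhighlight lang="python">
import numpy as np

def normalized_rbf_network(x, centers, weights, beta):
    """Evaluate a normalized Gaussian RBF network at the input vector x."""
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    u = rho / np.sum(rho)        # normalized radial basis functions u_i
    return weights @ u           # sum_i a_i * u_i
</syntaxhighlight>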
 
[[Image:060804 3 normalized basis functions.png|thumb|350px|right|Figure 4: Three normalized radial basis functions in one input dimension. The additional basis function has center at <math> c_3=2.75 </math> ]]
====Theoretical motivation for normalization====
There is theoretical justification for this architecture in the case of stochastic data flow. Assume a [[stochastic kernel]] approximation for the joint probability density
 
:<math> P\left ( \mathbf{x} \land y \right ) = \sum_{i=1}^N \, \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \, \sigma \big ( \left \vert y - e_i \right \vert \big )</math>
where the kernels are normalized to unit integral; the normalized architecture is then obtained as the conditional expectation of <math> y </math> given <math> \mathbf{x} </math> under this approximation. In the derivation, <math> \delta_{ij} </math> denotes the [[Kronecker delta]],
:<math> \delta_{ij} = \begin{cases} 1, & \mbox{if }i = j \\ 0, & \mbox{if }i \ne j \end{cases} </math>.
 
==Training==

In an RBF network there are three types of parameters that need to be chosen to adapt the network to a particular task: the center vectors <math>\mathbf{c}_i</math>, the output weights <math>w_i</math> (denoted <math>a_i</math> in the architecture section above), and the RBF width parameters <math>\beta_i</math>. In sequential training the weights are updated at each time step as data streams in.

===Objective functions===
{{main|Optimization (mathematics)}}

For some tasks it makes sense to define an objective function and select the parameter values that minimize its value. The weights, which we collect into a vector <math> \mathbf{w} </math>, are then found through optimization of this objective function. The most common objective function is the least squares function
 
:<math> K( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ \sum_{t=1}^\infty K_t( \mathbf{w} ) </math>
where

:<math> K_t( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ \big [ y(t) - \varphi \big ( \mathbf{x}(t), \mathbf{w} \big ) \big ]^2 </math>

is the squared error at time <math>t</math>. A smoothness term <math> S(\mathbf{w}) </math> can be added to the objective, giving

:<math> H( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ K( \mathbf{w} ) + \lambda S( \mathbf{w} ) </math>
where optimization of S maximizes smoothness and <math> \lambda </math> is known as a [[regularization]] parameter.
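As an illustration, the least squares objective over a finite data set (the sum above runs over all available time steps) can be computed as follows; the helper below assumes the Gaussian basis and its names are illustrative, not part of any standard library.

<syntaxhighlight lang="python">
import numpy as np

def least_squares_objective(X, y, centers, weights, beta):
    """K(w): sum of squared errors of a Gaussian RBF network over the data set.

    X: (K, n) inputs x(t), y: (K,) outputs y(t).
    """
    sq_dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    outputs = np.exp(-beta * sq_dists) @ weights   # phi(x(t)) for every t
    return np.sum((y - outputs) ** 2)
</syntaxhighlight>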
 
===Interpolation===

RBF networks can be used to interpolate a function <math>y: \mathbb{R}^n \to \mathbb{R}</math> when the values of that function are known on a finite number of points: <math>y(x_i) = b_i, i=1, \ldots, N</math>. Taking the known points <math>x_i</math> to be the centers of the radial basis functions and evaluating the values of the basis functions at the same points, <math>g_{ij} = \rho(|| x_j - x_i ||)</math>, the weights can be solved from the equation
:<math>\left[ \begin{matrix}
g_{11} & g_{12} & \cdots & g_{1N} \\
g_{21} & g_{22} & \cdots & g_{2N} \\
\vdots & & \ddots & \vdots \\
g_{N1} & g_{N2} & \cdots & g_{NN}
\end{matrix}\right] \left[ \begin{matrix}
w_1 \\
w_2 \\
\vdots \\
w_N
\end{matrix} \right] = \left[ \begin{matrix}
b_1 \\
b_2 \\
\vdots \\
b_N
\end{matrix} \right]</math>
 
It can be shown that the interpolation matrix in the above equation is non-singular if the points <math>x_i</math> are distinct (for common choices of <math>\rho</math>, such as the Gaussian), and thus the weights <math>\mathbf{w}</math> can be solved by simple linear algebra:

:<math>\mathbf{w} = \mathbf{G}^{-1} \mathbf{b}</math>
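A sketch of this exact interpolation in NumPy, assuming the Gaussian basis function; the interpolation matrix is built from the known points and the linear system is solved directly.

<syntaxhighlight lang="python">
import numpy as np

def interpolation_weights(points, values, beta):
    """Solve for RBF weights that reproduce the given values exactly.

    points: (N, n) array of known inputs x_i (also used as the centers)
    values: (N,) array of known outputs b_i
    """
    # G[i, j] = rho(||x_j - x_i||) with a Gaussian basis function
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    G = np.exp(-beta * sq_dists)
    # w = G^{-1} b, computed with a linear solver rather than an explicit inverse
    return np.linalg.solve(G, values)
</syntaxhighlight>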
 
===Function approximation===
 
If the purpose is not to perform strict interpolation but instead more general [[function approximation]] or [[Statistical classification|classification]], the optimization is somewhat more complex because there is no obvious choice for the centers. The training is typically done in a hybrid fashion, in two phases: first the centers and widths are fixed, and then the linear weights are optimized. This can be justified by the different nature of the non-linear hidden neurons versus the linear output neuron.
 
====Training the basis function centers====
 
Basis function centers can be randomly sampled among the input instances or found by [[data clustering|clustering]] the samples (e.g. with [[k-means clustering]]) and choosing the cluster means as the centers.
 
The RBF widths are usually all fixed to the same value, which is proportional to the maximum distance between the chosen centers.
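A sketch of this center and width selection, assuming centers are drawn at random from the inputs and using one common version of the width heuristic; the constant <math>1/\sqrt{2N}</math> is an assumption, not prescribed by the text.

<syntaxhighlight lang="python">
import numpy as np

def choose_centers_and_width(X, N, seed=0):
    """Pick N centers by random sampling from the inputs and a shared width."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=N, replace=False)]
    # width proportional to the maximum distance between the chosen centers
    d_max = np.max(np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1))
    sigma = d_max / np.sqrt(2 * N)
    beta = 1.0 / (2 * sigma ** 2)     # for the exp(-beta * r^2) parameterization
    return centers, beta
</syntaxhighlight>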
 
====Pseudoinverse solution for the linear weights====
 
After the centers <math>c_i</math> have been fixed, the weights that minimize the error at the output are computed with a linear [[pseudoinverse]] solution:
:<math>\mathbf{w} = \mathbf{G}^+ \mathbf{b}</math>,
where the entries of ''G'' are the values of the radial basis functions evaluated at the points <math>x_i</math>: <math>g_{ji} = \rho(||x_j-c_i||)</math>.
 
The existence of this linear solution means that, unlike in [[Artificial_neural_network#Multi-layer_perceptron|multi-layer perceptron (MLP) networks]], the RBF network has a unique local minimum (when the centers are fixed).
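A sketch of the pseudoinverse solution in NumPy; the function and variable names are illustrative, and <code>numpy.linalg.lstsq</code> would compute the same least-squares solution.

<syntaxhighlight lang="python">
import numpy as np

def fit_output_weights(X, b, centers, beta):
    """Least-squares fit of the linear output weights with the centers fixed.

    X: (K, n) training inputs, b: (K,) training targets.
    """
    # G[j, i] = rho(||x_j - c_i||) for Gaussian basis functions
    sq_dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    G = np.exp(-beta * sq_dists)
    # w = G^+ b via the Moore-Penrose pseudoinverse
    return np.linalg.pinv(G) @ b
</syntaxhighlight>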
 
====Gradient descent training of the linear weights====
{{main|Gradient descent}}

Another possible training algorithm is [[gradient descent]]. In gradient descent training, the weights are adjusted at each time step by moving them in a direction opposite to the gradient of the objective function
 
:<math> \mathbf{w}(t+1) = \mathbf{w}(t) - \nu \frac {d} {d\mathbf{w}} H_t(\mathbf{w}) </math>
where <math> \nu </math> is a [[learning rate]] parameter. For the weights <math> e_{ij} </math>, for example, the update takes the form
:<math> e_{ij} (t+1) = e_{ij}(t) + \nu \big [ y(t) - \varphi \big ( \mathbf{x}(t), \mathbf{w} \big ) \big ] v_{ij} \big ( \mathbf{x}(t) - \mathbf{c}_i \big ) </math>
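For the basic unnormalized network, the corresponding sequential update of the output weights <math>a_i</math> can be sketched as follows; this is a minimal illustration assuming the instantaneous squared-error objective, and the names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def sgd_step(x_t, y_t, centers, weights, beta, nu):
    """One sequential gradient-descent update of the output weights a_i."""
    rho = np.exp(-beta * np.sum((centers - x_t) ** 2, axis=1))
    error = y_t - weights @ rho          # y(t) - phi(x(t), w)
    # the gradient of the squared error w.r.t. a_i is proportional to -error * rho_i
    return weights + nu * error * rho
</syntaxhighlight>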
 
====Projection operator training of the linear weights====
 
Training of the linear weights <math> a_i </math> and <math> e_{ij} </math> can also be carried out with a projection operator algorithm.
==Control==
In [[Control theory|control]] applications the network models a dynamical system of the form <math> x(t+1) = f[x(t)] + c[x(t),t] </math>, where <math> c[x(t),t] </math> is a control term. The network output

:<math> \varphi[x(t)] \approx f[x(t)] = x(t+1) - c[x(t),t] </math>

is an approximation to the underlying natural dynamics of the system.
==See also==
 
* [[Artificial neural network]]s
* [[Predictive analytics]]
* [[Chaos theory]]
* [[Autoregressive moving average model]]
* [[Autoregressive integrated moving average]]
* [[Autoregressive conditional heteroskedasticity]]
* [[Mitchell Feigenbaum]]
 
==External links==
*[http://www-bd.fnal.gov/icalepcs/abstracts/PDF/th1ab.pdf Model Predictive Control with radial basis functions]
*[http://www.mib.sk/Handbook%20of%20Neural%20Computation/NCG2_7.pdf Control of a negative ion source]
 
==References==
* J. Moody and C. J. Darken, "Fast learning in networks of locally tuned processing units," Neural Computation, 1, 281-294 (1989). Also see [http://www.ki.inf.tu-dresden.de/~fritzke/FuzzyPaper/node5.html Radial basis function networks according to Moody and Darken]
* T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE 78(9), 1484-1487 (1990).
* [[Roger Jones (physicist and entrepreneur)|Roger D. Jones]], Y. C. Lee, C. W. Barnes, G. W. Flake, K. Lee, P. S. Lewis, and S. Qian, "[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=137644 Function approximation and time series prediction with neural networks]," Proceedings of the International Joint Conference on Neural Networks, June 17-21, p. I-649 (1990).
* {{cite book | author=Martin D. Buhmann | title=Radial Basis Functions: Theory and Implementations | publisher=Cambridge University Press | year=2003 | id=ISBN 0-521-63338-9}}
* {{cite book | author=Yee, Paul V. and Haykin, Simon | title=Regularized Radial Basis Function Networks: Theory and Applications | publisher= John Wiley| year=2001 | id=ISBN 0-471-35349-3}}
* John R. Davies, Stephen V. Coggeshall, Roger D. Jones, and Daniel Schutzer, "Intelligent Security Systems," in {{cite book | author=Freedman, Roy S., Flein, Robert A., and Lederman, Jess, Editors | title=Artificial Intelligence in the Capital Markets | ___location= Chicago | publisher=Irwin| year=1995 | id=ISBN 1-55738-811-3}}
* {{cite book | author=Simon Haykin | title=Neural Networks: A Comprehensive Foundation | edition=2nd edition | ___location=Upper Saddle River, NJ | publisher=Prentice Hall | year=1999 | id=ISBN 0-13-908385-5}}
 
[[Category:Neural networks]]
[[Category:Information technology]]
[[Category:Computer network analysis]]
[[Category:Networks]]
[[Category:Cybernetics]]
[[Category:Artificial intelligence]]
[[Category:Interpolation]]
 
[[bg:Невронна мрежа]]
[[de:Radiale Basisfunktion]]
[[es:Red neuronal artificial]]
[[fr:Réseau de neurones]]
[[ko:신경망]]
[[hr:neuronska mreža]]
[[ja:ニューラルネットワーク]]
[[pl:Sieć neuronowa]]
[[pt:Rede neural]]
[[ro:Reţele neuronale]]
[[ru:Нейронная сеть]]
[[sl:nevronska mreža]]
[[zh:神经网络]]