A '''radial basis function network''' is an [[artificial neural network]] that uses [[radial basis function]]s as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.
==Network architecture==
[[Image:060804 architecture.png|thumb|350px|right|Figure 1: Architecture of a radial basis function network. An input vector '''x''' is used as input to all radial basis functions, each with different parameters. The output of the network is a linear combination of the outputs from the radial basis functions.]]
Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The output, <math> \varphi : \mathbb{R}^n \to \mathbb{R} </math>, of the network is thus
:<math>\varphi(\mathbf{x}) = \sum_{i=1}^N a_i \rho(||\mathbf{x}-\mathbf{c}_i||)</math>
where ''N'' is the number of neurons in the hidden layer, <math>\mathbf{c}_i</math> is the center vector for neuron ''i'', and <math>a_i</math> are the weights of the linear output neuron. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the [[Euclidean distance]] and the basis function is taken to be [[Normal distribution|Gaussian]]
:<math> \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \propto \exp \left[ -\beta \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert ^2 \right] </math>.
The Gaussian basis functions are local in the sense that <math>\lim_{\left \Vert \mathbf{x} \right \Vert \to \infty}\rho(\left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert) = 0</math>, so changing the parameters of one neuron has only a small effect for input values that are far away from the center of that neuron.
RBF networks are universal approximators on a compact subset of <math>\mathbb{R}^n</math>. This means that an RBF network with enough hidden neurons can approximate any continuous function on that subset with arbitrary precision.
The weights <math> a_i </math>, <math> \mathbf{c}_i </math>, and <math> \beta </math> are determined in a manner that optimizes the fit between <math> \varphi </math> and the data.
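The forward pass of the unnormalized network can be written in a few lines. The following is a minimal sketch, assuming Gaussian basis functions with a single shared width <math>\beta</math>; the names <code>rbf_network</code>, <code>centers</code> and <code>weights</code> are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

def rbf_network(x, centers, weights, beta):
    """Unnormalized RBF network output for one input vector x.

    x       : (n,)   input vector
    centers : (N, n) center vectors c_i
    weights : (N,)   linear output weights a_i
    beta    : shared Gaussian width parameter
    """
    # Gaussian basis functions rho(||x - c_i||) = exp(-beta * ||x - c_i||^2)
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    # Linear output layer: sum_i a_i * rho_i
    return np.dot(weights, rho)

# Toy usage with two one-dimensional centers (as in Figure 2)
centers = np.array([[0.75], [3.25]])
weights = np.array([1.0, -0.5])
print(rbf_network(np.array([1.0]), centers, weights, beta=1.0))
</syntaxhighlight>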
[[Image:060803 unnormalized radial basis functions.png|thumb|350px|right|Figure 2: Two unnormalized radial basis functions in one input dimension. The basis function centers are located at <math> c_1=0.75 </math> and <math> c_2=3.25 </math>.
]]
[[Image:060803 normalized radial basis functions.png|thumb|350px|right|Figure 3: Two normalized radial basis functions in one input dimension. The basis function centers are located at <math> c_1=0.75 </math> and <math> c_2=3.25 </math>.]]
===Normalized===
====Normalized architecture====
In addition to the above ''unnormalized'' architecture, RBF networks can be ''normalized''. In this case the mapping is
:<math> \varphi ( \mathbf{x} ) \ \stackrel{\mathrm{def}}{=}\ \frac { \sum_{i=1}^N a_i \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } { \sum_{i=1}^N \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } = \sum_{i=1}^N a_i u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) </math>
where
:<math> u \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \ \stackrel{\mathrm{def}}{=}\ \frac { \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } { \sum_{i=1}^N \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) } </math>
is known as a "normalized radial basis function".
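The normalized variant differs only in dividing by the sum of the basis-function responses. A sketch under the same assumptions as the unnormalized example above:

<syntaxhighlight lang="python">
import numpy as np

def normalized_rbf_network(x, centers, weights, beta):
    """Normalized RBF network output: a weighted average of the a_i
    with weights u_i = rho_i / sum_j rho_j."""
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    u = rho / np.sum(rho)   # normalized radial basis functions
    return np.dot(weights, u)
</syntaxhighlight>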
[[Image:060804 3 normalized basis functions.png|thumb|350px|right|Figure 4: Three normalized radial basis functions in one input dimension. The additional basis function has center at <math> c_3=2.75 </math> ]]
====Theoretical motivation for normalization====
There is theoretical justification for this architecture in the case of stochastic data flow. Assume a [[stochastic kernel]] approximation for the joint probability density
:<math> P\left ( \mathbf{x} \land y \right ) = \sum_{i=1}^N \, \rho \big ( \left \Vert \mathbf{x} - \mathbf{c}_i \right \Vert \big ) \, \sigma \big ( \left \vert y - e_i \right \vert \big )</math>
The [[Kronecker delta]] used in this derivation is defined as
:<math> \delta_{ij} = \begin{cases} 1, & \mbox{if }i = j \\ 0, & \mbox{if }i \ne j \end{cases} </math>.
==Training==
In an RBF network there are three types of parameters that need to be chosen to adapt the network to a particular task: the center vectors <math>\mathbf{c}_i</math>, the output weights <math>w_i</math>, and the RBF width parameters <math>\beta_i</math>. In sequential training, the weights are updated at each time step as data streams in.
For some tasks it makes sense to define an objective function and select the parameter values that minimize its value. The most common objective function is the least squares function
:<math> K( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ \sum_{t=1}^\infty K_t( \mathbf{w} ) </math>
where
:<math> K_t( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ \big [ y(t) - \varphi \big ( \mathbf{x}(t), \mathbf{w} \big ) \big ]^2 </math>
is the squared error on the data point presented at time ''t''. Minimizing the least squares objective optimizes accuracy of fit. When smoothness as well as accuracy must be optimized, a regularized objective function such as
:<math> H( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ K( \mathbf{w} ) + \lambda S( \mathbf{w} ) \ \stackrel{\mathrm{def}}{=}\ \sum_{t=1}^\infty H_t( \mathbf{w} ) </math>
can be used instead,
where optimization of S maximizes smoothness and <math> \lambda </math> is known as a [[regularization]] parameter.
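As an illustration of these objectives (with a finite data set in place of the infinite sum), the sketch below uses a simple ridge penalty on the linear weights as a stand-in for the smoothness term <math>S</math>; this is only one of several possible choices for <math>S</math>.

<syntaxhighlight lang="python">
import numpy as np

def least_squares_objective(y, y_pred):
    """K(w): sum of squared errors over the available data points."""
    return np.sum((y - y_pred) ** 2)

def regularized_objective(y, y_pred, weights, lam):
    """H(w) = K(w) + lambda * S(w), here with S(w) = sum_i w_i^2 as an
    illustrative smoothness/regularization term."""
    return least_squares_objective(y, y_pred) + lam * np.sum(weights ** 2)
</syntaxhighlight>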
===Interpolation===
RBF networks can be used to interpolate a function <math>y: \mathbb{R}^n \to \mathbb{R}</math> when the values of that function are known at a finite number of points: <math>y(\mathbf{x}_i) = b_i, \; i=1, \ldots, N</math>. Taking the known points <math>\mathbf{x}_i</math> to be the centers of the radial basis functions and evaluating the values of the basis functions at the same points, <math>g_{ij} = \rho(|| \mathbf{x}_j - \mathbf{x}_i ||)</math>, the weights can be solved from the equation
:<math>\left[ \begin{matrix}
g_{11} & g_{12} & \cdots & g_{1N} \\
g_{21} & g_{22} & \cdots & g_{2N} \\
\vdots & & \ddots & \vdots \\
g_{N1} & g_{N2} & \cdots & g_{NN}
\end{matrix}\right] \left[ \begin{matrix}
w_1 \\
w_2 \\
\vdots \\
w_N
\end{matrix} \right] = \left[ \begin{matrix}
b_1 \\
b_2 \\
\vdots \\
b_N
\end{matrix} \right]</math>
It can be shown that the interpolation matrix in the above equation is non-singular if the points <math>\mathbf{x}_i</math> are distinct, and thus the weights <math>\mathbf{w}</math> can be solved by simple linear algebra:
:<math>\mathbf{w} = \mathbf{G}^{-1} \mathbf{b}</math>
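A sketch of this interpolation with Gaussian basis functions centered at the data points; <code>numpy.linalg.solve</code> is used rather than forming <math>\mathbf{G}^{-1}</math> explicitly, which is numerically preferable.

<syntaxhighlight lang="python">
import numpy as np

def rbf_interpolate(x_pts, b, beta):
    """Solve G w = b with centers placed at the known points x_pts.

    x_pts : (N, n) known input points, also used as centers
    b     : (N,)   known function values y(x_i)
    """
    diff = x_pts[:, None, :] - x_pts[None, :, :]    # pairwise differences
    G = np.exp(-beta * np.sum(diff ** 2, axis=2))   # g_ij = rho(||x_j - x_i||)
    return np.linalg.solve(G, b)                    # weights w

# Usage: interpolate sin(x) through five sample points
x_pts = np.linspace(0.0, np.pi, 5).reshape(-1, 1)
w = rbf_interpolate(x_pts, np.sin(x_pts).ravel(), beta=1.0)
</syntaxhighlight>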
===Function approximation===
If the purpose is not to perform strict interpolation but instead more general [[function approximation]] or [[Statistical classification|classification]], the optimization is somewhat more complex because there is no obvious choice for the centers. The training is typically done in two phases: first fixing the widths and centers, and then the weights. This can be justified by considering the different nature of the non-linear hidden neurons versus the linear output neuron.
====Training the basis function centers====
Basis function centers can either be randomly sampled among the input instances or found by [[data clustering|clustering]] the samples and choosing the cluster means as the centers.
The RBF widths are usually all fixed to the same value, which is proportional to the maximum distance between the chosen centers.
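A sketch of this first phase, using k-means for the centers and the common heuristic <math>\sigma = d_\max / \sqrt{2N}</math> for the shared width (other clustering methods and width rules are equally valid); <code>scikit-learn</code> is used here purely for convenience.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def choose_centers_and_width(X, n_centers):
    """Pick RBF centers by clustering the inputs and derive a shared width.

    X : (m, n) training inputs
    """
    centers = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X).cluster_centers_
    # Maximum distance between the chosen centers
    diff = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt(np.max(np.sum(diff ** 2, axis=2)))
    # Heuristic width: sigma = d_max / sqrt(2 N), beta = 1 / (2 sigma^2)
    sigma = d_max / np.sqrt(2.0 * n_centers)
    return centers, 1.0 / (2.0 * sigma ** 2)
</syntaxhighlight>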
====Pseudoinverse solution for the linear weights====
After the centers <math>c_i</math> have been fixed, the weights that minimize the error at the output are computed with a linear [[pseudoinverse]] solution:
:<math>\mathbf{w} = \mathbf{G}^+ \mathbf{b}</math>,
where the entries of ''G'' are the values of the radial basis functions evaluated at the points <math>x_i</math>: <math>g_{ji} = \rho(||x_j-c_i||)</math>.
The existence of this linear solution means that, unlike [[Artificial_neural_network#Multi-layer_perceptron|multi-layer perceptron (MLP) networks]], RBF networks have a unique local minimum (when the centers are fixed).
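A sketch of this second phase under the same Gaussian assumption, building the design matrix <math>\mathbf{G}</math> on the training inputs and solving for the weights with a pseudoinverse (a least-squares solver such as <code>numpy.linalg.lstsq</code> would work equally well):

<syntaxhighlight lang="python">
import numpy as np

def fit_linear_weights(X, b, centers, beta):
    """Least-squares output weights for fixed centers and width.

    X       : (m, n) training inputs
    b       : (m,)   training targets
    centers : (N, n) fixed center vectors
    """
    diff = X[:, None, :] - centers[None, :, :]
    G = np.exp(-beta * np.sum(diff ** 2, axis=2))   # g_ji = rho(||x_j - c_i||)
    return np.linalg.pinv(G) @ b                    # w = G^+ b
</syntaxhighlight>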
====Gradient descent training of the linear weights====
Another possible training algorithm is [[gradient descent]]. In gradient descent training, the weights are adjusted at each time step by moving them in a direction opposite to the gradient of the objective function:
:<math> \mathbf{w}(t+1) = \mathbf{w}(t) - \nu \frac {d} {d\mathbf{w}} H_t(\mathbf{w}) </math>
where <math> \nu </math> is a learning parameter.
For the linear weights <math> a_i </math> this becomes
:<math> a_i (t+1) = a_i(t) + \nu \big [ y(t) - \varphi \big ( \mathbf{x}(t), \mathbf{w} \big ) \big ] \rho \big ( \left \Vert \mathbf{x}(t) - \mathbf{c}_i \right \Vert \big ) </math>
in the unnormalized case, and for the weights <math> e_{ij} </math> of the local linear extension of the architecture the update is
:<math> e_{ij} (t+1) = e_{ij}(t) + \nu \big [ y(t) - \varphi \big ( \mathbf{x}(t), \mathbf{w} \big ) \big ] v_{ij} \big ( \mathbf{x}(t) - \mathbf{c}_i \big ) </math>
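A sketch of one such sequential update for the linear weights <math> a_i </math> of the unnormalized network, with the factor of two from the squared error absorbed into the learning parameter and the local linear terms omitted for brevity:

<syntaxhighlight lang="python">
import numpy as np

def gradient_descent_step(x, y, centers, weights, beta, nu):
    """One sequential gradient-descent update of the linear weights a_i
    for the unnormalized Gaussian RBF network."""
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    error = y - np.dot(weights, rho)      # y(t) - phi(x(t), w)
    return weights + nu * error * rho     # a_i(t+1) = a_i(t) + nu * error * rho_i
</syntaxhighlight>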
====Projection operator training of the linear weights====
For the case of training the linear weights, <math> a_i </math> and <math> e_{ij} </math>, the gradient step can be replaced by a projection: at each time step the learning parameter is chosen so that the updated weights exactly fit the current data point.
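One way to realize such a projection for the linear weights of the unnormalized network is a normalized-LMS style step, in which the update along <math> \rho </math> is scaled so that the error on the current point is cancelled exactly. This is a sketch of that idea, not necessarily the exact form used in the original references.

<syntaxhighlight lang="python">
import numpy as np

def projection_step(x, y, centers, weights, beta, eps=1e-12):
    """Projection-style update: move along rho by exactly enough to zero
    the error on the current data point (normalized-LMS form)."""
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    error = y - np.dot(weights, rho)
    return weights + (error / (np.dot(rho, rho) + eps)) * rho
</syntaxhighlight>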
When the network is trained on data generated by a dynamical system, the fitted map <math> \varphi \big ( \mathbf{x}, \mathbf{w} \big ) </math> is an approximation to the underlying natural dynamics of the system.
==See also==
* [[Predictive analytics]]
* [[Chaos theory]]
==References==
* J. Moody and C. J. Darken, "Fast learning in networks of locally tuned processing units," Neural Computation, 1, 281-294 (1989). Also see [http://www.ki.inf.tu-dresden.de/~fritzke/FuzzyPaper/node5.html Radial basis function networks according to Moody and Darken]
* T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE 78(9), 1484-1487 (1990).
* [[Roger Jones (physicist and entrepreneur)|Roger D. Jones]], Y. C. Lee, C. W. Barnes, G. W. Flake, K. Lee, P. S. Lewis, and S. Qian, "Function approximation and time series prediction with neural networks," Proceedings of the International Joint Conference on Neural Networks (1990).
* {{cite book | author=Martin D. Buhmann | title=Radial Basis Functions: Theory and Implementations | publisher=Cambridge University Press | year=2003 | id=ISBN 0-521-63338-9}}
* {{cite book | author=Yee, Paul V. and Haykin, Simon | title=Regularized Radial Basis Function Networks: Theory and Applications | publisher= John Wiley| year=2001 | id=ISBN 0-471-35349-3}}
* John R. Davies, Stephen V. Coggeshall, Roger D. Jones, and Daniel Schutzer, "Intelligent Security Systems," in {{cite book | author=Freedman, Roy S., Flein, Robert A., and Lederman, Jess, Editors | title=Artificial Intelligence in the Capital Markets | ___location= Chicago | publisher=Irwin| year=1995 | id=ISBN 1-55738-811-3}}
* {{cite book | author=Simon Haykin | title=Neural Networks: A Comprehensive Foundation | edition=2nd | ___location=Upper Saddle River, NJ | publisher=Prentice Hall | year=1999 | id=ISBN 0-13-908385-5}}
[[Category:Neural networks]]
[[Category:Information technology]]
[[Category:Interpolation]]