Hyper basis function network

In [[machine learning]], a '''Hyper basis function network''', or '''HyperBF network''', is a generalization of the [[Radial basis function network|radial basis function (RBF) network]] concept, where a [[Mahalanobis distance|Mahalanobis]]-like distance is used instead of the standard Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper "Networks for Approximation and Learning".<ref name="PoggioGirosi1990">T. Poggio and F. Girosi (1990). "Networks for Approximation and Learning". ''Proc. of the IEEE'' '''78''' (9):1481–1497.</ref><ref name="Mahdi">R.N. Mahdi, E.C. Rouchka (2011). [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5733426 "Reduced HyperBF Networks: Regularization by Explicit Complexity Reduction and Scaled Rprop-Based Training"]. ''IEEE Transactions on Neural Networks'' '''22''' (5):673–686.</ref>
 
==Network Architecture==
 
The typical HyperBF network structure consists of a real input vector <math>x\in \mathbb{R}^n</math>, a hidden layer of activation functions and a linear output layer. The output of the network is a scalar function of the input vector, <math>\phi: \mathbb{R}^n\to\mathbb{R}</math>, given by
<div style="text-align: center;"><math>\phi(x)=\sum_{j=1}^{N}a_j\rho_j(||x-\mu_j||)</math></div>
where <math>N</math> is the number of neurons in the hidden layer, and <math>\mu_j</math> and <math>a_j</math> are the center and weight of neuron <math>j</math>, respectively. The [[activation function]] <math>\rho_j(||x-\mu_j||)</math> of the HyperBF network takes the following form
<div style="text-align: center;"><math>\rho_j(||x-\mu_j||)=e^{-(x-\mu_j)^T R_j(x-\mu_j)}</math></div>
where <math>R_j</math> is a positive definite <math>d\times d</math> matrix, with <math>d=n</math> the input dimension. Depending on the application, the following types of matrices <math>R_j</math> are usually considered:<ref name="Schwenker">F. Schwenker, H.A. Kestler and G. Palm (2001). [http://www.sciencedirect.com/science/article/pii/S0893608001000272# "Three Learning Phases for Radial-Basis-Function Networks"]. ''Neural Netw.'' '''14''':439–458.</ref>
* <math>R_j=\frac{1}{2\sigma^2}\mathbb{I}_{d\times d}</math>, where <math>\sigma>0</math>. This case corresponds to the regular RBF network.
* <math>R_j=\frac{1}{2\sigma_j^2}\mathbb{I}_{d\times d}</math>, where <math>\sigma_j>0</math>. In this case, the basis functions are radially symmetric, but are scaled with different widths.
* <math>R_j=\operatorname{diag}\left(\frac{1}{2\sigma_{j1}^2},\ldots,\frac{1}{2\sigma_{jd}^2}\right)</math>, where <math>\sigma_{ji}>0</math>. In this case, every basis function has an elliptic shape of varying size.
* Positive definite matrix, but not diagonal.
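
A minimal illustrative sketch of evaluating the network output <math>\phi(x)</math> defined above, written in Python with NumPy, is shown below. The function and variable names, as well as the example values, are illustrative only and not part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np

def hyperbf_output(x, centers, weights, R):
    """Evaluate phi(x) = sum_j a_j * exp(-(x - mu_j)^T R_j (x - mu_j)).

    x       : (d,)      input vector
    centers : (N, d)    neuron centers mu_j
    weights : (N,)      output weights a_j
    R       : (N, d, d) positive definite matrices R_j
    """
    diffs = x - centers                                  # (N, d) rows x - mu_j
    quad = np.einsum('nd,nde,ne->n', diffs, R, diffs)    # (x - mu_j)^T R_j (x - mu_j)
    return float(weights @ np.exp(-quad))

# Example: two neurons with isotropic R_j = I/(2*sigma^2), i.e. a regular RBF network
d, sigma = 2, 1.0
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([1.0, -0.5])
R = np.repeat((np.eye(d) / (2 * sigma**2))[None], len(centers), axis=0)
print(hyperbf_output(np.array([0.5, 0.5]), centers, weights, R))
</syntaxhighlight>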
 
==Training==
Training HyperBF networks involves estimation of the weights <math>a_j</math> and of the shapes and centers of the neurons, <math>R_j</math> and <math>\mu_j</math>. Poggio and Girosi (1990) describe a training method with moving centers and adaptable neuron shapes; an outline of the method is provided below.
 
Consider the quadratic loss of the network, <math>H[\phi^*]=\sum_{i=1}^{N}(y_i-\phi^* (x_i))^2</math>, where the sum runs over the training examples <math>(x_i, y_i)</math>. The following conditions must be satisfied at the optimum:
<div style="text-align: center;"><math>\frac{\partial H[\phi^*]}{\partial a_j}=0 </math>, <math>\frac{\partial H[\phi^*]}{\partial \mu_j}=0 </math>, <math>\frac{\partial H[\phi^*]}{\partial W}=0 </math></div>
 
where <math>R_j=W^TW</math>. Then, in the gradient descent method, the values of <math>a_j, \mu_j, W</math> that minimize <math>H[\phi^*]</math> can be found as a stable fixed point of the following dynamical system:
 
<div style="text-align: center;"><math>\dot{a}_j=-\omega\frac{\partial H[\phi^*]}{\partial a_j}</math>, <math>\dot{\mu}_j=-\omega\frac{\partial H[\phi^*]}{\partial \mu_j} </math>, <math>\dot{W}=-\omega\frac{\partial H[\phi^*]}{\partial W} </math></div>
 
where <math>\omega</math> determines the rate of convergence.
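
The continuous-time updates above can be approximated by plain gradient descent with a fixed step size. The following is a minimal illustrative sketch in Python with NumPy, assuming a single shared matrix <math>W</math> with <math>R=W^TW</math>; the gradients are worked out directly from the quadratic loss and the Gaussian basis functions above, and the learning rate, number of steps and variable names are illustrative choices rather than part of the original method.

<syntaxhighlight lang="python">
import numpy as np

def train_hyperbf(X, y, centers, weights, W, lr=1e-3, steps=1000):
    """Gradient descent on H = sum_i (y_i - phi(x_i))^2 for a HyperBF network
    with a shared weighting matrix W, so that R = W^T W for every neuron.

    X : (m, d) training inputs, y : (m,) targets
    centers : (N, d) neuron centers, weights : (N,) output weights, W : (d, d)
    """
    for _ in range(steps):
        R = W.T @ W                                   # shared metric R = W^T W
        V = X[:, None, :] - centers[None, :, :]       # (m, N, d), v_ij = x_i - mu_j
        quad = np.einsum('ijd,de,ije->ij', V, R, V)   # v_ij^T R v_ij
        rho = np.exp(-quad)                           # (m, N) basis activations
        err = y - rho @ weights                       # (m,) residuals y_i - phi(x_i)

        # Gradients of H derived from the loss above:
        # dH/da_j  = -2 sum_i err_i rho_ij
        # dH/dmu_j = -4 a_j sum_i err_i rho_ij R v_ij
        # dH/dW    =  4 W sum_{i,j} err_i a_j rho_ij v_ij v_ij^T
        grad_a = -2 * rho.T @ err
        grad_mu = -4 * np.einsum('i,j,ij,ijd->jd', err, weights, rho, V @ R)
        grad_W = 4 * W @ np.einsum('i,j,ij,ijd,ije->de', err, weights, rho, V, V)

        weights = weights - lr * grad_a               # discrete analogue of the
        centers = centers - lr * grad_mu              # gradient flow with rate omega
        W = W - lr * grad_W
    return centers, weights, W
</syntaxhighlight>

In this sketch the fixed step size plays the role of the convergence rate <math>\omega</math> in the dynamical system above.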
 
Overall, training HyperBF networks can be computationally challenging. Moreover, the high degree of freedom of HyperBF networks leads to overfitting and poor generalization. However, HyperBF networks have an important advantage: a small number of neurons is often enough to learn complex functions.<ref name="Mahdi"/>
 
==References==