Hyper basis function network

In [[machine learning]], a '''Hyper basis function network''', or '''HyperBF network''', is a generalization of the [[Radial basis function network|radial basis function (RBF) network]] concept, where a [[Mahalanobis distance|Mahalanobis]]-like distance is used instead of the standard Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper "Networks for Approximation and Learning".<ref name="PoggioGirosi1990">T. Poggio and F. Girosi (1990). "Networks for Approximation and Learning". ''Proc. of the IEEE'' '''78''' (9):1481–1497.</ref><ref name="Mahdi">R.N. Mahdi, E.C. Rouchka (2011). [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5733426 "Reduced HyperBF Networks: Regularization by Explicit Complexity Reduction and Scaled Rprop-Based Training"]. ''IEEE Transactions on Neural Networks'' '''22''' (5):673–686.</ref>
 
==Network Architecture==
 
The typical HyperBF network structure consists of a real input vector <math>x\in \mathbb{R}^n</math>, a hidden layer of activation functions and a linear output layer. The output of the network is a scalar function of the input vector, <math>\phi: \mathbb{R}^n\to\mathbb{R}</math>, given by
<div style="text-align: center;"><math>\phi(x)=\sum_{j=1}^{N}a_j\rho_j(||x-\mu_j||)</math></div>
where <math>N</math> is the number of neurons in the hidden layer, and <math>\mu_j</math> and <math>a_j</math> are the center and weight of neuron <math>j</math>, respectively. The [[activation function]] <math>\rho_j(||x-\mu_j||)</math> of the HyperBF network takes the following form
<div style="text-align: center;"><math>\rho_j(||x-\mu_j||)=e^{-(x-\mu_j)^T R_j(x-\mu_j)}</math></div>
where <math>R_j</math> is a positive definite <math>d\times d</math> matrix, with <math>d=n</math> the input dimension. Depending on the application, the following types of matrices <math>R_j</math> are usually considered:<ref name="Schwenker">F. Schwenker, H.A. Kestler and G. Palm (2001). [http://www.sciencedirect.com/science/article/pii/S0893608001000272# "Three Learning Phases for Radial-Basis-Function Networks"]. ''Neural Netw.'' '''14''':439–458.</ref>
* <math>R_j=\frac{1}{2\sigma^2}\mathbb{I}_{d\times d}</math>, where <math>\sigma>0</math>. This case corresponds to the regular RBF network.
* <math>R_j=\frac{1}{2\sigma_j^2}\mathbb{I}_{d\times d}</math>, where <math>\sigma_j>0</math>. In this case, the basis functions are radially symmetric, but are scaled with different widths.
* <math>R_j=\operatorname{diag}\left(\frac{1}{2\sigma_{j1}^2},\ldots,\frac{1}{2\sigma_{jd}^2}\right)</math>, where <math>\sigma_{ji}>0</math>. In this case, every basis function has an elliptic shape of varying size.
* Positive definite matrix, but not diagonal.
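
A minimal illustrative sketch of evaluating the network output <math>\phi(x)</math> defined above, written in Python with NumPy, is shown below. The function and variable names, as well as the example values, are illustrative only and not part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np

def hyperbf_output(x, centers, weights, R):
    """Evaluate phi(x) = sum_j a_j * exp(-(x - mu_j)^T R_j (x - mu_j)).

    x       : (d,)      input vector
    centers : (N, d)    neuron centers mu_j
    weights : (N,)      output weights a_j
    R       : (N, d, d) positive definite matrices R_j
    """
    diffs = x - centers                                  # (N, d) rows x - mu_j
    quad = np.einsum('nd,nde,ne->n', diffs, R, diffs)    # (x - mu_j)^T R_j (x - mu_j)
    return float(weights @ np.exp(-quad))

# Example: two neurons with isotropic R_j = I/(2*sigma^2), i.e. a regular RBF network
d, sigma = 2, 1.0
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([1.0, -0.5])
R = np.repeat((np.eye(d) / (2 * sigma**2))[None], len(centers), axis=0)
print(hyperbf_output(np.array([0.5, 0.5]), centers, weights, R))
</syntaxhighlight>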
 
==Training==
Training HyperBF networks involves estimation of the weights <math>a_j</math> and of the shapes and centers of the neurons, <math>R_j</math> and <math>\mu_j</math>. Poggio and Girosi (1990) describe a training method with moving centers and adaptable neuron shapes; an outline of the method is provided below.
 
Consider the quadratic loss of the network, <math>H[\phi^*]=\sum_{i=1}^{N}(y_i-\phi^* (x_i))^2</math>, where the sum runs over the training examples <math>(x_i, y_i)</math>. The following conditions must be satisfied at the optimum:
<div style="text-align: center;"><math>\frac{\partial H[\phi^*]}{\partial a_j}=0 </math>, <math>\frac{\partial H[\phi^*]}{\partial \mu_j}=0 </math>, <math>\frac{\partial H[\phi^*]}{\partial W}=0 </math></div>
 
where <math>R_j=W^TW</math>. Then, in the gradient descent method, the values of <math>a_j, \mu_j, W</math> that minimize <math>H[\phi^*]</math> can be found as a stable fixed point of the following dynamical system:
 
<div style="text-align: center;"><math>\dot{a}_j=-\omega\frac{\partial H[\phi^*]}{\partial a_j}</math>, <math>\dot{\mu}_j=-\omega\frac{\partial H[\phi^*]}{\partial \mu_j} </math>, <math>\dot{W}=-\omega\frac{\partial H[\phi^*]}{\partial W} </math></div>
 
where <math>\omega</math> determines the rate of convergence.
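
The continuous-time updates above can be approximated by plain gradient descent with a fixed step size. The following is a minimal illustrative sketch in Python with NumPy, assuming a single shared matrix <math>W</math> with <math>R=W^TW</math>; the gradients are worked out directly from the quadratic loss and the Gaussian basis functions above, and the learning rate, number of steps and variable names are illustrative choices rather than part of the original method.

<syntaxhighlight lang="python">
import numpy as np

def train_hyperbf(X, y, centers, weights, W, lr=1e-3, steps=1000):
    """Gradient descent on H = sum_i (y_i - phi(x_i))^2 for a HyperBF network
    with a shared weighting matrix W, so that R = W^T W for every neuron.

    X : (m, d) training inputs, y : (m,) targets
    centers : (N, d) neuron centers, weights : (N,) output weights, W : (d, d)
    """
    for _ in range(steps):
        R = W.T @ W                                   # shared metric R = W^T W
        V = X[:, None, :] - centers[None, :, :]       # (m, N, d), v_ij = x_i - mu_j
        quad = np.einsum('ijd,de,ije->ij', V, R, V)   # v_ij^T R v_ij
        rho = np.exp(-quad)                           # (m, N) basis activations
        err = y - rho @ weights                       # (m,) residuals y_i - phi(x_i)

        # Gradients of H derived from the loss above:
        # dH/da_j  = -2 sum_i err_i rho_ij
        # dH/dmu_j = -4 a_j sum_i err_i rho_ij R v_ij
        # dH/dW    =  4 W sum_{i,j} err_i a_j rho_ij v_ij v_ij^T
        grad_a = -2 * rho.T @ err
        grad_mu = -4 * np.einsum('i,j,ij,ijd->jd', err, weights, rho, V @ R)
        grad_W = 4 * W @ np.einsum('i,j,ij,ijd,ije->de', err, weights, rho, V, V)

        weights = weights - lr * grad_a               # discrete analogue of the
        centers = centers - lr * grad_mu              # gradient flow with rate omega
        W = W - lr * grad_W
    return centers, weights, W
</syntaxhighlight>

In this sketch the fixed step size plays the role of the convergence rate <math>\omega</math> in the dynamical system above.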
 
Overall, training HyperBF networks can be computationally challenging. Moreover, the high degree of freedom of HyperBF networks leads to overfitting and poor generalization. However, HyperBF networks have an important advantage: a small number of neurons is often enough to learn complex functions.<ref name="Mahdi"/>
 
==References==