Multiple kernel learning

 
===Semisupervised learning===
[[Semi-supervised learning]] approaches to multiple kernel learning are similar to other extensions of supervised learning approaches. An inductive procedure has been developed that uses a log-likelihood empirical loss and group LASSO regularization with conditional expectation consensus on unlabeled data for image categorization.<ref>Wang, Shuhui et al. [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6177671 S3MKL: Scalable Semi-Supervised Multiple Kernel Learning for Real-World Image Applications]. IEEE Transactions on Multimedia, Vol. 14, No. 4, August 2012.</ref> The problem can be defined as follows. Let <math>L=\{(x_i,y_i)\}</math> be the labeled data, and let <math>U=\{x_i\}</math> be the set of unlabeled data. The decision function can then be written as
 
:<math>f(x)=\alpha_0+\sum_{i=1}^{|L|}\alpha_iK_i(x)</math>
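
A minimal sketch of evaluating this decision function in Python, assuming the kernel values <math>K_i(x)</math> between the query point and the labeled examples have already been computed; the function and variable names here are illustrative, not taken from the cited paper:

<syntaxhighlight lang="python">
import numpy as np

def decision_function(alpha0, alpha, k_x):
    """Evaluate f(x) = alpha_0 + sum_i alpha_i * K_i(x).

    alpha0 : float, the bias term alpha_0
    alpha  : array of shape (|L|,), the coefficients alpha_i
    k_x    : array of shape (|L|,), the kernel values K_i(x)
             between each labeled point x_i and the query x
    """
    return alpha0 + float(np.dot(alpha, k_x))
</syntaxhighlight>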
 
The problem can be written as
 
:<math>\min_f L(f) + \lambda R(f) + \gamma\Theta(f)</math>
 
where <math>L</math> is the loss function (a weighted negative log-likelihood in this case), <math>R</math> is the regularization term ([[Proximal_gradient_methods_for_learning#Exploiting_group_structure|group LASSO]] in this case), and <math>\Theta</math> is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density (MKD) for all the data be
 
:<math>g^{\pi}_m(x)=\langle\phi^{\pi}_m,\psi_m(x)\rangle</math>
 
where <math>\psi_m(x)=[K_m(x_1,x),\ldots,K_m(x_L,x)]^T</math> (the vector of kernel values between the labeled points and the point <math>x</math>, which may be labeled or unlabeled) and <math>\phi^{\pi}_m</math> is a non-negative random vector with a 2-norm of 1. <math>\Pi</math> is the number of times each kernel is projected. Expectation regularization is then performed on the MKD, resulting in a reference expectation <math>q^{\pi}_m(y|g^{\pi}_m(x))</math> and a model expectation <math>p^{\pi}_m(f(x)|g^{\pi}_m(x))</math>. The CEC penalty is then defined as
:<math>\Theta=\frac{1}{\Pi}\sum^{\Pi}_{\pi=1}\sum^{M}_{m=1}D(q^{\pi}_m(y|g^{\pi}_m(x))\,\|\,p^{\pi}_m(f(x)|g^{\pi}_m(x)))</math>
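
As a rough illustration of how the CEC penalty could be assembled, the sketch below draws the random projections <math>\phi^{\pi}_m</math> as non-negative unit-norm vectors, projects the kernel vector <math>\psi_m(x)</math> to obtain the MKD, and averages KL divergences between the reference and model expectations. It assumes those expectations have already been estimated as discrete distributions; all names are illustrative, and the expectation-regularization step itself is omitted:

<syntaxhighlight lang="python">
import numpy as np

def random_projection(dim, rng):
    """Draw a non-negative random vector phi with 2-norm equal to 1."""
    phi = np.abs(rng.standard_normal(dim))
    return phi / np.linalg.norm(phi)

def marginal_kernel_density(phi, psi_x):
    """MKD: g(x) = <phi, psi_m(x)>, the projected kernel vector."""
    return float(np.dot(phi, psi_x))

def kl_divergence(q, p, eps=1e-12):
    """Discrete KL divergence D(q || p), with clipping for stability."""
    q = np.clip(q, eps, None)
    p = np.clip(p, eps, None)
    return float(np.sum(q * np.log(q / p)))

def cec_penalty(q_dists, p_dists):
    """Theta = (1/Pi) * sum_{pi=1}^{Pi} sum_{m=1}^{M} D(q^pi_m || p^pi_m).

    q_dists, p_dists : nested lists indexed [pi][m]; each entry is a
    discrete distribution (array summing to 1) over the labels.
    """
    num_projections = len(q_dists)
    total = 0.0
    for q_row, p_row in zip(q_dists, p_dists):
        for q, p in zip(q_row, p_row):
            total += kl_divergence(q, p)
    return total / num_projections
</syntaxhighlight>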
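Putting the pieces together, the overall objective <math>L(f)+\lambda R(f)+\gamma\Theta(f)</math> could be evaluated as sketched below, assuming the weighted negative log-likelihood and the CEC penalty are computed elsewhere and that the coefficients are grouped by kernel for the group LASSO term; as before, the names are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def group_lasso(alpha_groups):
    """Group LASSO regularizer R(f) = sum_m ||alpha_m||_2, where
    alpha_groups is a list of coefficient arrays, one per kernel."""
    return float(sum(np.linalg.norm(a) for a in alpha_groups))

def objective(neg_log_likelihood, alpha_groups, theta, lam, gamma):
    """Overall objective L(f) + lambda * R(f) + gamma * Theta(f)."""
    return neg_log_likelihood + lam * group_lasso(alpha_groups) + gamma * theta
</syntaxhighlight>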
 
===Unsupervised learning===