Multiple kernel learning

{{short description|Set of machine learning methods}}
{{Machine learning bar}}
 
Fixed rule approaches, such as the linear combination algorithm described above, set the combination of the kernels using rules that require no parameterization, such as summation and multiplication; the weighting is learned in the algorithm. Other examples of fixed rules include pairwise kernels, which are of the form
:<math>k((x_{1i}, x_{1j}),(x_{2i},x_{2j}))=k(x_{1i},x_{2i})k(x_{1j},x_{2j})+k(x_{1i},x_{2j})k(x_{1j},x_{2i})</math>.
These pairwise approaches have been used in predicting protein-protein interactions.<ref>Ben-Hur, A. and Noble W.S. [https://www.ncbi.nlm.nih.gov/pubmed/15961482?dopt=Abstract Kernel methods for predicting protein-protein interactions.] Bioinformatics. 2005 Jun;21 Suppl 1:i38-46.</ref>
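
A minimal sketch of this pairwise rule, assuming a Gaussian base kernel and NumPy arrays (the function names are illustrative, not from any particular library):

<syntaxhighlight lang="python">
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian base kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def pairwise_kernel(pair1, pair2, k=rbf_kernel):
    # Fixed pairwise rule:
    # k((x1i, x1j), (x2i, x2j)) = k(x1i, x2i) k(x1j, x2j) + k(x1i, x2j) k(x1j, x2i)
    (x1i, x1j), (x2i, x2j) = pair1, pair2
    return k(x1i, x2i) * k(x1j, x2j) + k(x1i, x2j) * k(x1j, x2i)
</syntaxhighlight>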
 
====Heuristic approaches====
Other approaches use a definition of kernel similarity, such as
 
:<math>A(K_1,K_2)=\frac{\langle K_1,K_2\rangle}{\sqrt{\langle K_1,K_1\rangle\langle K_2,K_2\rangle}}</math>
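
Reading the inner products as [[Frobenius inner product]]s of the Gram matrices (an assumption about the intended inner product), a minimal sketch of this similarity is:

<syntaxhighlight lang="python">
import numpy as np

def kernel_alignment(K1, K2):
    # A(K1, K2) = <K1, K2> / sqrt(<K1, K1> <K2, K2>),
    # with <., .> taken as the Frobenius inner product of the Gram matrices.
    inner = np.sum(K1 * K2)
    return inner / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
</syntaxhighlight>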
 
Using this measure, Qiu and Lane (2009)<ref>Shibin Qiu and Terran Lane. A framework for multiple kernel support vector regression and its
:<math>f(x)=\sum_{i=1}^N\sum_{m=1}^P\alpha_i^mK_m(x_i^m,x^m)+b</math>
 
The parameters <math>\alpha_i^m</math> and <math>b</math> are learned by coordinate-wise [[gradient descent]]. Each iteration of the descent algorithm identifies the best kernel column to add to the combined kernel, and the model is then rerun to generate the optimal weights <math>\alpha_i</math> and <math>b</math>.
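
As a minimal sketch, the combined decision function above can be evaluated as follows, assuming the coefficients and bias have already been learned and that every kernel acts on a shared representation of the data (array names and shapes are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def decision_function(x, X_train, alpha, b, kernels):
    # f(x) = sum_i sum_m alpha[i, m] * K_m(x_i, x) + b
    # alpha: array of shape (N, P); kernels: list of P base kernel functions.
    # For simplicity, the same representation of x is fed to every kernel.
    N, P = alpha.shape
    f = b
    for m, K_m in enumerate(kernels):
        for i in range(N):
            f += alpha[i, m] * K_m(X_train[i], x)
    return f
</syntaxhighlight>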
 
===Semisupervised learning===
where <math>L</math> is the loss function (weighted negative log-likelihood in this case), <math>R</math> is the regularization term ([[Proximal gradient methods for learning#Exploiting group structure|Group LASSO]] in this case), and <math>\Theta</math> is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density (MKD) for all the data be
 
:<math>g^{\pi}_m(x)=\langle\phi^{\pi}_m,\psi_m(x)\rangle</math>
 
where <math>\psi_m(x)=[K_m(x_1,x),\ldots,K_m(x_L,x)]^T</math> (the kernel distance between the labeled data and all of the labeled and unlabeled data) and <math>\phi^{\pi}_m</math> is a non-negative random vector with a 2-norm of 1. The value of <math>\Pi</math> is the number of times each kernel is projected. Expectation regularization is then performed on the MKD, resulting in a reference expectation <math>q^{\pi}_m(y|g^{\pi}_m(x))</math> and model expectation <math>p^{\pi}_m(f(x)|g^{\pi}_m(x))</math>. Then, we define
:<math>\Theta=\frac{1}{\Pi} \sum^{\Pi}_{\pi=1}\sum^{M}_{m=1} D(q^{\pi}_m(y|g^{\pi}_m(x))||p^{\pi}_m(f(x)|g^{\pi}_m(x)))</math>
 
where <math>D(Q||P)=\sum_iQ(i)\ln\frac{Q(i)}{P(i)}</math> is the [[Kullback–Leibler divergence]]. The combined minimization problem is optimized using a modified block gradient descent algorithm. For more information, see Wang et al.<ref>Wang, Shuhui et al. [https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6177671 S3MKL: Scalable Semi-Supervised Multiple Kernel Learning for Real-World Image Applications]. IEEE Transactions on Multimedia, vol. 14, no. 4, August 2012</ref>
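
A sketch of this penalty, assuming the reference and model expectations are already available as normalized discrete distributions indexed by projection <math>\pi</math> and kernel <math>m</math> (the data layout is illustrative):

<syntaxhighlight lang="python">
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    # D(Q || P) = sum_i Q(i) * ln(Q(i) / P(i)); eps guards against log(0).
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(q * np.log((q + eps) / (p + eps))))

def cec_penalty(q_list, p_list):
    # Theta = (1 / Pi) * sum_pi sum_m D(q[pi][m] || p[pi][m])
    # q_list, p_list: nested lists indexed first by projection pi, then by kernel m.
    Pi = len(q_list)
    total = sum(kl_divergence(q, p)
                for q_row, p_row in zip(q_list, p_list)
                for q, p in zip(q_row, p_row))
    return total / Pi
</syntaxhighlight>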
 
===Unsupervised learning===
One formulation of this is defined as follows. Let <math>D\in \{0,1\}^{n\times n}</math> be a matrix such that <math>D_{ij}=1</math> means that <math>x_i</math> and <math>x_j</math> are neighbors. Then <math>B_i=\{x_j:D_{ij}=1\}</math>. Note that these groups must be learned as well. Zhuang et al. solve this problem by an alternating minimization method for <math>K</math> and the groups <math>B_i</math>. For more information, see Zhuang et al.<ref>J. Zhuang, J. Wang, S.C.H. Hoi & X. Lan. [http://jmlr.csail.mit.edu/proceedings/papers/v20/zhuang11/zhuang11.pdf Unsupervised Multiple Kernel Learning]. Jour. Mach. Learn. Res. 20:129–144, 2011</ref>
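
As a sketch, the neighborhood groups <math>B_i</math> can be read off from a given binary matrix <math>D</math> (in the algorithm itself the groups are learned jointly with <math>K</math> by alternating minimization, as noted above):

<syntaxhighlight lang="python">
import numpy as np

def neighbor_groups(D):
    # B_i = { x_j : D_ij = 1 }, returned as an index array for each point.
    D = np.asarray(D)
    return [np.flatnonzero(D[i] == 1) for i in range(D.shape[0])]

# Illustrative 3-point neighborhood matrix.
D = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
groups = neighbor_groups(D)  # [array([1]), array([0, 2]), array([1])]
</syntaxhighlight>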
 
==MKL Libraries==
Available MKL libraries include
* [https://www.cs.cornell.edu/~ashesh/pubs/code/SPG-GMKL/download.html SPG-GMKL]: A scalable C++ MKL SVM library that can handle a million kernels.<ref>Ashesh Jain, S. V. N. Vishwanathan and Manik Varma. SPG-GMKL: Generalized multiple kernel learning with a million kernels. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Beijing, China, August 2012</ref>
* [http://research.microsoft.com/en-us/um/people/manik/code/GMKL/download.html GMKL]: Generalized Multiple Kernel Learning code in [[MATLAB]]; performs <math>\ell_1</math> and <math>\ell_2</math> regularization for supervised learning.<ref>M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In Proceedings of the International Conference on Machine Learning, Montreal, Canada, June 2009</ref>
* [https://archive.today/20141208195618/http://appsrv.cse.cuhk.edu.hk/~hqyang/doku.php?id=gmkl (Another) GMKL]: A different MATLAB MKL code that can also perform elastic net regularization<ref>Yang, H., Xu, Z., Ye, J., King, I., & Lyu, M. R. (2011). Efficient Sparse Generalized Multiple Kernel Learning. IEEE Transactions on Neural Networks, 22(3), 433-446</ref>
* [http://research.microsoft.com/en-us/um/people/manik/code/smo-mkl/download.html SMO-MKL]: C++ source code for a Sequential Minimal Optimization MKL algorithm; performs <math>p</math>-norm regularization.<ref>S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt and M. Varma. Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems, Vancouver, B. C., Canada, December 2010.</ref>
* [http://asi.insa-rouen.fr/enseignants/~arakoto/code/mklindex.html SimpleMKL]: A MATLAB code based on the SimpleMKL algorithm for MKL SVM.<ref>Alain Rakotomamonjy, Francis Bach, Stephane Canu, Yves Grandvalet. SimpleMKL. Journal of Machine Learning Research, Microtome Publishing, 2008, 9, pp.2491-2521.</ref>
* [https://pypi.python.org/pypi/MKLpy MKLPy]: A scikit-compliant Python framework for MKL and kernel machines, with different algorithms, e.g. EasyMKL<ref>Fabio Aiolli, Michele Donini. [https://www.sciencedirect.com/science/article/pii/S0925231215003653 EasyMKL: a scalable multiple kernel learning algorithm]. Neurocomputing, 169, pp.215-224.</ref> and others.
 
==References==