{{short description|Set of machine learning methods}}
{{Machine learning bar}}
Other approaches use a definition of kernel similarity, such as
:<math>A(K_1,K_2)=\frac{\langle K_1,K_2\rangle}{\sqrt{\langle K_1,K_1\rangle\langle K_2,K_2\rangle}}</math>
Using this measure, Qiu and Lane (2009)<ref>Shibin Qiu and Terran Lane. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(2):190–199, 2009</ref> proposed a heuristic for choosing the kernel weights.
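As an illustration, the following sketch computes the alignment above for two precomputed Gram matrices, assuming that <math>\langle\cdot,\cdot\rangle</math> denotes the Frobenius inner product (the usual convention for kernel alignment); the function name is illustrative only.
<syntaxhighlight lang="python">
import numpy as np

def kernel_alignment(K1, K2):
    # Alignment A(K1, K2) between two precomputed Gram matrices,
    # reading <.,.> as the Frobenius inner product (an assumption here).
    inner = np.sum(K1 * K2)
    norm = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return inner / norm
</syntaxhighlight>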
:<math>f(x)=\sum_{i=1}^N\sum_{m=1}^P\alpha_i^mK_m(x_i^m,x^m)+b</math>
The parameters <math>\alpha_i^m</math> and <math>b</math> are learned by [[gradient descent]] on a coordinate basis: each iteration of the descent identifies the best kernel column to add to the combined kernel, and the model is then rerun to generate the optimal weights <math>\alpha_i</math> and <math>b</math>.
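A minimal sketch of this greedy, coordinate-wise selection is given below, assuming precomputed Gram matrices and a squared loss; the function name, learning rate, and loss choice are illustrative assumptions rather than the published algorithm.
<syntaxhighlight lang="python">
import numpy as np

def greedy_mkl_fit(Ks, y, n_iter=50, lr=0.1):
    # Illustrative sketch only: greedy coordinate-wise selection of kernel
    # columns under a squared loss (the loss choice is an assumption).
    # Ks: array of shape (P, N, N), one precomputed Gram matrix per kernel.
    # y:  targets of shape (N,).
    P, N, _ = Ks.shape
    alpha = np.zeros((P, N))  # one coefficient per kernel column K_m(x_i, .)
    b = 0.0
    for _ in range(n_iter):
        pred = np.einsum('mi,min->n', alpha, Ks) + b      # f(x_n)
        resid = y - pred
        # gradient of the squared loss w.r.t. each alpha_i^m
        grad = -Ks.reshape(P * N, N) @ resid
        m, i = np.unravel_index(np.argmax(np.abs(grad)), (P, N))
        alpha[m, i] -= lr * grad[m * N + i]               # update best column only
        b += lr * resid.mean()                            # refit the bias
    return alpha, b
</syntaxhighlight>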
===Semisupervised learning===
where <math>L</math> is the loss function (weighted negative log-likelihood in this case), <math>R</math> is the regularization term ([[Proximal gradient methods for learning#Exploiting group structure|Group LASSO]] in this case), and <math>\Theta</math> is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density for all the data be
:<math>g^{\pi}_m(x)=(\phi^{\pi}_m)^T\psi_m(x)</math>
where <math>\psi_m(x)=[K_m(x_1,x),\ldots,K_m(x_L,x)]^T</math> (the kernel distance between the labeled data and all of the labeled and unlabeled data) and <math>\phi^{\pi}_m</math> is a non-negative random vector with a 2-norm of 1. The value of <math>\Pi</math> is the number of times each kernel is projected. Expectation regularization is then performed on the marginal kernel density, resulting in a reference expectation <math>q^{\pi}_m(y|g^{\pi}_m(x))</math> and model expectation <math>p^{\pi}_m(f(x)|g^{\pi}_m(x))</math>. Then, we define
:<math>\Theta=\frac{1}{\Pi} \sum^{\Pi}_{\pi=1}\sum^{M}_{m=1} D(q^{\pi}_m(y|g^{\pi}_m(x))||p^{\pi}_m(f(x)|g^{\pi}_m(x)))</math>
where <math>D(Q||P)=\sum_iQ(i)\ln\frac{Q(i)}{P(i)}</math> is the [[Kullback–Leibler divergence]]. The combined minimization problem is optimized using a modified block gradient descent algorithm. For more information, see Wang et al.<ref>Wang, Shuhui et al. [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6177671 S3MKL: Scalable Semi-Supervised Multiple Kernel Learning for Real-World Image Applications]. IEEE Transactions on Multimedia, Vol. 14, No. 4, August 2012</ref>
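The sketch below illustrates how the CEC penalty <math>\Theta</math> can be assembled from the reference and model expectations, assuming both are stored as discrete distributions indexed by projection <math>\pi</math> and kernel <math>m</math>; the array layout and function names are assumptions for illustration.
<syntaxhighlight lang="python">
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    # D(Q||P) = sum_i Q(i) * ln(Q(i)/P(i)) for discrete distributions.
    q = np.clip(q, eps, None)
    p = np.clip(p, eps, None)
    return np.sum(q * np.log(q / p))

def cec_penalty(q_exp, p_exp):
    # Theta = (1/Pi) * sum over projections pi and kernels m of D(q || p).
    # q_exp, p_exp: assumed arrays of shape (Pi, M, C) holding the reference
    # and model expectations as distributions over C classes.
    Pi, M, _ = q_exp.shape
    total = sum(kl_divergence(q_exp[pi_idx, m], p_exp[pi_idx, m])
                for pi_idx in range(Pi) for m in range(M))
    return total / Pi
</syntaxhighlight>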
===Unsupervised learning===
where . One formulation of this is defined as follows. Let <math>D\in \{0,1\}^{n\times n}</math> be a matrix such that <math>D_{ij}=1</math> means that <math>x_i</math> and <math>x_j</math> are neighbors. Then, <math>B_i=\{x_j:D_{ij}=1\}</math>. Note that these groups must be learned as well. Zhuang et al. solve this problem by an alternating minimization method for <math>K</math> and the groups <math>B_i</math>. For more information, see Zhuang et al.<ref>J. Zhuang, J. Wang, S.C.H. Hoi & X. Lan. [http://jmlr.csail.mit.edu/proceedings/papers/v20/zhuang11/zhuang11.pdf Unsupervised Multiple Kernel Learning]. Jour. Mach. Learn. Res. 20:129–144, 2011</ref>
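As a concrete illustration, the sketch below constructs one possible neighbor matrix <math>D</math> and the induced groups <math>B_i</math> from a combined Gram matrix by keeping each point's most similar points; the k-nearest-neighbor rule and helper name are illustrative assumptions, since Zhuang et al. learn the groups jointly with the kernel.
<syntaxhighlight lang="python">
import numpy as np

def neighbor_groups(K, k=5):
    # Illustrative helper: build a 0/1 neighbor matrix D and the groups
    # B_i = {x_j : D_ij = 1} from a combined Gram matrix K by keeping each
    # point's k most similar points (a k-nearest-neighbor assumption).
    n = K.shape[0]
    D = np.zeros((n, n), dtype=int)
    for i in range(n):
        order = np.argsort(-K[i])                 # most similar first
        neighbors = [j for j in order if j != i][:k]
        D[i, neighbors] = 1
    B = [np.flatnonzero(D[i]) for i in range(n)]  # indices j with D_ij = 1
    return D, B
</syntaxhighlight>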
==Libraries==
Available MKL libraries include
* [http://research.microsoft.com/en-us/um/people/manik/code/GMKL/download.html GMKL]: Generalized Multiple Kernel Learning code in [[MATLAB]], does <math>\ell_1</math> and <math>\ell_2</math> regularization for supervised learning.<ref>M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In Proceedings of the International Conference on Machine Learning, Montreal, Canada, June 2009</ref>
* [https://archive.today/20141208195618/http://appsrv.cse.cuhk.edu.hk/~hqyang/doku.php?id=gmkl (Another) GMKL]: A different MATLAB MKL code that can also perform elastic net regularization.<ref>Yang, H., Xu, Z., Ye, J., King, I., & Lyu, M. R. (2011). Efficient Sparse Generalized Multiple Kernel Learning. IEEE Transactions on Neural Networks, 22(3), 433-446</ref>
* [http://research.microsoft.com/en-us/um/people/manik/code/smo-mkl/download.html SMO-MKL]: C++ source code for a Sequential Minimal Optimization MKL algorithm. Does <math>p</math>-norm regularization.<ref>S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt and M. Varma. Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems, Vancouver, B. C., Canada, December 2010.</ref>
* [http://asi.insa-rouen.fr/enseignants/~arakoto/code/mklindex.html SimpleMKL]: A MATLAB code based on the SimpleMKL algorithm for MKL SVM.<ref>Alain Rakotomamonjy, Francis Bach, Stephane Canu, Yves Grandvalet. SimpleMKL. Journal of Machine Learning Research, Microtome Publishing, 2008, 9, pp.2491-2521.</ref>