{{short description|Set of machine learning methods}}
{{Machine learning bar}}
Other approaches use a definition of kernel similarity, such as
:<math>A(K_1,K_2)=\frac{\langle K_1,K_2\rangle}{\sqrt{\langle K_1,K_1\rangle\langle K_2,K_2\rangle}}</math>
Using this measure, Qiu and Lane (2009)<ref>Shibin Qiu and Terran Lane. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(2):190–199, 2009</ref> proposed a heuristic for choosing the kernel weights.
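As an illustration, the following sketch computes the alignment above for two precomputed Gram matrices, assuming that <math>\langle\cdot,\cdot\rangle</math> denotes the Frobenius inner product (the usual convention for kernel alignment); the function name is illustrative only.
<syntaxhighlight lang="python">
import numpy as np

def kernel_alignment(K1, K2):
    # Alignment A(K1, K2) between two precomputed Gram matrices,
    # reading <.,.> as the Frobenius inner product (an assumption here).
    inner = np.sum(K1 * K2)
    norm = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return inner / norm
</syntaxhighlight>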
:<math>f(x)=\sum_{i=1}^N\sum_{m=1}^P\alpha_i^mK_m(x_i^m,x^m)+b</math>
The parameters <math>\alpha_i^m</math> and <math>b</math> are learned by [[gradient descent]] on a coordinate basis: each iteration of the descent identifies the best kernel column to add to the combined kernel, and the model is then rerun to generate the optimal weights <math>\alpha_i</math> and <math>b</math>.
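A minimal sketch of this greedy, coordinate-wise selection is given below, assuming precomputed Gram matrices and a squared loss; the function name, learning rate, and loss choice are illustrative assumptions rather than the published algorithm.
<syntaxhighlight lang="python">
import numpy as np

def greedy_mkl_fit(Ks, y, n_iter=50, lr=0.1):
    # Illustrative sketch only: greedy coordinate-wise selection of kernel
    # columns under a squared loss (the loss choice is an assumption).
    # Ks: array of shape (P, N, N), one precomputed Gram matrix per kernel.
    # y:  targets of shape (N,).
    P, N, _ = Ks.shape
    alpha = np.zeros((P, N))  # one coefficient per kernel column K_m(x_i, .)
    b = 0.0
    for _ in range(n_iter):
        pred = np.einsum('mi,min->n', alpha, Ks) + b      # f(x_n)
        resid = y - pred
        # gradient of the squared loss w.r.t. each alpha_i^m
        grad = -Ks.reshape(P * N, N) @ resid
        m, i = np.unravel_index(np.argmax(np.abs(grad)), (P, N))
        alpha[m, i] -= lr * grad[m * N + i]               # update best column only
        b += lr * resid.mean()                            # refit the bias
    return alpha, b
</syntaxhighlight>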
===Semisupervised learning===
where <math>L</math> is the loss function (weighted negative log-likelihood in this case), <math>R</math> is the regularization term ([[Proximal gradient methods for learning#Exploiting group structure|Group LASSO]] in this case), and <math>\Theta</math> is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density for all the data be
:<math>g^{\pi}_m(x)=(\phi^{\pi}_m)^T\psi_m(x)</math>
where <math>\psi_m(x)=[K_m(x_1,x),\ldots,K_m(x_L,x)]^T</math> (the kernel distance between the labeled data and all of the labeled and unlabeled data) and <math>\phi^{\pi}_m</math> is a non-negative random vector with a 2-norm of 1. The value of <math>\Pi</math> is the number of times each kernel is projected. Expectation regularization is then performed on the marginal kernel density, resulting in a reference expectation <math>q^{\pi}_m(y|g^{\pi}_m(x))</math> and model expectation <math>p^{\pi}_m(f(x)|g^{\pi}_m(x))</math>. Then, we define
:<math>\Theta=\frac{1}{\Pi} \sum^{\Pi}_{\pi=1}\sum^{M}_{m=1} D(q^{\pi}_m(y|g^{\pi}_m(x))||p^{\pi}_m(f(x)|g^{\pi}_m(x)))</math>
where <math>D(Q||P)=\sum_iQ(i)\ln\frac{Q(i)}{P(i)}</math> is the [[Kullback–Leibler divergence]]. The combined minimization problem is optimized using a modified block gradient descent algorithm. For more information, see Wang et al.<ref>Wang, Shuhui et al. [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6177671 S3MKL: Scalable Semi-Supervised Multiple Kernel Learning for Real-World Image Applications]. IEEE Transactions on Multimedia, Vol. 14, No. 4, August 2012</ref>
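The sketch below illustrates how the CEC penalty <math>\Theta</math> can be assembled from the reference and model expectations, assuming both are stored as discrete distributions indexed by projection <math>\pi</math> and kernel <math>m</math>; the array layout and function names are assumptions for illustration.
<syntaxhighlight lang="python">
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    # D(Q||P) = sum_i Q(i) * ln(Q(i)/P(i)) for discrete distributions.
    q = np.clip(q, eps, None)
    p = np.clip(p, eps, None)
    return np.sum(q * np.log(q / p))

def cec_penalty(q_exp, p_exp):
    # Theta = (1/Pi) * sum over projections pi and kernels m of D(q || p).
    # q_exp, p_exp: assumed arrays of shape (Pi, M, C) holding the reference
    # and model expectations as distributions over C classes.
    Pi, M, _ = q_exp.shape
    total = sum(kl_divergence(q_exp[pi_idx, m], p_exp[pi_idx, m])
                for pi_idx in range(Pi) for m in range(M))
    return total / Pi
</syntaxhighlight>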
===Unsupervised learning===
where . One formulation of this is defined as follows. Let <math>D\in \{0,1\}^{n\times n}</math> be a matrix such that <math>D_{ij}=1</math> means that <math>x_i</math> and <math>x_j</math> are neighbors. Then, <math>B_i=\{x_j:D_{ij}=1\}</math>. Note that these groups must be learned as well. Zhuang et al. solve this problem by an alternating minimization method for <math>K</math> and the groups <math>B_i</math>. For more information, see Zhuang et al.<ref>J. Zhuang, J. Wang, S.C.H. Hoi & X. Lan. [http://jmlr.csail.mit.edu/proceedings/papers/v20/zhuang11/zhuang11.pdf Unsupervised Multiple Kernel Learning]. Jour. Mach. Learn. Res. 20:129–144, 2011</ref>
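As a concrete illustration, the sketch below constructs one possible neighbor matrix <math>D</math> and the induced groups <math>B_i</math> from a combined Gram matrix by keeping each point's most similar points; the k-nearest-neighbor rule and helper name are illustrative assumptions, since Zhuang et al. learn the groups jointly with the kernel.
<syntaxhighlight lang="python">
import numpy as np

def neighbor_groups(K, k=5):
    # Illustrative helper: build a 0/1 neighbor matrix D and the groups
    # B_i = {x_j : D_ij = 1} from a combined Gram matrix K by keeping each
    # point's k most similar points (a k-nearest-neighbor assumption).
    n = K.shape[0]
    D = np.zeros((n, n), dtype=int)
    for i in range(n):
        order = np.argsort(-K[i])                 # most similar first
        neighbors = [j for j in order if j != i][:k]
        D[i, neighbors] = 1
    B = [np.flatnonzero(D[i]) for i in range(n)]  # indices j with D_ij = 1
    return D, B
</syntaxhighlight>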
==Libraries==
Available MKL libraries include
* [http://research.microsoft.com/en-us/um/people/manik/code/GMKL/download.html GMKL]: Generalized Multiple Kernel Learning code in [[MATLAB]], does <math>\ell_1</math> and <math>\ell_2</math> regularization for supervised learning.<ref>M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In Proceedings of the International Conference on Machine Learning, Montreal, Canada, June 2009</ref>
* [https://archive.today/20141208195618/http://appsrv.cse.cuhk.edu.hk/~hqyang/doku.php?id=gmkl (Another) GMKL]: A different MATLAB MKL code that can also perform elastic net regularization.<ref>Yang, H., Xu, Z., Ye, J., King, I., & Lyu, M. R. (2011). Efficient Sparse Generalized Multiple Kernel Learning. IEEE Transactions on Neural Networks, 22(3), 433-446</ref>
* [http://research.microsoft.com/en-us/um/people/manik/code/smo-mkl/download.html SMO-MKL]: C++ source code for a Sequential Minimal Optimization MKL algorithm. Does <math>p</math>-norm regularization.<ref>S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt and M. Varma. Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems, Vancouver, B. C., Canada, December 2010.</ref>
* [http://asi.insa-rouen.fr/enseignants/~arakoto/code/mklindex.html SimpleMKL]: A MATLAB code based on the SimpleMKL algorithm for MKL SVM.<ref>Alain Rakotomamonjy, Francis Bach, Stephane Canu, Yves Grandvalet. SimpleMKL. Journal of Machine Learning Research, Microtome Publishing, 2008, 9, pp.2491-2521.</ref>