====Heuristic approaches====
These algorithms use a combination function that is parameterized. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix. Examples of these include the kernel from Tanabe et al. (2008).<ref>Hiroaki Tanabe, Tu Bao Ho, Canh Hao Nguyen, and Saori Kawasaki. Simple but effective methods
for combining kernels in computational biology. In Proceedings of IEEE International Conference
on Research, Innovation and Vision for the Future, 2008.</ref> Letting <math>\pi_m</math> be the accuracy obtained using only <math>K_m</math>, and letting <math>\delta</math> be a threshold less than the minimum of the single-kernel accuracies, we can define
:<math>\beta_m=\frac{\pi_m-\delta}{\sum_{h}(\pi_h-\delta)}</math>
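A minimal sketch of this weighting rule, assuming NumPy; the helper name is illustrative and not taken from Tanabe et al.:
<syntaxhighlight lang="python">
import numpy as np

def heuristic_weights(accuracies, delta):
    """Weight each kernel by how far its single-kernel accuracy pi_m
    exceeds the threshold delta, normalized so the weights sum to one."""
    pi = np.asarray(accuracies, dtype=float)
    if delta >= pi.min():
        raise ValueError("delta must be below the minimum single-kernel accuracy")
    return (pi - delta) / (pi - delta).sum()

# Example: three kernels with accuracies 0.85, 0.80, 0.70 and delta = 0.65
weights = heuristic_weights([0.85, 0.80, 0.70], delta=0.65)
# The combined kernel is then K = sum_m weights[m] * K_m
</syntaxhighlight>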
where <math>K'_{tra}</math> is the kernel of the training set.
[[Structural risk minimization]] approaches that have been used include linear approaches, such as that used by Lanckriet et al. (2002).<ref>Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan.
Learning the kernel matrix with semidefinite programming. In Proceedings of the 19th International
Conference on Machine Learning, 2002.</ref> We can define the implausibility of a kernel, <math>\omega(K)</math>, to be the value of the objective function after solving a canonical SVM problem. We can then solve the following minimization problem:
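For illustration, the implausibility of a fixed candidate kernel can be evaluated as the optimal value of the standard SVM dual objective. The sketch below assumes scikit-learn's precomputed-kernel SVM; the helper name and the regularization constant are illustrative rather than taken from Lanckriet et al.:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.svm import SVC

def implausibility(K, y, C=1.0):
    """omega(K): value of the SVM dual objective after training a
    canonical SVM with precomputed kernel matrix K and labels y."""
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    sv = svm.support_                     # indices of the support vectors
    alpha_y = svm.dual_coef_.ravel()      # alpha_i * y_i for support vectors
    alpha = np.abs(alpha_y)               # alpha_i >= 0
    # Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij
    return alpha.sum() - 0.5 * alpha_y @ K[np.ix_(sv, sv)] @ alpha_y
</syntaxhighlight>
A combined kernel with low implausibility on the training data is then preferred when searching over candidate kernel combinations.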
====Boosting approaches====
Boosting approaches add new kernels iteratively until some stopping criterion that is a function of performance is reached. An example of this is the MARK model developed by Bennett et al. (2002).<ref>Kristin P. Bennett, Michinari Momma, and Mark J. Embrechts. MARK: A boosting algorithm for
heterogeneous kernel models. In Proceedings of the 8th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, 2002.</ref>
Line 76:
:<math>f(x)=\sum_{i=1}^N\sum_{m=1}^P\alpha_i^mK_m(x_i^m,x^m)+b</math>
The parameters <math>\alpha_i^m</math> and <math>b</math> are learned by gradient descent on a coordinate basis. In this way, each iteration of the descent algorithm identifies the best kernel column to choose and adds it to the combined model.
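The following is a loose sketch of this kind of column-wise boosting over several kernel matrices, assuming a squared loss and NumPy; the function name, fixed step size, and iteration count are assumptions rather than details of MARK:
<syntaxhighlight lang="python">
import numpy as np

def column_boosting(kernels, y, n_iter=50, step=0.1):
    """Greedy column-wise boosting: at each iteration pick the kernel
    column best aligned with the current residual and update its weight."""
    n = len(y)
    # Stack every column of every kernel matrix as a candidate weak learner.
    columns = np.hstack(kernels)              # shape (n, n * P)
    alpha = np.zeros(columns.shape[1])
    b = y.mean()
    f = np.full(n, b)
    for _ in range(n_iter):
        residual = y - f                      # negative gradient of squared loss
        scores = columns.T @ residual
        j = np.argmax(np.abs(scores))         # best kernel column this iteration
        col = columns[:, j]
        delta = step * scores[j] / (col @ col + 1e-12)
        alpha[j] += delta
        f += delta * col
    return alpha, b
</syntaxhighlight>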
===Semisupervised learning===
:<math>\min_{\beta,B}\sum^n_{i=1}\left\Vert x_i - \sum_{x_j\in B_i} K(x_i,x_j)x_j\right\Vert^2 + \gamma_1\sum_{i=1}^n\sum_{x_j\in B_i}K(x_i,x_j)\left\Vert x_i - x_j \right\Vert^2 + \gamma_2\sum_i |B_i|</math>
where <math>\gamma_1</math> and <math>\gamma_2</math> are regularization parameters and <math>B_i</math> is the set of neighbors of <math>x_i</math>. One formulation of this is defined as follows. Let <math>D\in \{0,1\}^{n\times n}</math> be a matrix such that <math>D_{ij}=1</math> means that <math>x_i</math> and <math>x_j</math> are neighbors. Then, <math>B_i=\{x_j:D_{ij}=1\}</math>. Note that these groups must be learned as well. Zhuang et al. solve this problem by an alternating minimization method for <math>K</math> and the groups <math>B_i</math>. For more information, see Zhuang et al.<ref>J. Zhuang, J. Wang, S.C.H. Hoi & X. Lan. [http://jmlr.csail.mit.edu/proceedings/papers/v20/zhuang11/zhuang11.pdf Unsupervised Multiple Kernel Learning]. Jour. Mach. Learn. Res. 20:129–144, 2011.</ref>
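As a rough illustration of the objective above, the sketch below evaluates it for a fixed combined kernel <math>K</math> and a given neighborhood matrix <math>D</math>, assuming NumPy; the function name and interface are illustrative, and the alternating minimization over the kernel weights is not shown:
<syntaxhighlight lang="python">
import numpy as np

def unsupervised_mkl_objective(X, K, D, gamma1=1.0, gamma2=1.0):
    """Evaluate the reconstruction, locality, and sparsity terms for a
    fixed combined kernel K and a 0/1 neighborhood matrix D."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        B_i = np.flatnonzero(D[i])                      # indices j with D_ij = 1
        recon = (K[i, B_i, None] * X[B_i]).sum(axis=0)  # sum_j K(x_i,x_j) x_j
        total += np.sum((X[i] - recon) ** 2)            # reconstruction error
        dists = np.sum((X[i] - X[B_i]) ** 2, axis=1)
        total += gamma1 * np.sum(K[i, B_i] * dists)     # locality-preserving term
        total += gamma2 * len(B_i)                      # penalty on |B_i|
    return total
</syntaxhighlight>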
==MKL Libraries==