====Heuristic approaches====
These algorithms use a combination function that is parameterized. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix. Examples of these include the kernel from Tanabe et al. (2008).<ref>Hiroaki Tanabe, Tu Bao Ho, Canh Hao Nguyen, and Saori Kawasaki. Simple but effective methods
for combining kernels in computational biology. In Proceedings of IEEE International Conference
on Research, Innovation and Vision for the Future, 2008.</ref> Letting <math>\pi_m</math> be the accuracy obtained using only <math>K_m</math>, and letting <math>\delta</math> be a threshold less than the minimum of the single-kernel accuracies, we can define
:<math>\beta_m=\frac{\pi_m-\delta}{\sum_{h}(\pi_h-\delta)}</math>
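A minimal sketch of this weighting rule, assuming NumPy; the helper name is illustrative and not taken from Tanabe et al.:
<syntaxhighlight lang="python">
import numpy as np

def heuristic_weights(accuracies, delta):
    """Weight each kernel by how far its single-kernel accuracy pi_m
    exceeds the threshold delta, normalized so the weights sum to one."""
    pi = np.asarray(accuracies, dtype=float)
    if delta >= pi.min():
        raise ValueError("delta must be below the minimum single-kernel accuracy")
    return (pi - delta) / (pi - delta).sum()

# Example: three kernels with accuracies 0.85, 0.80, 0.70 and delta = 0.65
weights = heuristic_weights([0.85, 0.80, 0.70], delta=0.65)
# The combined kernel is then K = sum_m weights[m] * K_m
</syntaxhighlight>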
where <math>K'_{tra}</math> is the kernel of the training set.
[[Structural risk minimization]] approaches that have been used include linear approaches, such as that used by Lanckriet et al. (2002).<ref>Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan.
Learning the kernel matrix with semidefinite programming. In Proceedings of the 19th International
Conference on Machine Learning, 2002.</ref> We can define the implausibility of a kernel, <math>\omega(K)</math>, to be the value of the objective function after solving a canonical SVM problem. We can then solve the following minimization problem:
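For illustration, the implausibility of a fixed candidate kernel can be evaluated as the optimal value of the standard SVM dual objective. The sketch below assumes scikit-learn's precomputed-kernel SVM; the helper name and the regularization constant are illustrative rather than taken from Lanckriet et al.:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.svm import SVC

def implausibility(K, y, C=1.0):
    """omega(K): value of the SVM dual objective after training a
    canonical SVM with precomputed kernel matrix K and labels y."""
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    sv = svm.support_                     # indices of the support vectors
    alpha_y = svm.dual_coef_.ravel()      # alpha_i * y_i for support vectors
    alpha = np.abs(alpha_y)               # alpha_i >= 0
    # Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij
    return alpha.sum() - 0.5 * alpha_y @ K[np.ix_(sv, sv)] @ alpha_y
</syntaxhighlight>
A combined kernel with low implausibility on the training data is then preferred when searching over candidate kernel combinations.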
====Boosting approaches====
Boosting approaches add new kernels iteratively until some stopping criterion that is a function of performance is reached. An example of this is the MARK model developed by Bennett et al. (2002).<ref>Kristin P. Bennett, Michinari Momma, and Mark J. Embrechts. MARK: A boosting algorithm for
heterogeneous kernel models. In Proceedings of the 8th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, 2002.</ref>
Line 76:
:<math>f(x)=\sum_{i=1}^N\sum_{m=1}^P\alpha_i^mK_m(x_i^m,x^m)+b</math>
The parameters <math>\alpha_i^m</math> and <math>b</math> are learned by gradient descent on a coordinate basis. In this way, each iteration of the descent algorithm identifies the best kernel column to choose and adds it to the combined model.
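The following is a loose sketch of this kind of column-wise boosting over several kernel matrices, assuming a squared loss and NumPy; the function name, fixed step size, and iteration count are assumptions rather than details of MARK:
<syntaxhighlight lang="python">
import numpy as np

def column_boosting(kernels, y, n_iter=50, step=0.1):
    """Greedy column-wise boosting: at each iteration pick the kernel
    column best aligned with the current residual and update its weight."""
    n = len(y)
    # Stack every column of every kernel matrix as a candidate weak learner.
    columns = np.hstack(kernels)              # shape (n, n * P)
    alpha = np.zeros(columns.shape[1])
    b = y.mean()
    f = np.full(n, b)
    for _ in range(n_iter):
        residual = y - f                      # negative gradient of squared loss
        scores = columns.T @ residual
        j = np.argmax(np.abs(scores))         # best kernel column this iteration
        col = columns[:, j]
        delta = step * scores[j] / (col @ col + 1e-12)
        alpha[j] += delta
        f += delta * col
    return alpha, b
</syntaxhighlight>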
===Semisupervised learning===
:<math>\min_{\beta,B}\sum^n_{i=1}\left\Vert x_i - \sum_{x_j\in B_i} K(x_i,x_j)x_j\right\Vert^2 + \gamma_1\sum_{i=1}^n\sum_{x_j\in B_i}K(x_i,x_j)\left\Vert x_i - x_j \right\Vert^2 + \gamma_2\sum_i |B_i|</math>
where <math>\gamma_1</math> and <math>\gamma_2</math> are regularization parameters and <math>B_i</math> is the set of neighbors of <math>x_i</math>. One formulation of this is defined as follows. Let <math>D\in \{0,1\}^{n\times n}</math> be a matrix such that <math>D_{ij}=1</math> means that <math>x_i</math> and <math>x_j</math> are neighbors. Then, <math>B_i=\{x_j:D_{ij}=1\}</math>. Note that these groups must be learned as well. Zhuang et al. solve this problem by an alternating minimization method for <math>K</math> and the groups <math>B_i</math>. For more information, see Zhuang et al.<ref>J. Zhuang, J. Wang, S.C.H. Hoi & X. Lan. [http://jmlr.csail.mit.edu/proceedings/papers/v20/zhuang11/zhuang11.pdf Unsupervised Multiple Kernel Learning]. Jour. Mach. Learn. Res. 20:129–144, 2011.</ref>
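As a rough illustration of the objective above, the sketch below evaluates it for a fixed combined kernel <math>K</math> and a given neighborhood matrix <math>D</math>, assuming NumPy; the function name and interface are illustrative, and the alternating minimization over the kernel weights is not shown:
<syntaxhighlight lang="python">
import numpy as np

def unsupervised_mkl_objective(X, K, D, gamma1=1.0, gamma2=1.0):
    """Evaluate the reconstruction, locality, and sparsity terms for a
    fixed combined kernel K and a 0/1 neighborhood matrix D."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        B_i = np.flatnonzero(D[i])                      # indices j with D_ij = 1
        recon = (K[i, B_i, None] * X[B_i]).sum(axis=0)  # sum_j K(x_i,x_j) x_j
        total += np.sum((X[i] - recon) ** 2)            # reconstruction error
        dists = np.sum((X[i] - X[B_i]) ** 2, axis=1)
        total += gamma1 * np.sum(K[i, B_i] * dists)     # locality-preserving term
        total += gamma2 * len(B_i)                      # penalty on |B_i|
    return total
</syntaxhighlight>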
==MKL Libraries==