Multiple kernel learning

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of [[Kernel method|kernels]] and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select for an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) combining data from different sources (e.g. sound and images from a video) that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.
 
==Algorithms==
Multiple kernel learning algorithms have been developed for supervised, semi-supervised, as well as unsupervised learning. Most work has been done on the supervised learning case with linear combinations of kernels. The basic idea behind multiple kernel learning algorithms is as follows: we begin with a set of <math>n</math> kernels <math>K_1, \ldots, K_n</math>. In the linear case, we introduce a new kernel <math>K'=\sum_{i=1}^n\beta_iK_i</math>, where <math>\beta_i</math> is the coefficient assigned to kernel <math>K_i</math>. For a set of data <math>X</math> with labels <math>Y</math>, the minimization problem can then be written as
:<math>\min_{\beta,c}\Epsilon(Y, K'c)+R(K'c)</math>
 
where <math>\Epsilon</math> is an error function and <math>R</math> is a regularization term. <math>\Epsilon</math> is typically the square loss function (Tikhonov regularization) or the hinge loss function (for [[Support vector machine|SVM]] algorithms), and <math>R</math> is usually an <math>\ell_n</math> norm or some combination of such norms (e.g. [[elastic net regularization]]).
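As a minimal sketch of the linear case above, the following assumes a hand-fixed coefficient vector <math>\beta</math> (a full multiple kernel learning method would optimize <math>\beta</math> jointly with <math>c</math>), combines a linear and an RBF base kernel, and solves the square-loss (Tikhonov-regularized) problem in closed form; all variable names and the toy data are illustrative, not from any particular MKL implementation.

```python
import numpy as np

# Toy data: 20 points in 2-D with real-valued labels (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

# Two base kernels K_1, K_2 on the same data: linear and RBF.
K_lin = X @ X.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-0.5 * sq_dists)

# Linear combination K' = sum_i beta_i K_i with beta fixed by hand here;
# learning beta is the part a real MKL algorithm would add.
beta = np.array([0.3, 0.7])
K_prime = beta[0] * K_lin + beta[1] * K_rbf

# For the square loss with Tikhonov regularization, minimizing over c
# has the closed-form kernel ridge solution c = (K' + lam * I)^{-1} y.
lam = 0.1
c = np.linalg.solve(K_prime + lam * np.eye(len(y)), y)

# In-sample predictions are f = K' c.
y_hat = K_prime @ c
print(float(np.mean((y - y_hat) ** 2)))
```

For the hinge loss one would instead pass <code>K_prime</code> to a kernelized SVM solver as a precomputed kernel; the combination step itself is unchanged.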