Multiple kernel learning


Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. There are two main reasons to use multiple kernel learning. First, it can select an optimal kernel and its parameters from a larger set of kernels. This reduces the bias introduced by choosing a kernel by hand and allows for more automated machine learning methods. Second, it can combine data from different sources (e.g. sound and images from a video) that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can combine the kernels already established for each individual data source.

Algorithms

Multiple kernel learning algorithms have been developed for supervised, semi-supervised, and unsupervised learning. Most work has been done on the supervised case with linear combinations of kernels. The basic idea behind multiple kernel learning algorithms is as follows: we begin with a set of $n$ kernels $K_1, \dots, K_n$. In the linear case, we introduce a new kernel $K'=\sum_{i=1}^{n}\beta_i K_i$, where $\beta = (\beta_1, \dots, \beta_n)$ is a vector of coefficients with one weight $\beta_i$ per kernel. For a set of data $X$ with labels $Y$, the minimization problem can then be written as

$$\min_{\beta, c}\ \mathrm{E}(Y, K'c) + R(K'c)$$

where $\mathrm{E}$ is an error function and $R$ is a regularization term. $\mathrm{E}$ is typically the square loss function (as in Tikhonov regularization) or the hinge loss function (as in SVM algorithms), and $R$ is usually an $\ell_p$ norm (such as the $\ell_1$ or $\ell_2$ norm) or a combination of norms (e.g. elastic net regularization).
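The linear case can be made concrete with a small sketch. It assumes choices that the text above leaves open: $\mathrm{E}$ is the square loss $\|Y - K'c\|^2$, $R$ is the Tikhonov (RKHS-norm) penalty $\lambda c^\top K' c$, and the weights $\beta$ are constrained to the probability simplex ($\beta_i \ge 0$, $\sum_i \beta_i = 1$). For fixed $\beta$ the optimal coefficients are $c = (K' + \lambda I)^{-1} Y$. At that optimum the partial derivative of the objective with respect to $\beta_i$ is $-\lambda c^\top K_i c$, so the sketch alternates a closed-form solve for $c$ with an exponentiated-gradient (mirror-descent) step on $\beta$. The function names (`rbf_kernel`, `combine`, `solve_c`, `objective`, `linear_mkl`) and the choice of optimizer are hypothetical illustrations, not a specific published MKL algorithm.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off


def combine(kernels, beta):
    """Linear combination K' = sum_i beta_i * K_i."""
    return sum(b * K for b, K in zip(beta, kernels))


def solve_c(K, Y, lam):
    """For fixed K', minimize ||Y - K'c||^2 + lam * c^T K' c: c = (K' + lam I)^{-1} Y."""
    return np.linalg.solve(K + lam * np.eye(len(Y)), Y)


def objective(K, c, Y, lam):
    """Assumed objective: square loss E plus Tikhonov penalty R = lam * c^T K' c."""
    r = Y - K @ c
    return float(r @ r + lam * (c @ K @ c))


def linear_mkl(kernels, Y, lam=0.1, step=1.0, iters=100):
    """Hypothetical alternating scheme: exact solve for c, mirror-descent step for beta.

    beta stays on the probability simplex (beta_i >= 0, sum beta_i = 1).
    """
    n = len(kernels)
    beta = np.full(n, 1.0 / n)  # start from uniform weights
    for _ in range(iters):
        c = solve_c(combine(kernels, beta), Y, lam)
        # At the optimal c, d(objective)/d(beta_i) = -lam * c^T K_i c.
        grad = np.array([-lam * (c @ Ki @ c) for Ki in kernels])
        w = -step * grad
        beta = beta * np.exp(w - w.max())  # shift exponent for numerical stability
        beta = beta / beta.sum()  # renormalize onto the simplex
    c = solve_c(combine(kernels, beta), Y, lam)
    return beta, c
```

For example, with a feature matrix `X` and labels `Y`, `linear_mkl([rbf_kernel(X, 0.1), rbf_kernel(X, 10.0), X @ X.T], Y)` returns one weight per candidate kernel together with the coefficients `c`. Because the simplex constraint keeps $K'$ a nonnegative combination of positive semidefinite kernels, $K'$ stays positive semidefinite throughout.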

For supervised learning, many other algorithms use different methods to learn the form of the kernel. Gönen and Alpaydın (2011) [1] proposed the following categorization:

1. Fixed rules, which combine the kernels with parameter-free rules such as summation and multiplication. Because these rules have no parameters, no weighting is learned; the linear combination described above belongs to this category only when every $\beta_i$ is fixed in advance (for example, an unweighted sum).
2.

For more information on these methods, see Gönen and Alpaydın (2011) [1].
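Fixed rules such as summation and multiplication can be illustrated directly on kernel matrices. In this sketch (an illustrative assumption, with hypothetical helper names `linear_kernel`, `poly_kernel`, `fixed_sum`, `fixed_product`, and `is_psd`), two data sources describe the same samples, each source gets its own kernel, and the kernels are combined by an unweighted sum and by an elementwise product. Neither rule has a parameter to fit. Both results are valid kernels: a sum of positive semidefinite matrices is positive semidefinite, and so is their elementwise (Schur) product.

```python
import numpy as np


def linear_kernel(X):
    """Linear kernel matrix K[i, j] = <x_i, x_j>."""
    return X @ X.T


def poly_kernel(X, degree=2, coef0=1.0):
    """Polynomial kernel matrix K[i, j] = (<x_i, x_j> + coef0) ** degree."""
    return (X @ X.T + coef0) ** degree


def fixed_sum(kernels):
    """Fixed rule: unweighted sum of the kernel matrices (no learned weights)."""
    return sum(kernels)


def fixed_product(kernels):
    """Fixed rule: elementwise (Hadamard) product of the kernel matrices."""
    out = np.ones_like(kernels[0])
    for K in kernels:
        out = out * K
    return out


def is_psd(K, rel_tol=1e-8):
    """Check positive semidefiniteness up to a relative round-off tolerance."""
    eig = np.linalg.eigvalsh((K + K.T) / 2)  # symmetrize to remove round-off asymmetry
    return eig.min() >= -rel_tol * max(1.0, np.abs(eig).max())


# Two hypothetical data sources describing the same 6 samples,
# e.g. audio features and image features from the same video clips.
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(6, 3))
X_image = rng.normal(size=(6, 2))

K_audio = linear_kernel(X_audio)  # one kernel per source
K_image = poly_kernel(X_image)

K_sum = fixed_sum([K_audio, K_image])
K_prod = fixed_product([K_audio, K_image])
print(is_psd(K_sum), is_psd(K_prod))
```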

MKL Libraries

Available MKL libraries include

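[1] Gönen, M.; Alpaydın, E. (2011). "Multiple Kernel Learning Algorithms". Journal of Machine Learning Research 12. http://www.jmlr.org/papers/volume12/gonen11a/gonen11a.pdf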

