Multiple kernel learning

This is an old revision of this page, as edited by Tamhok at 19:55, 8 December 2014.


Background

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select for an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) combining data from different sources (e.g. sound and images from a video) that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.
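For instance, when two data sources describe the same samples, Gram matrices computed separately for each source can be combined directly. A minimal sketch in Python/NumPy (the feature arrays, kernel choices, and parameters below are illustrative assumptions, not taken from this article):

```python
import numpy as np

# Hypothetical setup: two views of the same 4 samples, e.g. audio
# features and image features extracted from a video.
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(4, 10))   # placeholder audio features
X_image = rng.normal(size=(4, 50))   # placeholder image features

def linear_kernel(X):
    """Gram matrix of the linear kernel k(x, z) = <x, z>."""
    return X @ X.T

def rbf_kernel(X, gamma=0.1):
    """Gram matrix of the RBF kernel k(x, z) = exp(-gamma ||x - z||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Each source keeps its own notion of similarity; a sum of positive
# semidefinite Gram matrices is again positive semidefinite, so the
# combination is itself a valid kernel.
K_combined = linear_kernel(X_audio) + rbf_kernel(X_image)
print(K_combined.shape)  # (4, 4)
```

A weighted sum with nonnegative coefficients is a valid kernel for the same reason.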

Multiple kernel learning approaches have been used in many applications, such as event recognition in video, object recognition in images, and biomedical data fusion.

Algorithms

Multiple kernel learning algorithms have been developed for supervised, semi-supervised, and unsupervised learning. Most work has been done on the supervised learning case with linear combinations of kernels. The basic idea behind multiple kernel learning algorithms is as follows: we begin with a set of $n$ kernels $K_1, \dots, K_n$. In the linear case, we introduce a new kernel $K' = \sum_{i=1}^n \beta_i K_i$, where $\beta$ is a vector of nonnegative coefficients, one per kernel. For a set of data $X$ with labels $Y$, the minimization problem can then be written as

$$\min_{c} \; \mathrm{E}(Y, K'c) + \mathcal{R}(K', c)$$

where $\mathrm{E}$ is an error function and $\mathcal{R}$ is a regularization term. $\mathrm{E}$ is typically the square loss function (Tikhonov regularization) or the hinge loss function (for SVM algorithms), and $\mathcal{R}$ is usually an $\ell_p$ norm or some combination of such norms (e.g. elastic net regularization).
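With the square loss and a fixed weight vector $\beta$, the combined-kernel problem reduces to kernel ridge regression, which has a closed-form solution. A minimal sketch (the data, base kernels, weights, and regularization strength are illustrative assumptions, not from this article):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

def linear_kernel(X):
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# K' = sum_i beta_i K_i with fixed, nonnegative weights beta.
kernels = [linear_kernel(X), rbf_kernel(X)]
beta = np.array([0.7, 0.3])
K = sum(b * Km for b, Km in zip(beta, kernels))

# Square loss with Tikhonov regularization (kernel ridge regression):
# c = (K' + lam * I)^(-1) y minimizes ||y - K'c||^2 + lam * c^T K' c.
lam = 1e-2
c = np.linalg.solve(K + lam * np.eye(len(y)), y)
y_hat = K @ c
print("train MSE:", float(np.mean((y_hat - y) ** 2)))
```

Full multiple kernel learning methods additionally optimize over $\beta$ rather than fixing it in advance.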

Supervised learning

For supervised learning, there are many other algorithms that use different methods to learn the form of the kernel. The following categorization has been proposed by Gönen and Alpaydın (2011):[1]

  1. Fixed rules approaches. These combine the kernels with rules set in advance, such as unweighted summation or multiplication; they do not require parameterization and no kernel weighting is learned.
  2. Heuristic approaches. These algorithms use a parameterized combination function. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix.
  3. Optimization approaches. These approaches solve an optimization problem to determine parameters for the kernel combination function.
  4. Bayesian approaches. These put priors on the kernel parameters and learn the parameter values from the priors and the base algorithm.
  5. Boosting approaches. These add new kernels iteratively until some stopping criterion that is a function of performance is reached.
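As an illustration of the heuristic family, kernel-target alignment scores each base kernel against the label matrix $yy^T$ and converts the scores into weights. This is one common heuristic of that kind, sketched here with illustrative data and parameters (not code from the survey):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
y = np.sign(X[:, 0])                  # toy binary labels in {-1, +1}

def rbf_kernel(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def alignment(K, y):
    """Kernel-target alignment <K, y y^T>_F / (||K||_F ||y y^T||_F)."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

# Candidate kernels with different bandwidths (illustrative choices).
kernels = [rbf_kernel(X, g) for g in (0.01, 0.1, 1.0)]

# Heuristic: weight each kernel by its (clipped) alignment with the labels,
# so better-aligned kernels contribute more to the combination.
scores = np.clip([alignment(K, y) for K in kernels], 0.0, None)
beta = scores / scores.sum()
K_combined = sum(b * K for b, K in zip(beta, kernels))
print("weights:", beta)
```

Each weight here is computed from the kernel matrix alone, without solving a joint optimization problem, which is what distinguishes heuristic from optimization approaches.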

For more information on these methods, see Gönen and Alpaydın (2011).[1]

Semi-supervised learning

Semi-supervised learning approaches to multiple kernel learning are similar to semi-supervised extensions of other supervised learning approaches, incorporating unlabeled data alongside labeled data when learning the kernel combination.

Unsupervised learning

Unsupervised multiple kernel learning algorithms have also been proposed. These algorithms use geometric properties of the data to select kernels that capture its underlying structure.

MKL Libraries

Available MKL libraries include

  • GMKL: Generalized Multiple Kernel Learning code in MATLAB; performs $\ell_1$ and $\ell_2$ regularization for supervised learning.
  • (Another) GMKL: A different MATLAB MKL code that can also perform elastic net regularization.
  • SMO-MKL: C++ source code for a Sequential Minimal Optimization MKL algorithm; performs $\ell_p$-norm regularization.

References

  1. Gönen, Mehmet; Alpaydın, Ethem (2011). "Multiple kernel learning algorithms". Journal of Machine Learning Research. 12: 2211–2268.