Multiple kernel learning

This is an old revision of this page, as edited by Tamhok at 19:55, 8 December 2014.


Background

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select for an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) combining data from different sources (e.g. sound and images from a video) that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.
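For instance, when two data sources describe the same samples, Gram matrices computed separately for each source can be combined directly. A minimal sketch in Python/NumPy (the feature arrays, kernel choices, and parameters below are illustrative assumptions, not taken from this article):

```python
import numpy as np

# Hypothetical setup: two views of the same 4 samples, e.g. audio
# features and image features extracted from a video.
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(4, 10))   # placeholder audio features
X_image = rng.normal(size=(4, 50))   # placeholder image features

def linear_kernel(X):
    """Gram matrix of the linear kernel k(x, z) = <x, z>."""
    return X @ X.T

def rbf_kernel(X, gamma=0.1):
    """Gram matrix of the RBF kernel k(x, z) = exp(-gamma ||x - z||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Each source keeps its own notion of similarity; a sum of positive
# semidefinite Gram matrices is again positive semidefinite, so the
# combination is itself a valid kernel.
K_combined = linear_kernel(X_audio) + rbf_kernel(X_image)
print(K_combined.shape)  # (4, 4)
```

A weighted sum with nonnegative coefficients is a valid kernel for the same reason.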

Multiple kernel learning approaches have been used in many applications, such as event recognition in video, object recognition in images, and biomedical data fusion.

Algorithms

Multiple kernel learning algorithms have been developed for supervised, semi-supervised, and unsupervised learning. Most work has been done on the supervised learning case with linear combinations of kernels. The basic idea behind multiple kernel learning algorithms is as follows: we begin with a set of $n$ kernels $K_1, \dots, K_n$. In the linear case, we introduce a new kernel $K' = \sum_{i=1}^n \beta_i K_i$, where $\beta$ is a vector of nonnegative coefficients, one per kernel. For a set of data $X$ with labels $Y$, the minimization problem can then be written as

$$\min_{c} \; \mathrm{E}(Y, K'c) + \mathcal{R}(K', c)$$

where $\mathrm{E}$ is an error function and $\mathcal{R}$ is a regularization term. $\mathrm{E}$ is typically the square loss function (Tikhonov regularization) or the hinge loss function (for SVM algorithms), and $\mathcal{R}$ is usually an $\ell_p$ norm or some combination of such norms (e.g. elastic net regularization).
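With the square loss and a fixed weight vector $\beta$, the combined-kernel problem reduces to kernel ridge regression, which has a closed-form solution. A minimal sketch (the data, base kernels, weights, and regularization strength are illustrative assumptions, not from this article):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

def linear_kernel(X):
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# K' = sum_i beta_i K_i with fixed, nonnegative weights beta.
kernels = [linear_kernel(X), rbf_kernel(X)]
beta = np.array([0.7, 0.3])
K = sum(b * Km for b, Km in zip(beta, kernels))

# Square loss with Tikhonov regularization (kernel ridge regression):
# c = (K' + lam * I)^(-1) y minimizes ||y - K'c||^2 + lam * c^T K' c.
lam = 1e-2
c = np.linalg.solve(K + lam * np.eye(len(y)), y)
y_hat = K @ c
print("train MSE:", float(np.mean((y_hat - y) ** 2)))
```

Full multiple kernel learning methods additionally optimize over $\beta$ rather than fixing it in advance.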

Supervised learning

For supervised learning, there are many other algorithms that use different methods to learn the form of the kernel. The following categorization has been proposed by Gönen and Alpaydın (2011):[1]

  1. Fixed rules approaches. These combine the kernels with rules set in advance, such as unweighted summation or multiplication; they do not require parameterization and no kernel weighting is learned.
  2. Heuristic approaches. These algorithms use a parameterized combination function. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix.
  3. Optimization approaches. These approaches solve an optimization problem to determine parameters for the kernel combination function.
  4. Bayesian approaches. These put priors on the kernel parameters and learn the parameter values from the priors and the base algorithm.
  5. Boosting approaches. These add new kernels iteratively until some stopping criterion that is a function of performance is reached.
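As an illustration of the heuristic family, kernel-target alignment scores each base kernel against the label matrix $yy^T$ and converts the scores into weights. This is one common heuristic of that kind, sketched here with illustrative data and parameters (not code from the survey):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
y = np.sign(X[:, 0])                  # toy binary labels in {-1, +1}

def rbf_kernel(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def alignment(K, y):
    """Kernel-target alignment <K, y y^T>_F / (||K||_F ||y y^T||_F)."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

# Candidate kernels with different bandwidths (illustrative choices).
kernels = [rbf_kernel(X, g) for g in (0.01, 0.1, 1.0)]

# Heuristic: weight each kernel by its (clipped) alignment with the labels,
# so better-aligned kernels contribute more to the combination.
scores = np.clip([alignment(K, y) for K in kernels], 0.0, None)
beta = scores / scores.sum()
K_combined = sum(b * K for b, K in zip(beta, kernels))
print("weights:", beta)
```

Each weight here is computed from the kernel matrix alone, without solving a joint optimization problem, which is what distinguishes heuristic from optimization approaches.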

For more information on these methods, see Gönen and Alpaydın (2011).[1]

Semi-supervised learning

Semi-supervised learning approaches to multiple kernel learning are similar to semi-supervised extensions of other supervised learning approaches, incorporating unlabeled data alongside labeled data when learning the kernel combination.

Unsupervised learning

Unsupervised multiple kernel learning algorithms have also been proposed. These algorithms use geometric properties of the data to select kernels that capture its underlying structure.

MKL Libraries

Available MKL libraries include

  • GMKL: Generalized Multiple Kernel Learning code in MATLAB; performs $\ell_1$ and $\ell_2$ regularization for supervised learning.
  • (Another) GMKL: A different MATLAB MKL code that can also perform elastic net regularization.
  • SMO-MKL: C++ source code for a Sequential Minimal Optimization MKL algorithm; performs $\ell_p$-norm regularization.

References

  1. Gönen, Mehmet; Alpaydın, Ethem (2011). "Multiple kernel learning algorithms". Journal of Machine Learning Research. 12: 2211–2268.