=== Domain adaptation under covariate, target, and conditional shift ===
The goal of [[___domain adaptation]] is the formulation of learning algorithms which generalize well when the training and test data have different distributions. Given training examples <math>\{(x_i^{tr}, y_i^{tr})\}_{i=1}^n</math> and a test set <math>\{(x_j^{te}, y_j^{te}) \}_{j=1}^m </math> where the <math>y_j^{te}</math> are unknown, three types of differences are commonly assumed between the distribution of the training examples <math>P^{tr}(X,Y)</math> and the test distribution <math> P^{te}(X,Y)</math>:<ref name = "DA">K. Zhang, B. Schölkopf, K. Muandet, Z. Wang. (2013). [http://jmlr.org/proceedings/papers/v28/zhang13d.pdf Domain adaptation under target and conditional shift]. ''Journal of Machine Learning Research'', '''28'''(3): 819–827.</ref><ref name = "CovS">A. Gretton, A. Smola, J. Huang, M. Schmittfull, K. Borgwardt, B. Schölkopf. (2008). Covariate shift and local learning by distribution matching. In J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, N. Lawrence (eds.), ''Dataset Shift in Machine Learning'', MIT Press, Cambridge, MA: 131–160.</ref>
# '''Covariate shift''', in which the marginal distribution of the covariates changes across domains: <math> P^{tr}(X) \neq P^{te}(X)</math>
# '''Target shift''', in which the marginal distribution of the outputs changes across domains: <math> P^{tr}(Y) \neq P^{te}(Y)</math>
# '''Conditional shift''', in which <math>P(Y)</math> remains the same across domains, but the conditional distributions differ: <math>P^{tr}(X \mid Y) \neq P^{te}(X \mid Y)</math>. In general, the presence of conditional shift leads to an [[Well-posed problem|ill-posed]] problem, and the additional assumption that <math>P(X \mid Y)</math> changes only under [[Location parameter|___location]]-[[Scale parameter|scale]] (LS) transformations on <math> X </math> is commonly imposed to make the problem tractable (a sketch of this assumption follows the list).
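
Concretely, and with illustrative notation rather than the exact formulation of the cited reference, the LS assumption can be sketched as positing that each test-___domain class-conditional is a coordinatewise ___location-scale transform of the corresponding training-___domain one,
: <math> X^{te} \mid (Y = y) \;\overset{d}{=}\; \mathbf{w}(y) \odot X^{tr} + \mathbf{b}(y), \qquad X^{tr} \sim P^{tr}(X \mid Y = y), </math>
where <math>\odot</math> denotes elementwise multiplication and <math>\mathbf{w}(\cdot)</math>, <math>\mathbf{b}(\cdot)</math> are unknown scale and ___location functions of the output; in the cited work, such transformations are estimated by matching kernel embeddings across domains.<ref name = "DA"/>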
 
By utilizing the kernel embeddings of marginal and conditional distributions, practical approaches to all three types of difference between the training and test domains can be formulated. Covariate shift may be accounted for by reweighting training examples with estimates of the ratio <math>P^{te}(X)/P^{tr}(X)</math>, obtained directly from the kernel embeddings of the marginal distributions of <math>X</math> in each ___domain without any need to estimate the distributions explicitly (see the code sketch below).<ref name = "CovS"/> Target shift cannot be handled in the same way, since no samples from <math>Y</math> are available in the test ___domain; it is instead accounted for by weighting training examples using the vector <math>\boldsymbol{\beta}^*(\mathbf{y}^{tr})</math>, which solves the optimization problem given below (where, in practice, empirical approximations must be used)<ref name = "DA"/>
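
As a concrete illustration of the covariate-shift reweighting just described, the following is a minimal sketch of kernel mean matching in the spirit of Gretton et al.:<ref name = "CovS"/> the weights <math>\beta_i</math> are chosen so that the kernel embedding of the reweighted training covariates matches that of the test covariates. The function names, the choice of a Gaussian kernel, and the solver settings are illustrative assumptions, not specifications from the reference.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=1000.0, eps=None):
    """Kernel mean matching (illustrative sketch): find weights beta minimizing
    || (1/n) sum_i beta_i phi(x_i^tr) - (1/m) sum_j phi(x_j^te) ||^2 in the RKHS,
    which expands (up to a constant) to 0.5 * beta' K beta - kappa' beta."""
    n, m = len(X_tr), len(X_te)
    if eps is None:
        eps = (np.sqrt(n) - 1.0) / np.sqrt(n)  # heuristic bound on weight deviation
    K = gaussian_kernel(X_tr, X_tr, sigma) + 1e-8 * np.eye(n)  # small ridge for stability
    kappa = (n / m) * gaussian_kernel(X_tr, X_te, sigma).sum(axis=1)

    objective = lambda b: 0.5 * b @ K @ b - kappa @ b
    gradient = lambda b: K @ b - kappa
    constraints = [  # |sum(beta) - n| <= n * eps, written as two inequalities g(b) >= 0
        {"type": "ineq", "fun": lambda b: n * eps - (np.sum(b) - n)},
        {"type": "ineq", "fun": lambda b: n * eps + (np.sum(b) - n)},
    ]
    result = minimize(objective, x0=np.ones(n), jac=gradient,
                      bounds=[(0.0, B)] * n, constraints=constraints,
                      method="SLSQP")
    return result.x  # beta_i approximates P^te(x_i^tr) / P^tr(x_i^tr)
</syntaxhighlight>

The returned weights can then be supplied to any learner that accepts per-example weights (i.e., weighted [[empirical risk minimization]]); the box constraint <code>B</code> and the tolerance <code>eps</code> limit how far the weights may deviate from uniform.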