Revision as of 07:58, 21 October 2020 by 161.200.188.177 (→Heuristic approaches)		Revision as of 07:59, 21 October 2020 by 161.200.188.177 (→Semisupervised learning)
Line 87: where <math>L</math> is the loss function (weighted negative log-likelihood in this case), <math>R</math> is the regularization term ([[Proximal gradient methods for learning#Exploiting group structure|Group LASSO]] in this case), and <math>\Theta</math> is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density (MKD) for all the data be <math>g^{\pi}_m(x)=\langle\phi^{\pi}_m,\psi_m(x)\rangle</math>, where <math>\psi_m(x)=[K_m(x_1,x),\ldots,K_m(x_L,x)]^T</math> (the kernel distance between the labeled data and all of the labeled and unlabeled data) and <math>\phi^{\pi}_m</math> is a non-negative random vector with a 2-norm of 1. The value of <math>\Pi</math> is the number of times each kernel is projected. Expectation regularization is then performed on the MKD, resulting in a reference expectation <math>q^{\pi}_m(y|g^{\pi}_m(x))</math> and a model expectation <math>p^{\pi}_m(f(x)|g^{\pi}_m(x))</math>. Then, we define

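The marginal kernel density projection described above can be illustrated with a minimal Python sketch. It covers only the computation of <math>g^{\pi}_m(x)</math> for a single kernel <math>m</math>, not the expectation regularization or the CEC penalty itself. The Gaussian kernel, the toy data, the way the random vector is drawn, and all function names are assumptions made for illustration and are not taken from the article.

```python
import math
import random


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; an assumed stand-in for K_m."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def random_unit_nonneg(n, rng):
    """Draw a non-negative random vector and scale it to 2-norm 1 (phi).

    Entries are drawn uniformly from (0, 1]; this distribution is an
    assumption, the text only requires non-negativity and unit norm.
    """
    v = [1.0 - rng.random() for _ in range(n)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def marginal_kernel_density(phi, labeled, x, kernel=rbf_kernel):
    """g(x) = <phi, psi(x)> with psi(x) = [K(x_1, x), ..., K(x_L, x)]^T."""
    psi = [kernel(xi, x) for xi in labeled]
    return sum(p * k for p, k in zip(phi, psi))


rng = random.Random(0)

# Toy data (hypothetical): L = 3 labeled points and 2 unlabeled points.
labeled = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
unlabeled = [[0.5, 0.5], [2.0, 2.0]]
all_points = labeled + unlabeled

num_projections = 4  # plays the role of Pi, the number of projections

# One random projection vector phi per projection index pi.
projections = [random_unit_nonneg(len(labeled), rng)
               for _ in range(num_projections)]

# densities[pi][i] holds g^pi(x_i) for every labeled and unlabeled point.
densities = [[marginal_kernel_density(phi, labeled, x) for x in all_points]
             for phi in projections]
```

Each row of <code>densities</code> corresponds to one random projection, so a one-dimensional summary of every point is obtained <math>\Pi</math> times per kernel. Because the Gaussian kernel takes values in <math>(0,1]</math> and each projection vector has unit norm, every entry is positive and bounded by <math>\sqrt{L}</math>.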
Multiple kernel learning: Difference between revisions