The form of the kernel {{math|Γ}} induces both the representation of the [[feature space]] and the structure of the output across tasks. A natural simplification is to choose a ''separable kernel,'' which factors into separate kernels on the input space {{mathcal|X}} and on the tasks <math> \{1,...,T\} </math>. In this case the kernel relating scalar components <math> f_t </math> and <math> f_s </math> is given by <math display="inline"> \gamma((x_i,t),(x_j,s)) = k(x_i,x_j)k_T(s,t)=k(x_i,x_j)A_{s,t} </math>. For vector-valued functions <math> f\in \mathcal H </math> we can write <math>\Gamma(x_i,x_j)=k(x_i,x_j)A</math>, where {{mvar|k}} is a scalar reproducing kernel and {{mvar|A}} is a symmetric positive semi-definite <math>T\times T</math> matrix. Henceforth, denote the set of symmetric positive semi-definite <math>T\times T</math> matrices by <math> S_+^T \subset \mathbb R^{T \times T} </math>.
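A minimal sketch (not part of the article) of evaluating a separable matrix-valued kernel <math>\Gamma(x,x')=k(x,x')A</math>, assuming a Gaussian scalar kernel {{mvar|k}} and an illustrative coupling matrix {{mvar|A}}; the function names and parameter values are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def scalar_kernel(x, x_prime, length_scale=1.0):
    """Gaussian (RBF) scalar kernel k(x, x') -- an illustrative choice."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * length_scale ** 2))

def separable_kernel(x, x_prime, A, length_scale=1.0):
    """Matrix-valued kernel Gamma(x, x') = k(x, x') A, a T x T matrix."""
    return scalar_kernel(x, x_prime, length_scale) * A

# Example: T = 2 tasks, with A a symmetric PSD matrix encoding task coupling.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
Gamma = separable_kernel([0.0, 1.0], [0.5, 0.5], A)
print(Gamma)  # 2 x 2 matrix equal to k(x, x') * A
</syntaxhighlight>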
This factorization property, separability, implies that the input feature-space representation does not vary by task; that is, there is no interaction between the input kernel and the task kernel. The structure on tasks is represented solely by {{mvar|A}}. Methods for non-separable kernels {{math|Γ}} are a current field of research.
For the separable case, the representer theorem reduces to <math display="inline">f(x)=\sum _{i=1} ^N k(x,x_i)Ac_i</math>. The model output on the training data is then {{mvar|KCA}}, where {{mvar|K}} is the <math>N \times N</math> empirical kernel matrix with entries <math display="inline">K_{i,j}=k(x_i,x_j)</math>, and {{mvar|C}} is the <math>N \times T</math> matrix whose rows are the <math>c_i</math>.
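A minimal sketch (not part of the article) of the separable-case representer theorem: it evaluates <math display="inline">f(x)=\sum_i k(x,x_i)Ac_i</math> at a new point and computes the training-data outputs {{mvar|KCA}}. The coefficient matrix {{mvar|C}} is assumed given here; in practice it would be produced by a learning algorithm such as regularized least squares. All concrete values and helper names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def rbf(x, y, length_scale=1.0):
    """Gaussian scalar kernel k(x, y) -- an illustrative choice."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * length_scale ** 2))

def predict(x, X_train, C, A, length_scale=1.0):
    """Evaluate f(x) = sum_i k(x, x_i) A c_i; returns a length-T vector."""
    scalars = np.array([rbf(x, xi, length_scale) for xi in X_train])  # (N,)
    # sum_i k(x, x_i) A c_i  =  A (C^T scalars)
    return A @ (C.T @ scalars)

# Toy data: N = 3 training inputs, T = 2 tasks.
X_train = np.array([[0.0], [0.5], [1.0]])
C = np.array([[ 0.2, -0.1],
              [ 0.4,  0.3],
              [-0.5,  0.1]])        # N x T coefficient matrix (rows c_i)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # T x T symmetric PSD task matrix

# Empirical kernel matrix K and model outputs on the training data: K C A.
K = np.array([[rbf(xi, xj) for xj in X_train] for xi in X_train])  # N x N
train_outputs = K @ C @ A           # N x T matrix of model outputs

print(predict(np.array([0.25]), X_train, C, A))
print(train_outputs)
</syntaxhighlight>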