{{Short description|Distributional data analysis}}
 
Distributional data analysis is a branch of [[nonparametric statistics]] related to [[functional data analysis]]. It is concerned with random objects that are probability distributions, i.e., the statistical analysis of samples of random distributions, where each atom of a sample is a distribution. One of the main challenges in distributional data analysis is that, although the space of probability distributions is a convex space, it is not a [[vector space]].
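The convex-but-not-linear structure can be seen in a small numerical sketch (illustrative, not from any cited source): mixing two densities yields a density, but subtracting one from another does not.

```python
import numpy as np

# Two Gaussian densities evaluated on a grid (illustrative example).
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]
f = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)        # N(0, 1)
g = np.exp(-0.5 * (x - 1) ** 2) / np.sqrt(2 * np.pi)  # N(1, 1)

# A convex combination of densities is again a density ...
h = 0.5 * f + 0.5 * g
print(abs(np.sum(h) * dx - 1.0) < 1e-3)   # True: h >= 0 and integrates to ~1

# ... but a general linear combination is not: f - g integrates to 0
# and takes negative values, so it is not a density.  The set of
# densities is convex but not closed under subtraction or scaling.
d = f - g
print(abs(np.sum(d) * dx) < 1e-3)         # True: integral is ~0, not 1
print(d.min() < 0)                        # True: negative somewhere
```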
 
For <math>f \in \mathcal{F}_f</math>, let <math>Y = \Psi(f)</math>, the transformed functional variable. The mean function <math>\mu_Y(t) = \mathbb{E}\left[Y(t)\right]</math> and the covariance function <math>G_Y(s,t) = \operatorname{Cov}(Y(s), Y(t))</math> are defined accordingly, and let <math>\{\lambda_j, \phi_j\}_{j=1}^\infty</math> be the eigenpairs of <math>G_Y(s,t)</math>. The Karhunen–Loève decomposition gives
<math>Y(t) = \mu_Y(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math>, where <math>\xi_j = \int_D [Y(t) - \mu_Y(t)] \phi_j(t) dt</math>. Then, the <math>j</math>th transformation mode of variation is defined as <ref>{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2016|title=Functional data analysis for density functions by transformation to a Hilbert space|journal=Annals of Statistics|volume=44|issue=1|pages=183–218|doi=10.1214/15-AOS1363}}</ref>
<math>
g_{j}^{TF}(t, \alpha) = \Psi^{-1} \left( \mu_Y + \alpha \sqrt{\lambda_j}\phi_j \right)(t), \quad t \in D, \; \alpha \in [-A, A].
</math>
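The transformation modes of variation can be sketched numerically. This is an illustrative simplification, not the cited method: here <math>\Psi = \log</math> with renormalization on the inverse, whereas Petersen and Müller use e.g. the log-quantile-density transformation; the data are synthetic Beta-like densities.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.01, 0.99, 99)
dt = t[1] - t[0]

# Sample of n densities on (0, 1) with Beta-like shapes (synthetic,
# illustrative data).
n = 50
dens = []
for _ in range(n):
    a, b = rng.uniform(2, 5, size=2)
    f = t ** (a - 1) * (1 - t) ** (b - 1)
    dens.append(f / (f.sum() * dt))          # normalize on the grid
dens = np.array(dens)

# Transform into the Hilbert space: here Psi = log, a hypothetical
# simplified choice of transformation.
Y = np.log(dens)

# Empirical Karhunen-Loeve ingredients: mean function and eigenpairs
# of the discretized covariance operator.
mu_Y = Y.mean(axis=0)
G = np.cov(Y, rowvar=False)
lam, phi = np.linalg.eigh(G * dt)            # operator scaling by dt
lam, phi = lam[::-1], phi[:, ::-1]           # descending eigenvalues
phi = phi / np.sqrt(dt)                      # L2-normalized eigenfunctions

# j-th transformation mode of variation at alpha: map back with
# Psi^{-1} = exp and renormalize so the result is again a density.
alpha, j = 1.0, 0
g = np.exp(mu_Y + alpha * np.sqrt(lam[j]) * phi[:, j])
g = g / (g.sum() * dt)
print(abs(g.sum() * dt - 1.0) < 1e-10)       # True: a valid density again
```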
Let the reference measure <math>\nu_0</math> be the Wasserstein mean <math>\mu_\oplus</math>.
Then, a ''principal geodesic subspace (PGS)'' of dimension <math>k</math> with respect to <math>\mu_\oplus</math> is a set <math>G_k = \operatorname{argmin}_{G \in \text{CG}_{\mu_\oplus, k}(\mathcal{W}_2)} K_{W_2}(G)</math>.<ref name="gpca1">{{Cite journal|last1=Bigot|first1=J.|last2=Gouet|first2=R.|last3=Klein|first3=T.|last4=López|first4=A.|date=2017|title=Geodesic PCA in the Wasserstein space by convex PCA|journal=Annales de l'institut Henri Poincare (B) Probability and Statistics|volume=53|issue=1|pages=1–26|doi=10.1214/15-AIHP706|bibcode=2017AnIHP..53....1B |s2cid=49256652 |url=https://hal.archives-ouvertes.fr/hal-01978864/file/AIHP706.pdf }}</ref><ref name="gpca2">{{Cite journal|last1=Cazelles|first1=E.|last2=Seguy|first2=V.|last3=Bigot|first3=J.|last4=Cuturi|first4=M.|last5=Papadakis|first5=N.|date=2018|title=Geodesic PCA versus Log-PCA of histograms in the Wasserstein space|journal=SIAM Journal on Scientific Computing|volume=40|issue=2|pages=B429–B456|doi=10.1137/17M1143459 |bibcode=2018SJSC...40B.429C }}</ref>
 
Note that the tangent space <math>T_{\mu_\oplus}</math> is a subspace of <math>L^2_{\mu_\oplus}</math>, the Hilbert space of <math>{\mu_\oplus}</math>-square-integrable functions. Obtaining the PGS is therefore equivalent to performing PCA in <math>L^2_{\mu_\oplus}</math> under the constraint that the solution lie in a convex and closed subset.<ref name="gpca2"/> A simple approximation of Wasserstein geodesic PCA is thus log FPCA, which relaxes the geodesicity constraint, although alternative techniques that retain the constraint have also been suggested.<ref name="gpca1"/><ref name="gpca2"/>
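For one-dimensional distributions, the map from a measure to its quantile function embeds the Wasserstein space isometrically into <math>L^2(0,1)</math>, so a log-FPCA-style computation reduces to ordinary FPCA of quantile functions. A minimal sketch under that simplification (synthetic Gaussian data; the grid, sample sizes, and parameter ranges are arbitrary choices, not from the cited works):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p = np.linspace(0.005, 0.995, 199)   # probability grid
dp = p[1] - p[0]

# Sample of n one-dimensional Gaussians, each represented by its
# quantile function on the grid (synthetic, illustrative data).
n = 40
Q = np.array([norm.ppf(p, loc=rng.normal(0, 1), scale=rng.uniform(0.5, 2))
              for _ in range(n)])

# The Wasserstein mean of 1-D distributions has the mean quantile
# function; differences from it play the role of tangent (log) vectors.
Qbar = Q.mean(axis=0)                # quantile function of mu_plus
V = Q - Qbar

# PCA in the tangent space: eigenpairs of the discretized covariance.
C = (V.T @ V / n) * dp
lam, phi = np.linalg.eigh(C)
lam, phi = lam[::-1], phi[:, ::-1]   # descending eigenvalues
phi = phi / np.sqrt(dp)              # L2-normalized eigenfunctions

# Perturbing the mean quantile function along the first component
# gives an (unconstrained, log-FPCA style) mode of variation.
alpha = 1.0
Qmode = Qbar + alpha * np.sqrt(lam[0]) * phi[:, 0]
print(lam[0] / lam.sum())            # share of variance in the first mode
```

The geodesicity constraint of the PGS is exactly what this relaxation drops: nothing forces `Qmode` to stay in the convex set of quantile functions for large <math>\alpha</math>.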
 
==References==
{{Reflist}}
 
[[Category:Statistical analysis]]
[[Category:Statistical data types]]