{{Short description|Distributional data analysis}}
 
Distributional data analysis is a branch of [[nonparametric statistics]] related to [[functional data analysis]]. It is concerned with random objects that are probability distributions, i.e., the statistical analysis of samples of random distributions, where each atom of a sample is a distribution. One of the main challenges in distributional data analysis is that, although the space of probability distributions is a convex space, it is not a [[vector space]].
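The convex-but-not-linear structure can be seen in a small numerical sketch (illustrative, not from any cited source): mixing two densities yields a density, but subtracting one from another does not.

```python
import numpy as np

# Two Gaussian densities evaluated on a grid (illustrative example).
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]
f = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)        # N(0, 1)
g = np.exp(-0.5 * (x - 1) ** 2) / np.sqrt(2 * np.pi)  # N(1, 1)

# A convex combination of densities is again a density ...
h = 0.5 * f + 0.5 * g
print(abs(np.sum(h) * dx - 1.0) < 1e-3)   # True: h >= 0 and integrates to ~1

# ... but a general linear combination is not: f - g integrates to 0
# and takes negative values, so it is not a density.  The set of
# densities is convex but not closed under subtraction or scaling.
d = f - g
print(abs(np.sum(d) * dx) < 1e-3)         # True: integral is ~0, not 1
print(d.min() < 0)                        # True: negative somewhere
```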
 
For <math>f \in \mathcal{F}_f</math>, let <math>Y = \Psi(f)</math>, the transformed functional variable. The mean function <math>\mu_Y(t) = \mathbb{E}\left[Y(t)\right]</math> and the covariance function <math>G_Y(s,t) = \operatorname{Cov}(Y(s), Y(t))</math> are defined accordingly, and let <math>\{\lambda_j, \phi_j\}_{j=1}^\infty</math> be the eigenpairs of <math>G_Y(s,t)</math>. The Karhunen–Loève decomposition gives
<math>Y(t) = \mu_Y(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math>, where <math>\xi_j = \int_D [Y(t) - \mu_Y(t)] \phi_j(t) dt</math>. Then, the <math>j</math>th transformation mode of variation is defined as <ref>{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2016|title=Functional data analysis for density functions by transformation to a Hilbert space|journal=Annals of Statistics|volume=44|issue=1|pages=183–218|doi=10.1214/15-AOS1363}}</ref>
<math>
g_{j}^{TF}(t, \alpha) = \Psi^{-1} \left( \mu_Y + \alpha \sqrt{\lambda_j}\phi_j \right)(t), \quad t \in D, \; \alpha \in [-A, A].
</math>
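The transformation modes of variation can be sketched numerically. This is an illustrative simplification, not the cited method: here <math>\Psi = \log</math> with renormalization on the inverse, whereas Petersen and Müller use e.g. the log-quantile-density transformation; the data are synthetic Beta-like densities.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.01, 0.99, 99)
dt = t[1] - t[0]

# Sample of n densities on (0, 1) with Beta-like shapes (synthetic,
# illustrative data).
n = 50
dens = []
for _ in range(n):
    a, b = rng.uniform(2, 5, size=2)
    f = t ** (a - 1) * (1 - t) ** (b - 1)
    dens.append(f / (f.sum() * dt))          # normalize on the grid
dens = np.array(dens)

# Transform into the Hilbert space: here Psi = log, a hypothetical
# simplified choice of transformation.
Y = np.log(dens)

# Empirical Karhunen-Loeve ingredients: mean function and eigenpairs
# of the discretized covariance operator.
mu_Y = Y.mean(axis=0)
G = np.cov(Y, rowvar=False)
lam, phi = np.linalg.eigh(G * dt)            # operator scaling by dt
lam, phi = lam[::-1], phi[:, ::-1]           # descending eigenvalues
phi = phi / np.sqrt(dt)                      # L2-normalized eigenfunctions

# j-th transformation mode of variation at alpha: map back with
# Psi^{-1} = exp and renormalize so the result is again a density.
alpha, j = 1.0, 0
g = np.exp(mu_Y + alpha * np.sqrt(lam[j]) * phi[:, j])
g = g / (g.sum() * dt)
print(abs(g.sum() * dt - 1.0) < 1e-10)       # True: a valid density again
```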
Let the reference measure <math>\nu_0</math> be the Wasserstein mean <math>\mu_\oplus</math>.
Then, a ''principal geodesic subspace (PGS)'' of dimension <math>k</math> with respect to <math>\mu_\oplus</math> is a set <math>G_k = \operatorname{argmin}_{G \in \text{CG}_{\mu_\oplus, k}(\mathcal{W}_2)} K_{W_2}(G)</math>.<ref name="gpca1">{{Cite journal|last1=Bigot|first1=J.|last2=Gouet|first2=R.|last3=Klein|first3=T.|last4=López|first4=A.|date=2017|title=Geodesic PCA in the Wasserstein space by convex PCA|journal=Annales de l'institut Henri Poincare (B) Probability and Statistics|volume=53|issue=1|pages=1–26|doi=10.1214/15-AIHP706|bibcode=2017AnIHP..53....1B |s2cid=49256652 |url=https://hal.archives-ouvertes.fr/hal-01978864/file/AIHP706.pdf }}</ref><ref name="gpca2">{{Cite journal|last1=Cazelles|first1=E.|last2=Seguy|first2=V.|last3=Bigot|first3=J.|last4=Cuturi|first4=M.|last5=Papadakis|first5=N.|date=2018|title=Geodesic PCA versus Log-PCA of histograms in the Wasserstein space|journal=SIAM Journal on Scientific Computing|volume=40|issue=2|pages=B429–B456|doi=10.1137/17M1143459 |bibcode=2018SJSC...40B.429C }}</ref>
 
Note that the tangent space <math>T_{\mu_\oplus}</math> is a subspace of <math>L^2_{\mu_\oplus}</math>, the Hilbert space of <math>{\mu_\oplus}</math>-square-integrable functions. Obtaining the PGS is therefore equivalent to performing PCA in <math>L^2_{\mu_\oplus}</math> under the constraint that the solution lie in a convex and closed subset.<ref name="gpca2"/> A simple approximation of Wasserstein geodesic PCA is thus log FPCA, which relaxes the geodesicity constraint, although alternative techniques that retain the constraint have also been suggested.<ref name="gpca1"/><ref name="gpca2"/>
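For one-dimensional distributions, the map from a measure to its quantile function embeds the Wasserstein space isometrically into <math>L^2(0,1)</math>, so a log-FPCA-style computation reduces to ordinary FPCA of quantile functions. A minimal sketch under that simplification (synthetic Gaussian data; the grid, sample sizes, and parameter ranges are arbitrary choices, not from the cited works):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p = np.linspace(0.005, 0.995, 199)   # probability grid
dp = p[1] - p[0]

# Sample of n one-dimensional Gaussians, each represented by its
# quantile function on the grid (synthetic, illustrative data).
n = 40
Q = np.array([norm.ppf(p, loc=rng.normal(0, 1), scale=rng.uniform(0.5, 2))
              for _ in range(n)])

# The Wasserstein mean of 1-D distributions has the mean quantile
# function; differences from it play the role of tangent (log) vectors.
Qbar = Q.mean(axis=0)                # quantile function of mu_plus
V = Q - Qbar

# PCA in the tangent space: eigenpairs of the discretized covariance.
C = (V.T @ V / n) * dp
lam, phi = np.linalg.eigh(C)
lam, phi = lam[::-1], phi[:, ::-1]   # descending eigenvalues
phi = phi / np.sqrt(dp)              # L2-normalized eigenfunctions

# Perturbing the mean quantile function along the first component
# gives an (unconstrained, log-FPCA style) mode of variation.
alpha = 1.0
Qmode = Qbar + alpha * np.sqrt(lam[0]) * phi[:, 0]
print(lam[0] / lam.sum())            # share of variance in the first mode
```

The geodesicity constraint of the PGS is exactly what this relaxation drops: nothing forces `Qmode` to stay in the convex set of quantile functions for large <math>\alpha</math>.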
 
==References==
{{Reflist}}
 
[[Category:Statistical analysis]]
[[Category:Statistical data types]]