Distributional data analysis: Difference between revisions

OAbot
m Open access bot: arxiv updated in citation with #oabot.
m v2.05 - Autofix / Fix errors for CW project (Link equal to linktext)
 
(5 intermediate revisions by 4 users not shown)
Line 2:
 
{{Orphan|date=December 2023}}
'''Distributional data analysis''' is a branch of [[nonparametric statistics]] that is related to [[functional data analysis]]. It is concerned with random objects that are probability distributions, i.e., the statistical analysis of samples of random distributions where each atom of a sample is a distribution. One of the main challenges in distributional data analysis is that the space of probability distributions, while convex, is not a [[vector space]]. Convex combinations (mixtures) of distributions are again distributions, but differences and arbitrary scalar multiples generally are not.
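The following Python sketch gives a quick numerical illustration of this point. It discretizes two densities on [0, 1] and checks which combinations remain valid densities. The two densities are a uniform density and the Beta(2, 1) density 2''t'', both chosen only for illustration. The grid, the trapezoid quadrature and the helper names <code>integrate</code> and <code>is_density</code> are assumptions of the sketch, not part of any library.

```python
import numpy as np

# Uniform grid on [0, 1] (an illustrative discretization).
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]


def integrate(values):
    # Trapezoid rule on the uniform grid t.
    return dt * (values.sum() - 0.5 * (values[0] + values[-1]))


def is_density(values, tol=1e-9):
    # A density must be nonnegative and integrate to one.
    return bool(values.min() >= -tol and abs(integrate(values) - 1.0) <= tol)


f1 = np.ones_like(t)  # Uniform(0, 1) density
f2 = 2.0 * t          # Beta(2, 1) density

mixture = 0.3 * f1 + 0.7 * f2  # convex combination
difference = f1 - f2           # vector-space difference
scaled = 2.0 * f1              # scalar multiple

print(is_density(mixture))     # True
print(is_density(difference))  # False
print(is_density(scaled))      # False
```

The mixture is again a density. The difference takes negative values and integrates to zero, and the scaled function integrates to two.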
 
== Notation ==
Line 32:
 
=== Functional principal component analysis ===
[[Functional principal component analysis]] (FPCA) can be applied directly to probability density functions.<ref>{{Cite journal|last1=Kneip|first1=A.|last2=Utikal|first2=K.J.|date=2001|title=Inference for density families using functional principal component analysis|journal=Journal of the American Statistical Association|volume=96|issue=454|pages=519–532|doi=10.1198/016214501753168235|s2cid=123524014 }}</ref> Consider a distribution process <math>\nu \sim \mathfrak{F}</math> and let <math>f</math> be the density function of <math>\nu</math>. Define the mean density function as <math>\mu(t) = \mathbb{E}\left[f(t)\right]</math> and the covariance function as <math>G(s,t) = \operatorname{Cov}(f(s), f(t))</math>, with orthonormal eigenfunctions <math>\{\phi_j\}_{j=1}^\infty</math> and eigenvalues <math>\{\lambda_j\}_{j=1}^\infty</math>.
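Below is a minimal Python/NumPy sketch of these estimators. It assumes that each density has been evaluated on a common uniform grid. The covariance operator is approximated by the estimated matrix <math>G</math> times the grid spacing. The eigenvectors are then rescaled so that the estimated eigenfunctions are orthonormal in <math>L^2</math>. The function name <code>density_fpca</code> and the simulated Gaussian sample are hypothetical choices for illustration.

```python
import numpy as np


def density_fpca(densities, grid, n_components=3):
    """FPCA of densities evaluated on a common uniform grid (illustrative sketch).

    densities: array of shape (n, m); row i holds f_i evaluated on `grid`.
    Returns the mean function, leading eigenvalues, eigenfunctions (as rows)
    and FPC scores.
    """
    dt = grid[1] - grid[0]
    mu = densities.mean(axis=0)                        # estimate of mu(t) = E[f(t)]
    centered = densities - mu
    G = centered.T @ centered / (len(densities) - 1)   # estimate of G(s, t)
    # Discretize (G phi)(s) = int G(s, t) phi(t) dt as the matrix G * dt.
    evals, evecs = np.linalg.eigh(G * dt)
    order = np.argsort(evals)[::-1][:n_components]     # largest eigenvalues first
    lambdas = evals[order]
    phis = evecs[:, order].T / np.sqrt(dt)             # so that sum(phi_j**2) * dt == 1
    scores = centered @ phis.T * dt                    # xi_ij = int (f_i - mu) phi_j dt
    return mu, lambdas, phis, scores


# Hypothetical sample: 50 Gaussian densities with random means and scales.
rng = np.random.default_rng(0)
grid = np.linspace(-5.0, 5.0, 201)
means = rng.normal(0.0, 0.5, size=50)
scales = rng.uniform(0.7, 1.3, size=50)
densities = np.exp(-0.5 * ((grid[None, :] - means[:, None]) / scales[:, None]) ** 2) / (
    scales[:, None] * np.sqrt(2.0 * np.pi)
)

mu, lambdas, phis, scores = density_fpca(densities, grid)
print("leading eigenvalues:", lambdas)
```

The estimated components are ordinary functions in <math>L^2</math>, so a truncated reconstruction from the leading components need not be nonnegative. This reflects the fact that the space of densities is not a vector space.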
 
By the Karhunen–Loève theorem, <math>
Line 63:
</math>
Let the reference measure <math>\nu_0</math> be the Wasserstein mean <math>\mu_\oplus</math>.
Then a ''principal geodesic subspace (PGS)'' of dimension <math>k</math> with respect to <math>\mu_\oplus</math> is a set <math>G_k = \operatorname{argmin}_{G \in \text{CG}_{\mu_\oplus, k}(\mathcal{W}_2)} K_{W_2}(G)</math>.<ref name="gpca1">{{Cite journal|last1=Bigot|first1=J.|last2=Gouet|first2=R.|last3=Klein|first3=T.|last4=López|first4=A.|date=2017|title=Geodesic PCA in the Wasserstein space by convex PCA|journal=Annales de l'Institut Henri Poincaré, Probabilités et Statistiques|volume=53|issue=1|pages=1–26|doi=10.1214/15-AIHP706|bibcode=2017AnIHP..53....1B |s2cid=49256652 |url=https://hal.archives-ouvertes.fr/hal-01978864/file/AIHP706.pdf }}</ref><ref name="gpca2">{{Cite journal|last1=Cazelles|first1=E.|last2=Seguy|first2=V.|last3=Bigot|first3=J.|last4=Cuturi|first4=M.|last5=Papadakis|first5=N.|date=2018|title=Geodesic PCA versus Log-PCA of histograms in the Wasserstein space|journal=SIAM Journal on Scientific Computing|volume=40|issue=2|pages=B429–B456|doi=10.1137/17M1143459 |bibcode=2018SJSC...40B.429C }}</ref>
 
Note that the tangent space <math>T_{\mu_\oplus}</math> is a subspace of <math>L^2_{\mu_\oplus}</math>, the Hilbert space of <math>\mu_\oplus</math>-square-integrable functions. Obtaining the PGS is equivalent to performing PCA in <math>L^2_{\mu_\oplus}</math> under the constraint that the components lie in a closed convex subset of this space.<ref name="gpca2"/> Relaxing this geodesicity constraint yields a simple approximation of Wasserstein geodesic PCA, namely log FPCA. Log FPCA applies FPCA to the logarithmic maps of the distributions at <math>\mu_\oplus</math>. Alternative techniques have also been proposed.<ref name="gpca1"/><ref name="gpca2"/>
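For univariate distributions the logarithmic map has a convenient form. Let <math>F</math> denote a cumulative distribution function. If <math>\mu_\oplus</math> has no atoms, then <math>\log_{\mu_\oplus}(\nu) = F_\nu^{-1} \circ F_{\mu_\oplus} - \mathrm{id}</math>. Its <math>L^2_{\mu_\oplus}</math> norm equals the <math>L^2[0,1]</math> distance between the quantile functions of <math>\nu</math> and <math>\mu_\oplus</math>. In addition, the quantile function of the Wasserstein mean is the mean of the quantile functions. The Python sketch below uses these facts to perform log FPCA on empirical quantile functions. The function name <code>log_fpca_1d</code>, the grid of probability levels and the simulated Gaussian samples are illustrative assumptions. The sketch is not the convex or geodesic PCA algorithm of the cited papers.

```python
import numpy as np


def log_fpca_1d(samples, n_levels=99, n_components=2):
    """Log FPCA for univariate distributions via quantile functions (sketch).

    samples: list of 1-D arrays, each drawn from one distribution.
    In quantile coordinates the log map at the Wasserstein mean is Q_i - Q_mean,
    so PCA of these centred curves in L^2[0, 1] is log FPCA.
    """
    p = (np.arange(n_levels) + 0.5) / n_levels          # midpoints of [0, 1]
    dp = 1.0 / n_levels
    Q = np.array([np.quantile(x, p) for x in samples])  # empirical quantile functions
    Q_mean = Q.mean(axis=0)        # quantile function of the Wasserstein mean
    logs = Q - Q_mean              # log maps, expressed in quantile coordinates
    # SVD of the weighted curves gives the eigen-decomposition of the
    # discretized covariance operator in L^2[0, 1].
    _, s, Vt = np.linalg.svd(logs * np.sqrt(dp), full_matrices=False)
    lambdas = s[:n_components] ** 2 / (len(samples) - 1)
    phis = Vt[:n_components] / np.sqrt(dp)              # orthonormal in L^2[0, 1]
    scores = logs @ phis.T * dp                         # projections of the log maps
    return p, Q_mean, lambdas, phis, scores


# Hypothetical data: 40 Gaussian samples with random locations and scales.
rng = np.random.default_rng(1)
locs = rng.normal(0.0, 1.0, size=40)
scales = rng.uniform(0.5, 2.0, size=40)
samples = [rng.normal(m, s, size=500) for m, s in zip(locs, scales)]

p, Q_mean, lambdas, phis, scores = log_fpca_1d(samples)
print("leading eigenvalues:", lambdas)
```

The sketch imposes no monotonicity constraint. As a result, a reconstruction from the leading components need not be nondecreasing and may therefore fail to be a valid quantile function. This is the price of relaxing the geodesicity constraint.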
Line 144:
</math>
 
On the other hand, the spherical autoregressive (SAR) model is based on the Fisher–Rao metric.<ref>{{Cite journal|last1=Zhu|first1=C.|last2=Müller|first2=H.-G.|date=2023|title=Spherical autoregressive models, with application to distributional and compositional time series|journal=Journal of Econometrics|volume=239 |issue=2 |doi=10.1016/j.jeconom.2022.12.008 |doi-access=free|arxiv=2203.12783}}</ref> Following the setting of [[#Tests for the intrinsic mean]], let <math>x_t \in \mathcal{X}</math> with Fréchet mean <math>\mu_x</math>. Let <math>\theta = \arccos(\langle x_t, \mu_x \rangle )</math>, the geodesic distance between <math>x_t</math> and <math>\mu_x</math>. Define a rotation operator <math>Q_{x_t, \mu_x}</math> that rotates <math>x_t</math> to <math>\mu_x</math>. The spherical difference between <math>x_t</math> and <math>\mu_x</math> is represented as <math>R_t = x_t \ominus \mu_x = \theta Q_{x_t, \mu_x}</math>. If <math>R_t</math> is a stationary sequence with Fréchet mean <math>\mu_R</math>, then the <math>\text{SAR}(1)</math> model is defined as
<math display="block">
R_t - \mu_R = \beta (R_{t-1} - \mu_R) + \epsilon_t,