{{Orphan|date=December 2023}}
'''Distributional data analysis''' is a branch of [[nonparametric statistics]] that is related to [[functional data analysis]]. It is concerned with random objects that are probability distributions, i.e., the statistical analysis of samples of random distributions where each atom of a sample is a distribution. One of the main challenges in distributional data analysis is that the space of probability distributions, while a convex space, is not a [[vector space]].
 
== Notation ==
Let <math>\nu</math> be a probability measure on <math>D</math>, where <math>D \subset \R^p</math> with <math>p \ge 1</math>. The probability measure <math>\nu</math> can be equivalently characterized by its [[cumulative distribution function]] <math>F</math> or, if it exists, its [[probability density function]] <math>f</math>. For univariate distributions with <math>p = 1</math>, the [[quantile function]] <math>Q=F^{-1}</math> can also be used.
 
Let <math>\mathcal{F}</math> be a space of distributions <math>\nu</math> and let <math>d</math> be a metric on <math>\mathcal{F}</math> so that <math>(\mathcal{F}, d)</math> forms a [[metric space]]. There are various metrics available for <math>d</math>.<ref>{{Cite book|last1=Deza|first1=M.M.|last2=Deza|first2=E.|title=Encyclopedia of distances|publisher=Springer|year=2013}}</ref>
For example, suppose <math>\nu_1, \; \nu_2 \in \mathcal{F}</math>, and let <math>f_1</math> and <math>f_2</math> be the density functions of <math>\nu_1</math> and <math>\nu_2</math>, respectively. The Fisher-Rao metric is defined as
<math display="block"> d_{FR}(f_1, f_2) = \arccos \left( \int_D \sqrt{f_1(x) f_2(x)} dx \right). </math>
 
For univariate distributions, let <math>Q_1</math> and <math>Q_2</math> be the quantile functions of <math>\nu_1</math> and <math>\nu_2</math>. Denote the <math>L^p</math>-Wasserstein space as <math>\mathcal{W}_p</math>, which is the space of distributions with finite <math>p</math>-th moments. Then, for <math>\nu_1, \; \nu_2 \in \mathcal{W}_p</math>, the <math>L^p</math>-[[Wasserstein metric]] is defined as
<math display="block"> d_{W_p}(\nu_1, \nu_2) = \left( \int_0^1 |Q_1(s) - Q_2(s)|^p ds \right)^{1/p}. </math>
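For univariate distributions, this quantile-based form is straightforward to evaluate numerically. The following is a minimal sketch (the function name and grid choice are illustrative; SciPy is assumed only for the Gaussian quantile functions):

```python
import numpy as np
from scipy.stats import norm

def wasserstein_distance_p(Q1, Q2, p=2, n_grid=200):
    """L^p-Wasserstein distance between two univariate distributions,
    given their quantile functions Q1 and Q2."""
    # Midpoint grid on (0, 1), avoiding the endpoints where Q may diverge.
    s = np.linspace(0.5 / n_grid, 1 - 0.5 / n_grid, n_grid)
    diff = np.abs(Q1(s) - Q2(s))
    return np.mean(diff ** p) ** (1.0 / p)  # midpoint rule for the integral

# For N(0, 1) and N(1, 1) the quantile functions differ by the constant 1,
# so the 2-Wasserstein distance equals 1.
d = wasserstein_distance_p(norm(0, 1).ppf, norm(1, 1).ppf, p=2)
```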
 
== Mean and variance ==
 
For a probability measure <math>\nu \in \mathcal{F}</math>, consider a [[stochastic process|random process]] <math>\mathfrak{F}</math> such that <math>\nu \sim \mathfrak{F}</math>. One way to define the mean and variance of <math>\nu</math> is to introduce the [[Fréchet mean]] and the Fréchet variance. With respect to the metric <math>d</math> on <math>\mathcal{F}</math>, the ''Fréchet mean'' <math>\mu_\oplus</math>, also known as the [[barycenter]], and the ''Fréchet variance'' <math>V_\oplus</math> are defined as<ref>{{Cite journal|last1=Fréchet|first1=M.|date=1948|title=Les éléments aléatoires de nature quelconque dans un espace distancié|journal=Annales de l'Institut Henri Poincaré|volume=10|issue=4|pages=215–310}}</ref>
<math display="block">\begin{align}
\mu_\oplus &= \operatorname{argmin}_{\omega \in \mathcal{F}} \mathbb{E}\left[ d^2(\nu, \omega) \right], \\
V_\oplus &= \mathbb{E}\left[ d^2(\nu, \mu_\oplus) \right].
\end{align}</math>
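In the <math>L^2</math>-Wasserstein space, the Fréchet mean of univariate distributions has a closed form: its quantile function is the pointwise average of the quantile functions of the sample. A small sketch under that assumption (SciPy Gaussians are used purely for illustration):

```python
import numpy as np
from scipy.stats import norm

# A sample of three distributions: N(0,1), N(1,1), N(2,1).
quantile_fns = [norm(m, 1).ppf for m in (0.0, 1.0, 2.0)]

s = np.linspace(0.005, 0.995, 199)           # grid on (0, 1)
Q = np.array([q(s) for q in quantile_fns])   # one quantile function per row

# Wasserstein-Frechet mean (barycenter): average the quantile functions.
Q_bar = Q.mean(axis=0)

# For Gaussians with a common variance, the barycenter is N(mean of means, 1).
```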
 
== Modes of variation ==
 
[[Modes of variation]] are a useful tool for depicting the variation of data around the mean function. Based on the [[Kosambi-Karhunen-Loève theorem|Karhunen-Loève representation]], modes of variation show the contribution of each [[eigenfunction]] to the variation around the mean.
 
=== Functional principal component analysis ===
[[Functional principal component analysis|Functional principal component analysis (FPCA)]] can be directly applied to probability density functions.<ref>{{Cite journal|last1=Kneip|first1=A.|last2=Utikal|first2=K.J.|date=2001|title=Inference for density families using functional principal component analysis|journal=Journal of the American Statistical Association|volume=96|issue=454|pages=519–532|doi=10.1198/016214501753168235|s2cid=123524014 }}</ref> Consider a distribution process <math>\nu \sim \mathfrak{F}</math> and let <math>f</math> be the density function of <math>\nu</math>. Denote the mean density function by <math>\mu(t) = \mathbb{E}\left[f(t)\right]</math> and the covariance function by <math>G(s,t) = \operatorname{Cov}(f(s), f(t))</math>, with orthonormal eigenfunctions <math>\{\phi_j\}_{j=1}^\infty</math> and eigenvalues <math>\{\lambda_j\}_{j=1}^\infty</math>.
 
 
By the Karhunen-Loève theorem, <math>f(t) = \mu(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math>, where <math>\xi_j = \int_D [f(t) - \mu(t)] \phi_j(t) dt</math> are the principal components. The <math>j</math>th mode of variation is defined as
<math display="block">
g_j(t, \alpha) = \mu(t) + \alpha \sqrt{\lambda_j} \phi_j(t), \quad t \in D, \; \alpha \in [-A, A],
</math>
with some constant <math>A</math>, such as 2 or 3.
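The modes of variation obtained from FPCA can be sketched numerically as follows (toy data; the grid, bump width, and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
dt = t[1] - t[0]

# Toy sample of densities: Gaussian bumps with random centers, renormalized.
X = []
for _ in range(50):
    c = 0.4 + 0.2 * rng.random()
    f = np.exp(-0.5 * ((t - c) / 0.1) ** 2)
    X.append(f / (f.sum() * dt))       # normalize so each density integrates to 1
X = np.array(X)

mu = X.mean(axis=0)                    # mean density function
G = np.cov(X, rowvar=False)            # discretized covariance surface G(s, t)

# Eigenpairs of the covariance operator, in descending eigenvalue order;
# the dt factors convert matrix eigenpairs into integral-operator ones.
w, v = np.linalg.eigh(G)
lam = w[::-1] * dt
phi = v[:, ::-1] / np.sqrt(dt)

# First mode of variation g_1(t, alpha) = mu(t) + alpha * sqrt(lam_1) * phi_1(t)
A = 2
modes = {a: mu + a * np.sqrt(lam[0]) * phi[:, 0] for a in (-A, 0, A)}
```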
 
=== Transformation FPCA ===
Assume that the probability density functions <math>f</math> exist, and let <math>\mathcal{F}_f</math> be the space of density functions.
Transformation approaches introduce a continuous and invertible transformation <math>\Psi: \mathcal{F}_f \to \mathbb{H}</math>, where <math>\mathbb{H}</math> is a [[Hilbert space]] of functions. Popular choices include the log quantile density transformation and the centered log-ratio transformation.<ref>{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2016|title=Functional data analysis for density functions by transformation to a Hilbert space|journal=Annals of Statistics|volume=44|issue=1|pages=183–218|doi=10.1214/15-AOS1363|doi-access=free|arxiv=1601.02869}}</ref><ref>{{Cite journal|last1=van den Boogaart|first1=K.G.|last2=Egozcue|first2=J.J.|last3=Pawlowsky-Glahn|first3=V.|date=2014|title=Bayes Hilbert spaces|journal=Australian and New Zealand Journal of Statistics|volume=56|issue=2|pages=171–194|doi=10.1111/anzs.12074|s2cid=120612578 }}</ref>
 
For <math>f \in \mathcal{F}_f</math>, let <math>Y = \Psi(f)</math>, the transformed functional variable. The mean function <math>\mu_Y(t) = \mathbb{E}\left[Y(t)\right]</math> and the covariance function <math>G_Y(s,t) = \operatorname{Cov}(Y(s), Y(t))</math> are defined accordingly, and let <math>\{\lambda_j, \phi_j\}_{j=1}^\infty</math> be the eigenpairs of <math>G_Y(s,t)</math>. The Karhunen-Loève decomposition gives
<math>Y(t) = \mu_Y(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math>, where <math>\xi_j = \int_D [Y(t) - \mu_Y(t)] \phi_j(t) dt</math>. Then, the <math>j</math>th transformation mode of variation is defined as<ref>{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2016|title=Functional data analysis for density functions by transformation to a Hilbert space|journal=Annals of Statistics|volume=44|issue=1|pages=183–218|doi=10.1214/15-AOS1363|doi-access=free|arxiv=1601.02869}}</ref>
<math>
g_{j}^{TF}(t, \alpha) = \Psi^{-1} \left( \mu_Y + \alpha \sqrt{\lambda_j}\phi_j \right)(t), \quad t \in D, \; \alpha \in [-A, A].
</math>
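As an illustration of one such transformation, the log quantile density (LQD) transform can be written as <math>\Psi(f)(s) = \log Q'(s) = -\log f(Q(s))</math>. A brief numerical sketch (SciPy distributions assumed; the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm, uniform

def lqd(pdf, quantile_fn, s):
    """Log quantile density transform: psi(f)(s) = log Q'(s) = -log f(Q(s))."""
    return -np.log(pdf(quantile_fn(s)))

s = np.linspace(0.01, 0.99, 99)

# Uniform(0, 1) has density 1 on (0, 1), so its LQD transform is identically 0.
y_unif = lqd(uniform.pdf, uniform.ppf, s)

# Standard normal: Y(1/2) = -log f(0) = log(sqrt(2*pi)).
y_norm = lqd(norm.pdf, norm.ppf, s)
```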
 
=== Log FPCA and Wasserstein Geodesic PCA ===
 
Endowed with metrics such as the Wasserstein metric <math>d_{W_2}</math> or the Fisher-Rao metric <math>d_{FR}</math>, the space <math>\mathcal{F}</math> carries a (pseudo-)Riemannian structure that can be exploited. Denote the [[tangent space]] at the Fréchet mean <math>\mu_\oplus</math> as <math>T_{\mu_\oplus}</math>, and define the logarithm and exponential maps <math>\log_{\mu_\oplus}:\mathcal{F} \to T_{\mu_\oplus}</math> and <math>\exp_{\mu_\oplus}: T_{\mu_\oplus} \to \mathcal{F}</math>.
Let <math>Y</math> be the projected density onto the tangent space, <math>Y = \log_{\mu_\oplus}(f)</math>.
</math>
Let the reference measure <math>\nu_0</math> be the Wasserstein mean <math>\mu_\oplus</math>.
Then, a ''principal geodesic subspace (PGS)'' of dimension <math>k</math> with respect to <math>\mu_\oplus</math> is a set <math>G_k = \operatorname{argmin}_{G \in \text{CG}_{\mu_\oplus, k}(\mathcal{W}_2)} K_{W_2}(G)</math>.<ref name="gpca1">{{Cite journal|last1=Bigot|first1=J.|last2=Gouet|first2=R.|last3=Klein|first3=T.|last4=López|first4=A.|date=2017|title=Geodesic PCA in the Wasserstein space by convex PCA|journal=Annales de l'Institut Henri Poincaré, Probabilités et Statistiques|volume=53|issue=1|pages=1–26|doi=10.1214/15-AIHP706|bibcode=2017AnIHP..53....1B |s2cid=49256652 |url=https://hal.archives-ouvertes.fr/hal-01978864/file/AIHP706.pdf }}</ref><ref name="gpca2">{{Cite journal|last1=Cazelles|first1=E.|last2=Seguy|first2=V.|last3=Bigot|first3=J.|last4=Cuturi|first4=M.|last5=Papadakis|first5=N.|date=2018|title=Geodesic PCA versus Log-PCA of histograms in the Wasserstein space|journal=SIAM Journal on Scientific Computing|volume=40|issue=2|pages=B429–B456|doi=10.1137/17M1143459 |bibcode=2018SJSC...40B.429C }}</ref>
 
Note that the tangent space <math>T_{\mu_\oplus}</math> is a subspace of <math>L^2_{\mu_\oplus}</math>, the Hilbert space of <math>{\mu_\oplus}</math>-square-integrable functions. Obtaining the PGS is equivalent to performing PCA in <math>L^2_{\mu_\oplus}</math> under the constraint of lying in a convex and closed subset.<ref name="gpca2"/> Therefore, a simple approximation of Wasserstein Geodesic PCA is Log FPCA, which relaxes the geodesicity constraint, though alternative techniques have been suggested.<ref name="gpca1"/><ref name="gpca2"/>
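For univariate distributions in the Wasserstein space, the log map has a known closed form, <math>\log_{\mu_\oplus}(\nu) = Q_\nu \circ F_{\mu_\oplus} - \mathrm{id}</math>. A brief numerical sketch (SciPy Gaussians serve as stand-ins; the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def wasserstein_log_map(F_mu, Q_nu, x):
    """Tangent vector log_mu(nu)(x) = Q_nu(F_mu(x)) - x, an element of L^2(mu)."""
    return Q_nu(F_mu(x)) - x

x = np.linspace(-3, 3, 121)
mu, nu = norm(0, 1), norm(1, 1)     # nu is mu shifted by +1

g = wasserstein_log_map(mu.cdf, nu.ppf, x)
# A pure location shift maps to the constant tangent vector g(x) = 1.
```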
 
== Distributional regression ==
=== Fréchet regression ===
 
Fréchet regression is a generalization of regression with responses taking values in a metric space and Euclidean predictors.<ref name="freg">{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2019|title=Fréchet regression for random objects with Euclidean predictors|journal=Annals of Statistics|volume=47|issue=2|pages=691–719|doi=10.1214/17-AOS1624 |doi-access=free|arxiv=1608.03012}}</ref><ref name="review">{{Cite journal|last1=Petersen|first1=A.|last2=Zhang|first2=C.|last3=Kokoszka|first3=P.|date=2022|title=Modeling probability density functions as data objects|journal=Econometrics and Statistics|volume=21|pages=159–178|doi=10.1016/j.ecosta.2021.04.004 |s2cid=236589040 }}</ref> Using the Wasserstein metric <math>d_{W_2}</math>, Fréchet regression models can be applied to distributional objects. The global Wasserstein-Fréchet regression model is defined as
{{NumBlk|::|<math display="block">\begin{align}
m_\oplus (x) &= \operatorname{argmin}_{\omega \in \mathcal{F}} \mathbb{E}\left[ s_G(X,x) d_{W_2}^2(\nu,\omega) \right], \\
s_G(X, x) &= 1 + (X - \mathbb{E}X)^\top \operatorname{Var}(X)^{-1} (x - \mathbb{E}X),
\end{align}</math>|{{EquationRef|1}}}}
The local Wasserstein-Fréchet regression model for a scalar predictor employs kernel weights <math>s_L</math> instead,
<math display="block">\begin{align}
l_\oplus (x) &= \operatorname{argmin}_{\omega \in \mathcal{F}} \mathbb{E}\left[ s_L(X, x, h) d_{W_2}^2(\nu, \omega) \right], \\
s_L(X, x, h) &= \frac{1}{\sigma_0^2} K_h(X - x) \left[ \mu_2 - \mu_1 (X - x) \right],
\end{align}</math>
where <math>\mu_j = \mathbb{E} \left[K_h(X-x)(X-x)^j \right]</math>, <math>j = 0,1,2,</math> and <math>\sigma_0^2 = \mu_0 \mu_2 - \mu_1^2</math>.
 
=== Transformation based approaches ===
Consider response variables <math>\nu</math> that are probability distributions. With the space of density functions <math>\mathcal{F}_f</math> and a Hilbert space of functions <math>\mathbb{H}</math>, consider continuous and invertible transformations <math>\Psi: \mathcal{F}_f \to \mathbb{H}</math>. Examples of transformations include the log hazard transformation, the log quantile density transformation, and the centered log-ratio transformation. Linear methods such as [[functional regression#Functional linear models (FLMs)|functional linear models]] are applied to the transformed variables, and the fitted models are interpreted back in the original density space <math>\mathcal{F}</math> using the inverse transformation.<ref name="review"/>
 
=== Random object approaches ===
In Wasserstein regression, both predictors <math>\omega</math> and responses <math>\nu</math> can be distributional objects. Let <math>\omega_{\oplus}</math> and <math>\nu_{\oplus}</math> be the Wasserstein means of <math>\omega</math> and <math>\nu</math>, respectively. The Wasserstein regression model is defined as
<math display="block">\mathbb{E}(\log_{\nu_{\oplus}} \nu | \log_{\omega_{\oplus}} \omega) = \Gamma(\log_{\omega_{\oplus}} \omega),</math>
\Gamma g(t) = \langle \beta(\cdot, t),g \rangle_{\omega_{\oplus}}, \; t \in D, \; g \in T_{\omega_{\oplus}}, \; \beta:D^2 \to \R.</math>
Estimation of the regression operator is based on empirical estimators obtained from samples.<ref>{{Cite journal|last1=Chen|first1=Y.|last2=Lin|first2=Z.|last3=Müller|first3=H.-G.|date=2023|title=Wasserstein regression|journal=Journal of the American Statistical Association|volume=118|issue=542|pages=869–882|doi=10.1080/01621459.2021.1956937 |s2cid=219721275 }}</ref>
Also, the Fisher-Rao metric <math>d_{FR}</math> can be used in a similar fashion.<ref name="review"/><ref name="dai2022">{{Cite journal|last1=Dai|first1=X.|date=2022|title=Statistical inference on the Hilbert sphere with application to random densities|journal=Electronic Journal of Statistics|volume=16|issue=1|pages=700–736|doi=10.1214/21-EJS1942 |doi-access=free|arxiv=2101.00527}}</ref>
 
 
== Hypothesis testing ==
=== Wasserstein F-test ===
 
The Wasserstein <math>F</math>-test has been proposed to test for the effects of the predictors in the Fréchet regression framework with the Wasserstein metric.<ref name="ftest">{{Cite journal|last1=Petersen|first1=A.|last2=Liu|first2=X.|last3=Divani|first3=A.A.|date=2021|title=Wasserstein F-tests and confidence bands for the Fréchet regression of density response curves|journal=Annals of Statistics|volume=49|issue=1|pages=590–611|doi=10.1214/20-AOS1971 |arxiv=1910.13418 |s2cid=204950494 }}</ref> Consider Euclidean predictors <math>X \in \R^p</math> and distributional responses <math>\nu \in \mathcal{W}_2</math>. Denote the Wasserstein mean of <math>\nu</math> as <math>\mu_\oplus^*</math> and the sample Wasserstein mean as <math>\hat{\mu}_\oplus^*</math>. Consider the global Wasserstein-Fréchet regression model <math>m_\oplus (x)</math> defined in ({{EquationNote|1}}), which is the conditional Wasserstein mean given <math>X=x</math>. The estimator <math>\hat{m}_\oplus (x)</math> of <math>m_\oplus (x)</math> is obtained by minimizing the empirical version of the criterion.
[[Welch-Satterthwaite_equation|Satterthwaite's approximation]] or a [[Bootstrapping_(statistics)|bootstrap]] approach is proposed.<ref name="ftest"/>
 
=== Tests for the intrinsic mean ===
The Hilbert sphere <math>\mathcal{S}^\infty</math> is defined as <math>\mathcal{S}^\infty = \left\{f \in \mathbb{H} : \| f \|_{\mathbb{H}}=1 \right\}</math>, where <math>\mathbb{H}</math> is a separable infinite-dimensional Hilbert space with inner product <math>\langle \cdot, \cdot \rangle_{\mathbb{H}}</math> and norm <math>\| \cdot \|_{\mathbb{H}}</math>. Consider the space of square-root densities <math>\mathcal{X} = \left\{ x:D \to \mathbb{R}: x = \sqrt{f}, \int_D f(t)dt = 1 \right\}</math>. Then, with the Fisher-Rao metric <math>d_{FR}</math>, <math>\mathcal{X}</math> is the positive orthant of the Hilbert sphere <math>\mathcal{S}^\infty</math> with <math>\mathbb{H} = L^2(D)</math>.
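On this sphere, the Fisher-Rao distance between two densities is the arc length between their square-root representations, matching the <math>\arccos</math> formula given in the Notation section. A small discretized sketch (grid size is an arbitrary choice):

```python
import numpy as np

t = np.linspace(0, 1, 2001)
dt = t[1] - t[0]

f1 = np.ones_like(t)      # Uniform(0, 1)
f2 = 2 * t                # triangular density f(t) = 2t on [0, 1]
x1, x2 = np.sqrt(f1), np.sqrt(f2)

# Inner product <x1, x2> in L^2([0, 1]) via a Riemann sum;
# the exact value is the integral of sqrt(2 t), i.e. 2 * sqrt(2) / 3.
inner = np.sum(x1 * x2) * dt

# Geodesic (arc-length) distance on the sphere; clip guards against rounding.
d_fr = np.arccos(np.clip(inner, -1.0, 1.0))
```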
 
where <math>W_k \overset{iid}{\sim} \chi_1^2</math>. The testing procedure can be carried out by employing the limiting distributions via Monte Carlo simulation, or by bootstrap tests. Extensions to the two-sample test and the paired test have also been proposed.<ref name="dai2022"/>
 
== Distributional time series ==
[[Autoregressive model|Autoregressive (AR) models]] for distributional time series are constructed by defining [[Stationary process|stationarity]] for distribution-valued processes and a notion of difference between distributions, using <math>d_{W_2}</math> and <math>d_{FR}</math>.
 
</math>
 
On the other hand, the spherical autoregressive (SAR) model considers the Fisher-Rao metric.<ref>{{Cite journal|last1=Zhu|first1=C.|last2=Müller|first2=H.-G.|date=2023|title=Spherical autoregressive models, with application to distributional and compositional time series|journal=Journal of Econometrics|volume=239 |issue=2 |doi=10.1016/j.jeconom.2022.12.008 |doi-access=free|arxiv=2203.12783}}</ref> Following the setting of [[#Tests for the intrinsic mean|tests for the intrinsic mean]], let <math>x_t \in \mathcal{X}</math> with Fréchet mean <math>\mu_x</math>. Let <math>\theta = \arccos(\langle x_t, \mu_x \rangle )</math> be the geodesic distance between <math>x_t</math> and <math>\mu_x</math>, and define a rotation operator <math>Q_{x_t, \mu_x}</math> that rotates <math>x_t</math> to <math>\mu_x</math>. The spherical difference between <math>x_t</math> and <math>\mu_x</math> is represented as <math>R_t = x_t \ominus \mu_x = \theta Q_{x_t, \mu_x}</math>. Assume that <math>R_t</math> is a stationary sequence with Fréchet mean <math>\mu_R</math>; then the <math>SAR(1)</math> model is defined as
<math display="block">
R_t - \mu_R = \beta (R_{t-1} - \mu_R) + \epsilon_t,
Line 154 ⟶ 150:
where <math>\mu_R = \mathbb{E}R_t</math> and the innovations <math>\epsilon_t</math> are i.i.d. with mean zero. An alternative model, the difference-based spherical autoregressive (DSAR) model, is defined with <math>R_t = x_{t+1} \ominus x_t</math>, with natural extensions to order <math>p</math>. A similar extension to the Wasserstein space has also been introduced.<ref>{{Cite journal|last1=Zhu|first1=C.|last2=Müller|first2=H.-G.|date=2023|title=Autoregressive optimal transport models|journal=Journal of the Royal Statistical Society Series B: Statistical Methodology|volume=85|issue=3|pages=1012–1033|doi=10.1093/jrsssb/qkad051 |pmid=37521164 |pmc=10376456 }}</ref>
 
== References ==
{{reflist}}
 
[[Category:Statistical analysis]]