Distributional data analysis
=Mean and variance=
 
For a probability measure <math>\nu \in \mathcal{F}</math>, consider a [[stochastic process|random process]] <math>\mathfrak{F}</math> such that <math>\nu \sim \mathfrak{F}</math>. One way to define the mean and variance of <math>\nu</math> is to introduce the [[Fréchet mean]] and the Fréchet variance. With respect to the metric <math>d</math> on <math>\mathcal{F}</math>, the ''Fréchet mean'' <math>\mu_\oplus</math>, also known as the [[barycenter]], and the ''Fréchet variance'' <math>V_\oplus</math> are defined as<ref>{{Cite journal|last1=Fréchet|first1=M.|date=1948|title=Les éléments aléatoires de nature quelconque dans un espace distancié|journal=Annales de l'Institut Henri Poincaré|volume=10|issue=4|pages=215–310}}</ref>
<math display="block">\begin{align}
\mu_\oplus &= \operatorname{argmin}_{\mu \in \mathcal{F}} \mathbb{E}[d^2(\nu, \mu)], \\
V_\oplus &= \mathbb{E}[d^2(\nu, \mu_\oplus)].
\end{align}</math>
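These definitions admit a direct empirical analogue. The following sketch (illustrative only; the helper name and toy data are not from any cited source) restricts the argmin to the observed sample, a common practical simplification:

```python
import numpy as np

def frechet_mean_and_variance(sample, metric):
    """Empirical Frechet mean and variance with respect to `metric`.

    The population versions are mu = argmin_mu E[d^2(nu, mu)] and
    V = E[d^2(nu, mu_oplus)]; here the argmin is restricted to the
    sample itself, so the result only approximates the minimizer.
    """
    costs = [np.mean([metric(x, c) ** 2 for x in sample]) for c in sample]
    i = int(np.argmin(costs))
    return sample[i], costs[i]

# With the Euclidean metric on R, this recovers the ordinary mean and
# variance of the sample (the sample here contains its own mean).
data = [0.0, 1.0, 2.0, 3.0, 4.0]
mean, var = frechet_mean_and_variance(data, lambda a, b: abs(a - b))
print(mean, var)  # → 2.0 2.0
```

For general metric spaces the minimizer need not lie in the sample, and dedicated optimization over <math>\mathcal{F}</math> is required.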
 
A widely used example is the Wasserstein-Fréchet mean, or simply the ''Wasserstein mean'', which is the Fréchet mean with the <math>L^2</math>-Wasserstein metric <math>d_{W_2}</math>.<ref>{{Cite journal|last1=Agueh|first1=A.|last2=Carlier|first2=G.|date=2011|title=Barycenters in the Wasserstein space|journal=SIAM Journal on Mathematical Analysis|volume=43|issue=2|pages=904–924|doi=10.1137/100805741|s2cid=8592977 |url=https://hal.archives-ouvertes.fr/hal-00637399/file/AC_bary_revis.pdf }}</ref> For <math>\nu, \; \mu \in \mathcal{W}_2</math>, let <math>Q_\nu, \; Q_\mu</math> be the quantile functions of <math>\nu</math> and <math>\mu</math>, respectively. The Wasserstein mean and the Wasserstein variance are defined as
<math display="block">\begin{align}
\mu_\oplus^* &= \operatorname{argmin}_{\mu \in \mathcal{W}_2} \mathbb{E} \left[ \int_0^1 (Q_\nu (s) - Q_\mu (s))^2 ds \right], \\
V_\oplus^* &= \mathbb{E} \left[ \int_0^1 (Q_\nu (s) - Q_{\mu_\oplus^*} (s))^2 ds \right].
\end{align}</math>
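Because the squared <math>L^2</math>-Wasserstein distance between distributions on the real line is the squared <math>L^2</math> distance between their quantile functions, the Wasserstein mean can be computed coordinate-wise in <math>s</math>. A small numerical sketch (the family of uniform distributions and all constants are invented for illustration):

```python
import numpy as np

# Quantile-function grid on (0, 1).
s = (np.arange(200) + 0.5) / 200

# Sample of uniform distributions U(a_i, b_i); their quantile functions
# Q_i(s) = a_i + (b_i - a_i) s are known in closed form.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 0.0, size=50)
b = rng.uniform(1.0, 2.0, size=50)
Q = a[:, None] + (b - a)[:, None] * s[None, :]      # shape (50, 200)

# In W_2 the Frechet (Wasserstein) mean has quantile function E[Q_nu],
# because the minimization separates across s.
Q_mean = Q.mean(axis=0)

# Wasserstein variance: mean squared W_2 distance to the mean.
V = np.mean(np.mean((Q - Q_mean[None, :]) ** 2, axis=1))
print(round(float(V), 4))
```

The averaged quantile function is automatically nondecreasing, so it is itself a valid quantile function.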
=Modes of variation=

==Functional principal component analysis==
 
[[Functional principal component analysis|Functional principal component analysis (FPCA)]] can be applied directly to the probability density functions.<ref>{{Cite journal|last1=Kneip|first1=A.|last2=Utikal|first2=K.J.|date=2001|title=Inference for density families using functional principal component analysis|journal=Journal of the American Statistical Association|volume=96|issue=454|pages=519–532|doi=10.1198/016214501753168235|s2cid=123524014 }}</ref> Consider a distribution process <math>\nu \sim \mathfrak{F}</math> and let <math>f</math> be the density function of <math>\nu</math>. Define the mean density function <math>\mu(t) = \mathbb{E}\left[f(t)\right]</math> and the covariance function <math>G(s,t) = \operatorname{Cov}(f(s), f(t))</math>, with orthonormal eigenfunctions <math>\{\phi_j\}_{j=1}^\infty</math> and eigenvalues <math>\{\lambda_j\}_{j=1}^\infty</math>.
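A grid-based sketch of this construction (the density process here, Gaussian bumps with random centers, is an invented example): the covariance surface is estimated on the grid and its eigendecomposition yields <math>\{\lambda_j, \phi_j\}</math>.

```python
import numpy as np

# Evaluate n random densities on a grid: Gaussian bumps with random
# centers, standing in for a distribution process nu ~ F.
t = np.linspace(-4, 4, 161)
dt = t[1] - t[0]
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 0.5, size=100)
F = np.exp(-0.5 * (t[None, :] - centers[:, None]) ** 2) / np.sqrt(2 * np.pi)

mu = F.mean(axis=0)                      # mean density function mu(t)
Fc = F - mu[None, :]
G = (Fc.T @ Fc) / len(F)                 # covariance surface G(s, t)

# Discretized covariance operator: (G phi)(s) = \int G(s,t) phi(t) dt,
# so eigenvalues come from G * dt; eigenfunctions are rescaled to be
# orthonormal in L^2 rather than in Euclidean coordinates.
evals, evecs = np.linalg.eigh(G * dt)
order = np.argsort(evals)[::-1]
lam, phi = evals[order], evecs[:, order] / np.sqrt(dt)

# Principal component scores xi_j = \int (f - mu) phi_j dt.
xi = Fc @ phi[:, :2] * dt
print(lam[:2])
```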
 
By the Karhunen-Loève theorem, <math>f(t) = \mu(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math> with principal components <math>\xi_j = \int_D [f(t) - \mu(t)] \phi_j(t) dt</math>, and the <math>j</math>th mode of variation is <math>g_j(t, \alpha) = \mu(t) + \alpha \sqrt{\lambda_j} \phi_j(t), \; t \in D, \; \alpha \in [-A, A]</math>.
==Transformation FPCA==
Assume that the probability density function <math>f</math> exists, and let <math>\mathcal{F}_f</math> be the space of density functions.
Transformation approaches introduce a continuous and invertible transformation <math>\Psi: \mathcal{F}_f \to \mathbb{H}</math>, where <math>\mathbb{H}</math> is a [[Hilbert space]] of functions. For instance, the log quantile density transformation and the centered log ratio transformation are popular choices.<ref>{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2016|title=Functional data analysis for density functions by transformation to a Hilbert space|journal=Annals of Statistics|volume=44|issue=1|pages=183–218|doi=10.1214/15-AOS1363}}</ref><ref>{{Cite journal|last1=van den Boogaart|first1=K.G.|last2=Egozcue|first2=J.J.|last3=Pawlowsky-Glahn|first3=V.|date=2014|title=Bayes Hilbert spaces|journal=Australian and New Zealand Journal of Statistics|volume=56|issue=2|pages=171–194|doi=10.1111/anzs.12074|s2cid=120612578 }}</ref>
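As an illustration of one such transformation, a discretized centered log ratio map and its inverse can be sketched as follows (grid-based and illustrative; normalization conventions vary across the cited works):

```python
import numpy as np

def clr(f):
    """Centered log-ratio transform psi(f) = log f - mean(log f),
    mapping positive densities into a linear function space."""
    logf = np.log(f)
    return logf - logf.mean()

def clr_inverse(y, dt):
    """Inverse transform: exponentiate and renormalize so the result
    integrates to one on the grid."""
    g = np.exp(y)
    return g / (g.sum() * dt)

# Midpoint grid on D = [0, 1] and a simple density f(t) = t + 1/2.
t = (np.arange(100) + 0.5) / 100
dt = 0.01
f = t + 0.5

recon = clr_inverse(clr(f), dt)
print(np.max(np.abs(recon - f)))  # round trip recovers f on the grid
```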
 
For <math>f \in \mathcal{F}_f</math>, let <math>Y = \Psi(f)</math>, the transformed functional variable. The mean function <math>\mu_Y(t) = \mathbb{E}\left[Y(t)\right]</math> and the covariance function <math>G_Y(s,t) = \operatorname{Cov}(Y(s), Y(t))</math> are defined accordingly, and let <math>\{\lambda_j, \phi_j\}_{j=1}^\infty</math> be the eigenpairs of <math>G_Y(s,t)</math>. The Karhunen-Loève decomposition gives
<math>Y(t) = \mu_Y(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math>, where <math>\xi_j = \int_D [Y(t) - \mu_Y(t)] \phi_j(t) dt</math>. Then, the <math>j</math>th transformation mode of variation is defined as<ref>{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2016|title=Functional data analysis for density functions by transformation to a Hilbert space|journal=Annals of Statistics|volume=44|issue=1|pages=183–218|doi=10.1214/15-AOS1363}}</ref>
<math>
g_{j}^{TF}(t, \alpha) = \Psi^{-1} \left( \mu_Y + \alpha \sqrt{\lambda_j}\phi_j \right)(t), \quad t \in D, \; \alpha \in [-A, A].
</math>
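Combining the transformation with FPCA gives these modes of variation in practice; a sketch under the same grid conventions, with invented random densities and the centered log ratio as <math>\Psi</math>:

```python
import numpy as np

rng = np.random.default_rng(2)
t = (np.arange(100) + 0.5) / 100   # midpoint grid on D = [0, 1]
dt = 0.01

# Random positive densities: exponentials of random smooth curves,
# renormalized to integrate to one on the grid.
Z = rng.normal(size=(80, 1)) * np.sin(2 * np.pi * t)[None, :]
F = np.exp(Z)
F = F / (F.sum(axis=1, keepdims=True) * dt)

# Transformed variables Y = Psi(f) (centered log-ratio), then FPCA on Y.
Y = np.log(F) - np.log(F).mean(axis=1, keepdims=True)
mu_Y = Y.mean(axis=0)
Yc = Y - mu_Y[None, :]
evals, evecs = np.linalg.eigh((Yc.T @ Yc) / len(Y) * dt)
lam = evals[::-1]
phi = evecs[:, ::-1] / np.sqrt(dt)

def transformation_mode(j, alpha):
    """g_j^TF(., alpha) = Psi^{-1}(mu_Y + alpha * sqrt(lam_j) * phi_j)."""
    y = mu_Y + alpha * np.sqrt(lam[j]) * phi[:, j]
    g = np.exp(y)
    return g / (g.sum() * dt)

g = transformation_mode(0, 1.0)
print(g.min() > 0, abs(g.sum() * dt - 1.0) < 1e-9)  # → True True
```

By construction the back-transformed mode is a valid density for every <math>\alpha</math>, which is the main advantage of the transformation approach over applying FPCA to densities directly.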
==Log FPCA==
Let <math>Y</math> be the projected density onto the tangent space, <math>Y = \log_{\mu_\oplus}(f)</math>.
 
In Log FPCA, FPCA is performed on <math>Y</math> and the result is then projected back to <math>\mathcal{F}</math> using the exponential map.<ref>{{Cite journal|last1=Fletcher|first1=T.F.|last2=Lu|first2=C.|last3=Pizer|first3=S.M.|last4=Joshi|first4=S.|date=2004|title=Principal geodesic analysis for the study of nonlinear statistics of shape|journal=IEEE Transactions on Medical Imaging|volume=23|issue=8|pages=995–1005|doi=10.1109/TMI.2004.831793 |pmid=15338733 |s2cid=620015 }}</ref> Therefore, with <math>Y(t) = \mu_Y(t) + \sum_{j=1}^\infty \xi_j \phi_j(t)</math>, the <math>j</math>th Log FPCA mode of variation is defined as
<math>g_j^{Log}(t, \alpha) = \exp_{f_\oplus} \left( \mu_{f_\oplus} + \alpha \sqrt{\lambda_j} \phi_j \right)(t), \quad t \in D, \; \alpha \in [-A, A].</math>
 
==Geodesic PCA==
Let <math>\text{CG}_{\nu_0, k}(\mathcal{W}_2)</math> denote the collection of closed geodesic subsets of <math>\mathcal{W}_2</math> of dimension <math>k</math> that contain a reference measure <math>\nu_0</math>, and for a closed subset <math>G \subset \mathcal{W}_2</math> define
<math display="block">
K_{W_2}(G) = \mathbb{E}\left[ \inf_{\mu \in G} d_{W_2}^2 (\nu, \mu) \right].
</math>
Let the reference measure <math>\nu_0</math> be the Wasserstein mean <math>\mu_\oplus</math>.
Then, a ''principal geodesic subspace (PGS)'' of dimension <math>k</math> with respect to <math>\mu_\oplus</math> is a set <math>G_k = \operatorname{argmin}_{G \in \text{CG}_{\nu_0, k}(\mathcal{W}_2)} K_{W_2}(G)</math>.<ref name="gpca1">{{Cite journal|last1=Bigot|first1=J.|last2=Gouet|first2=R.|last3=Klein|first3=T.|last4=López|first4=A.|date=2017|title=Geodesic PCA in the Wasserstein space by convex PCA|journal=Annales de l'institut Henri Poincare (B) Probability and Statistics|volume=53|issue=1|pages=1–26|doi=10.1214/15-AIHP706|bibcode=2017AnIHP..53....1B |s2cid=49256652 |url=https://hal.archives-ouvertes.fr/hal-01978864/file/AIHP706.pdf }}</ref><ref name="gpca2">{{Cite journal|last1=Cazelles|first1=E.|last2=Seguy|first2=V.|last3=Bigot|first3=J.|last4=Cuturi|first4=M.|last5=Papadakis|first5=N.|date=2018|title=Geodesic PCA versus Log-PCA of histograms in the Wasserstein space|journal=SIAM Journal on Scientific Computing|volume=40|issue=2|pages=B429–B456|doi=10.1137/17M1143459 |bibcode=2018SJSC...40B.429C }}</ref>
 
Note that the tangent space <math>T_{\mu_\oplus}</math> is a subspace of <math>L^2_{\mu_\oplus}</math>, the Hilbert space of <math>{\mu_\oplus}</math>-square-integrable functions. Obtaining the PGS is equivalent to performing PCA in <math>L^2_{\mu_\oplus}</math> under the constraint of lying in a convex and closed subset.<ref name="gpca2"/> Therefore, Log FPCA can be viewed as a simple approximation of Wasserstein Geodesic PCA that relaxes the geodesicity constraint, while alternative exact techniques have also been suggested.<ref name="gpca1"/><ref name="gpca2"/>
=Regression=
 
==Fréchet regression==
Fréchet regression is a generalization of regression to settings where responses take values in a metric space and predictors are Euclidean.<ref name="freg">{{Cite journal|last1=Petersen|first1=A.|last2=Müller|first2=H.-G.|date=2019|title=Fréchet regression for random objects with Euclidean predictors|journal=Annals of Statistics|volume=47|issue=2|pages=691–719|doi=10.1214/17-AOS1624 }}</ref><ref name="review">{{Cite journal|last1=Petersen|first1=A.|last2=Zhang|first2=C.|last3=Kokoszka|first3=P.|date=2022|title=Modeling probability density functions as data objects|journal=Econometrics and Statistics|volume=21|pages=159–178|doi=10.1016/j.ecosta.2021.04.004 |s2cid=236589040 }}</ref> Using the Wasserstein metric <math>d_{W_2}</math>, Fréchet regression models can be applied to distributional objects. The global Wasserstein-Fréchet regression model is defined as
{{NumBlk|::|<math display="block">\begin{align}
m_\oplus (x) &= \operatorname{argmin}_{\omega \in \mathcal{F}} \mathbb{E}\left[ s_G(X,x) d_{W_2}^2(\nu,\omega) \right], \\
s_G(X,x) &= 1 + (X - \mathbb{E}[X])^\top \Sigma^{-1} (x - \mathbb{E}[X]), \quad \Sigma = \operatorname{Var}(X).
\end{align}</math>|{{EquationNote|1}}}}

For pairs of random distributions, distribution-on-distribution regression can be formulated through a linear operator <math>\Gamma</math> on the tangent space <math>T_{\omega_\oplus}</math> at the Wasserstein mean <math>\omega_\oplus</math>, represented with a kernel <math>\beta</math>:
<math display="block">
\Gamma g(t) = \langle \beta(\cdot, t), g \rangle_{\omega_\oplus}, \; t \in D, \; g \in T_{\omega_\oplus}, \; \beta:D^2 \to \R.</math>
Estimation of the regression operator is based on empirical estimators obtained from samples.<ref>{{Cite journal|last1=Chen|first1=Y.|last2=Lin|first2=Z.|last3=Müller|first3=H.-G.|date=2023|title=Wasserstein regression|journal=Journal of the American Statistical Association|volume=118|issue=542|pages=869–882|doi=10.1080/01621459.2021.1956937 |s2cid=219721275 }}</ref>
Also, the Fisher-Rao metric <math>d_{FR}</math> can be used in a similar fashion.<ref name="review"/><ref name="dai2022">{{Cite journal|last1=Dai|first1=X.|date=2022|title=Statistical inference on the Hilbert sphere with application to random densities|journal=Electronic Journal of Statistics|volume=16|issue=1|pages=700–736|doi=10.1214/21-EJS1942 }}</ref>
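A sketch of the empirical global model for a scalar predictor, using the global weight function <math>s_G(X_i, x) = 1 + (X_i - \bar X)(x - \bar X)/\widehat{\operatorname{Var}}(X)</math> from the Fréchet regression literature; the simulated design is invented, and the projection onto monotone quantile functions that is required in general is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
s = (np.arange(100) + 0.5) / 100
n = 200

# Scalar predictor X_i; response nu_i = U(X_i - 1, X_i + 1), so the
# conditional Wasserstein mean has quantile function x - 1 + 2 s.
X = rng.uniform(0.0, 2.0, size=n)
Q = (X[:, None] - 1.0) + 2.0 * s[None, :]

def global_frechet_fit(x):
    """Empirical global Frechet regression at x: a weighted average of
    the sample quantile functions with the global weights s_G(X_i, x)."""
    Xbar, varX = X.mean(), X.var()
    w = 1.0 + (X - Xbar) * (x - Xbar) / varX
    Qhat = (w[:, None] * Q).mean(axis=0)
    # In general the weighted average need not be monotone; a full
    # implementation projects it onto the space of quantile functions.
    return Qhat

Qhat = global_frechet_fit(1.5)
print(np.abs(Qhat - (0.5 + 2 * s)).max())  # essentially exact here
```

Because the design is noiseless and linear in the quantile functions, the fit reproduces the true conditional Wasserstein mean up to floating-point error.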
 
=Hypothesis testing=
==Wasserstein <math>F</math>-test==
 
The Wasserstein <math>F</math>-test has been proposed to test for the effects of the predictors in the Fréchet regression framework with the Wasserstein metric.<ref name="ftest">{{Cite journal|last1=Petersen|first1=A.|last2=Liu|first2=X.|last3=Divani|first3=A.A.|date=2021|title=Wasserstein F-tests and confidence bands for the Fréchet regression of density response curves|journal=Annals of Statistics|volume=49|issue=1|pages=590–611|doi=10.1214/20-AOS1971 |arxiv=1910.13418 |s2cid=204950494 }}</ref> Consider Euclidean predictors <math>X \in \R^p</math> and distributional responses <math>\nu \in \mathcal{W}_2</math>. Denote the Wasserstein mean of <math>\nu</math> as <math>\mu_\oplus^*</math>, and the sample Wasserstein mean as <math>\hat{\mu}_\oplus^*</math>. Consider the global Wasserstein-Fréchet regression model <math>m_\oplus (x)</math> defined in ({{EquationNote|1}}), which is the conditional Wasserstein mean given <math>X=x</math>. The estimator <math>\hat{m}_\oplus (x)</math> of <math>m_\oplus (x)</math> is obtained by minimizing the empirical version of the criterion.
 
Let <math>F</math>, <math>Q</math>, and <math>f</math> denote the cdf, the quantile function, and the density function corresponding to <math>\nu</math>, respectively.

=Distributional time series=
==Autoregressive models==
[[Autoregressive model|Autoregressive (AR) models]] for distributional time series are constructed by defining [[Stationary process|stationarity]] for distribution-valued processes and a notion of difference between distributions based on <math>d_{W_2}</math> or <math>d_{FR}</math>.
 
In the Wasserstein autoregressive (WAR) model, consider a stationary density time series <math>f_t</math> with Wasserstein mean <math>f_\oplus</math>.<ref>{{Cite journal|last1=Zhang|first1=C.|last2=Kokoszka|first2=P.|last3=Petersen|first3=A.|date=2022|title=Wasserstein autoregressive models for density time series|journal=Journal of Time Series Analysis|volume=43|issue=1|pages=30–52|doi=10.1111/jtsa.12590 |arxiv=2006.12640 |s2cid=219980621 }}</ref> Denote the difference between <math>f_t</math> and <math>f_\oplus</math> using the logarithm map, <math>f_t \ominus f_{\oplus} = \log_{f_\oplus} f_t = T_t - \text{id}</math>, where <math>T_t = Q_t \circ F_\oplus</math> is the optimal transport map from <math>f_\oplus</math> to <math>f_t</math>, with <math>F_t</math> and <math>F_{\oplus}</math> the cdfs of <math>f_t</math> and <math>f_{\oplus}</math> and <math>Q_t = F_t^{-1}</math> the quantile function. An <math>AR(1)</math> model on the tangent space <math>T_{f_\oplus}</math> is defined as <math>V_t = \beta V_{t-1} + \epsilon_t, \; t \in \mathbb{Z},</math> for <math>V_t \in T_{f_\oplus}</math>, with autoregressive parameter <math>\beta \in \mathbb{R}</math> and i.i.d. mean-zero random innovations <math>\epsilon_t</math>. Under suitable conditions, <math>\mu_t = \exp_{f_\oplus}(V_t)</math> defines measures with densities <math>f_t</math>, so that <math>V_t = \log_{f_\oplus}(\mu_t)</math>. Accordingly, <math>WAR(1)</math>, with a natural extension to order <math>p</math>, is defined as
<math display="block">
T_t - \text{id} = \beta (T_{t-1} - \text{id} ) + \epsilon_t.
</math>
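A simulation sketch of <math>WAR(1)</math> on the quantile scale; the choice <math>f_\oplus = U(0,1)</math>, so that <math>Q_\oplus = \mathrm{id}</math>, and the form of the innovations are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
s = (np.arange(200) + 0.5) / 200
beta = 0.6

# Smooth mean-zero innovations on the tangent space (an assumed form;
# the small scale keeps the transports increasing with high probability).
def innovation():
    return 0.05 * rng.normal() * np.sin(np.pi * s)

# Tangent-space AR(1): V_t = beta * V_{t-1} + eps_t.  With Q_plus = id,
# the transport T_t = id + V_t gives quantile functions Q_t = T_t(s).
V = np.zeros_like(s)
quantiles = []
for _ in range(300):
    V = beta * V + innovation()
    quantiles.append(s + V)

Q = np.array(quantiles)
# Fraction of time points whose transport is strictly increasing, i.e.
# whose tangent vector maps back to a valid distribution in W_2.
frac = np.mean(np.all(np.diff(Q, axis=1) > 0, axis=1))
print(frac)
```

In the cited work, conditions on <math>\beta</math> and the innovations guarantee that the exponential map returns valid distributions; the monotonicity check above is the discretized version of that requirement.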
 
On the other hand, the spherical autoregressive (SAR) model considers the Fisher-Rao metric.<ref>{{Cite journal|last1=Zhu|first1=C.|last2=Müller|first2=H.-G.|date=2023|title=Spherical autoregressive models, with application to distributional and compositional time series|journal=Journal of Econometrics|doi=10.1016/j.jeconom.2022.12.008 }}</ref> Following the setting of [[#Tests for the intrinsic mean|tests for the intrinsic mean]], let <math>x_t \in \mathcal{X}</math> with Fréchet mean <math>\mu_x</math>. Let <math>\theta = \arccos(\langle x_t, \mu_x \rangle )</math>, the geodesic distance between <math>x_t</math> and <math>\mu_x</math>, and define a rotation operator <math>Q_{x_t, \mu_x}</math> that rotates <math>x_t</math> to <math>\mu_x</math>. The spherical difference between <math>x_t</math> and <math>\mu_x</math> is represented as <math>R_t = x_t \ominus \mu_x = \theta Q_{x_t, \mu_x}</math>. Assume that <math>R_t</math> is a stationary sequence with Fréchet mean <math>\mu_R</math>; then <math>SAR(1)</math> is defined as
<math display="block">
R_t - \mu_R = \beta (R_{t-1} - \mu_R) + \epsilon_t,
</math>
where <math>\mu_R = \mathbb{E}R_t</math> and the <math>\epsilon_t</math> are i.i.d. mean-zero random innovations. An alternative model, the difference-based spherical autoregressive (DSAR) model, is defined with <math>R_t = x_{t+1} \ominus x_t</math>, with natural extensions to order <math>p</math>. A similar extension to the Wasserstein space has also been introduced.<ref>{{Cite journal|last1=Zhu|first1=C.|last2=Müller|first2=H.-G.|date=2023|title=Autoregressive optimal transport models|journal=Journal of the Royal Statistical Society Series B: Statistical Methodology|volume=85|issue=3|pages=1012–1033|doi=10.1093/jrsssb/qkad051 |pmid=37521164 |pmc=10376456 }}</ref>
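The spherical geometry underlying these models can be sketched through the square-root density representation, under which densities lie on the unit Hilbert sphere and the geodesic distance is <math>\arccos \langle \sqrt{f}, \sqrt{g} \rangle</math> (related to the Fisher-Rao metric up to a constant factor, depending on convention); the example densities are invented:

```python
import numpy as np

def hilbert_sphere_distance(f, g, dt):
    """Geodesic distance between densities via square-root densities on
    the unit Hilbert sphere: arccos of the L^2 inner product of sqrt(f)
    and sqrt(g), approximated on a midpoint grid."""
    inner = np.sum(np.sqrt(f * g)) * dt
    return np.arccos(np.clip(inner, -1.0, 1.0))

t = (np.arange(1000) + 0.5) / 1000
dt = 0.001
f = np.ones_like(t)          # density of U(0, 1)
g = 2 * t                    # density f(t) = 2t on [0, 1]
d = hilbert_sphere_distance(f, g, dt)
print(round(float(d), 4))
```

Here <math>\int_0^1 \sqrt{2t}\,dt = 2\sqrt{2}/3 \approx 0.943</math>, so the distance is <math>\arccos(0.943) \approx 0.34</math> radians.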
 
=References=