Additive Optimal Transport Regression

Updated 15 December 2025
  • Additive optimal transport regression is a framework that replaces linear shifts with compositions of geodesic transport maps to model responses in general metric spaces.
  • It decomposes the effect of multivariate Euclidean predictors into univariate transport maps applied to the Fréchet mean, mitigating the curse of dimensionality.
  • ADOPT is fitted with a transport backfitting algorithm whose estimates are consistent, and it has been applied to SPD matrices and fMRI connectivity analysis.

Additive optimal transport regression is a model class for analyzing regression problems where the predictor variables are Euclidean and the response lies in a general geodesic metric space. The central innovation is the replacement of additive linear shifts (which require vector space structure) with compositions of optimal geodesic transport maps, thus enabling additive modeling for manifold- or distribution-valued responses without embedding or vectorization. The most developed framework for this approach is ADOPT (Additive Optimal Transport Regression), which systematically extends additive regression concepts through the geometry of geodesic metric spaces and optimal transport (Song et al., 8 Dec 2025).

1. Mathematical Formulation and Problem Setting

Consider observations $(X_i, Y_i)$, $i = 1, \dots, n$, with Euclidean predictors $X_i = (X_{i1}, \dots, X_{ip}) \in \mathbb{R}^p$ and responses $Y_i$ taking values in a bounded, separable, uniquely geodesic metric space $(\mathcal{M}, d)$. Classical regression targets the conditional mean $\mathbb{E}[Y \mid X = x]$, which relies on vector addition. General metric spaces lack this operation, so the target is instead the conditional Fréchet mean

$$\mu_\oplus(x) := \arg\min_{m \in \mathcal{M}} \mathbb{E}\big[d^2(Y, m) \mid X = x\big].$$

The objective is to model $\mu_\oplus(x)$ flexibly for multivariate $X = (X_1, \dots, X_p)$ while maintaining interpretability and avoiding the curse of dimensionality (Song et al., 8 Dec 2025).
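As a concrete instance of the Fréchet mean, take $\mathcal{M}$ to be the 2-Wasserstein space of distributions on the real line. There the distance equals the $L^2$ distance between quantile functions, so the sample Fréchet mean is the pointwise average of the quantile functions. The sketch below is illustrative only: it assumes distributions are stored as quantile functions on a fixed probability grid, and the names `GRID`, `w2_sq`, `frechet_objective`, and `frechet_mean_w2` are hypothetical helpers, not part of ADOPT.

```python
import numpy as np
from statistics import NormalDist

# Probability levels s at which every distribution is stored via its quantile function F^{-1}(s).
GRID = np.linspace(0.005, 0.995, 199)

def w2_sq(q1, q2):
    """Squared 1D 2-Wasserstein distance, approximating int_0^1 (q1(s) - q2(s))^2 ds on GRID."""
    return np.mean((q1 - q2) ** 2)

def frechet_objective(m, Y):
    """Sample Fréchet objective (1/n) * sum_i d^2(Y_i, m)."""
    return np.mean([w2_sq(q, m) for q in Y])

def frechet_mean_w2(Y):
    """Minimizer of the objective: the pointwise average of the quantile functions."""
    return Y.mean(axis=0)

# Toy sample: 20 normal distributions with random means and standard deviations.
rng = np.random.default_rng(0)
params = zip(rng.normal(0.0, 1.0, 20), rng.uniform(0.5, 2.0, 20))
Y = np.array([[NormalDist(mu, sd).inv_cdf(s) for s in GRID] for mu, sd in params])
m_hat = frechet_mean_w2(Y)
```

Because an average of nondecreasing functions is nondecreasing, `m_hat` is itself a valid quantile function, so the minimizer stays inside $\mathcal{M}$.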

2. Additive Structure via Geodesic Transport

In Euclidean additive models, the regression function is parameterized as $m(x) \approx \beta_0 + \sum_j g_j(x_j)$. In the ADOPT paradigm, addition is replaced by composition of transport maps, each encoding the influence of $x_j$ on the response in the geometry of $(\mathcal{M}, d)$.

  • Define $\mu_\oplus$ as the (unconditional) Fréchet mean.
  • For each predictor $x_j$, introduce a transport-valued function $T_j: x_j \mapsto T_j(x_j)$, where $T_j(x_j)$ is a geodesic transport map on $\mathcal{M}$.
  • The combined transport is given by the composition $\oplus$, i.e., $T_1(x_1) \oplus \cdots \oplus T_p(x_p)$, which acts on $\mu_\oplus$ as $[T_1(x_1) \oplus \cdots \oplus T_p(x_p)](\mu_\oplus)$.
  • The ADOPT model is:

$$Y = [T_1(X_1) \oplus T_2(X_2) \oplus \cdots \oplus T_p(X_p) \oplus \varepsilon](\mu_\oplus),$$

where $\varepsilon$ is a small random perturbation map.

This construction enables interpretability: each $T_j$ explicitly describes how $x_j$ alters the conditional mean via geodesic transport from the Fréchet mean along the geometry of $\mathcal{M}$.
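In Euclidean space the model should collapse to a classical additive model. The minimal sketch below checks this under two stated assumptions: $\Gamma(u, v, w) = w + (v - u)$, and each $T_j(x_j)$ is the transport from $\mu_\oplus$ to $\mu_\oplus + g_j(x_j)$ for hypothetical component functions $g_j$. The noise map $\varepsilon$ is omitted, and `gamma_euclid`, `transport_map`, and `compose` are illustrative names.

```python
import numpy as np

def gamma_euclid(u, v, w):
    """Gamma(u, v, w) in R^q: move w by the displacement that carries u to v."""
    return w + (v - u)

def transport_map(u, v):
    """T_{u->v} as a callable w -> Gamma(u, v, w)."""
    return lambda w: gamma_euclid(u, v, w)

def compose(*maps):
    """T_1 ⊕ ... ⊕ T_p = T_p o ... o T_1, so T_1 is applied first."""
    def composed(w):
        for T in maps:
            w = T(w)
        return w
    return composed

mu = np.array([1.0, -2.0])                     # Fréchet mean (an ordinary mean in R^2)
g = [lambda x: np.array([np.sin(x), 0.0]),     # hypothetical component effects g_1, g_2
     lambda x: np.array([0.0, x ** 2])]
x = [0.3, 1.5]                                 # one predictor value per coordinate

maps = [transport_map(mu, mu + g_j(x_j)) for g_j, x_j in zip(g, x)]
y_adopt = compose(*maps)(mu)                   # [T_1(x_1) ⊕ T_2(x_2)](mu)
y_additive = mu + sum(g_j(x_j) for g_j, x_j in zip(g, x))
```

Here `y_adopt` equals `y_additive`, i.e. $\mu_\oplus + \sum_j g_j(x_j)$. Translations commute, so the order of composition is irrelevant in this special case. On curved spaces or Wasserstein space it generally matters.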

3. Optimal Geodesic Transports and Their Composition

In the one-dimensional 2-Wasserstein space $\mathcal{W}_2$, the optimal transport map from $P$ to $Q$ is given by quantile mapping, $T_{P \to Q} = F_Q^{-1} \circ F_P$ (for atomless $P$). For general geodesic spaces, ADOPT posits the existence of a ternary map $\Gamma: \mathcal{M} \times \mathcal{M} \times \mathcal{M} \to \mathcal{M}$ such that

$$T_{u \to v}(w) = \Gamma(u, v, w),$$

which applies to $w$ the transport that carries $u$ to $v$ along their unique geodesic, so that $\Gamma(u, v, u) = v$. In Euclidean space, for example, $\Gamma(u, v, w) = w + (v - u)$. The sum of transports is defined by map composition:

$$T_{u_1 \to v_1} \oplus T_{u_2 \to v_2} = T_{u_2 \to v_2} \circ T_{u_1 \to v_1}.$$

This composition is associative, though not commutative in general, and it sequentially transports the base point $\mu_\oplus$ according to each coordinate's effect.
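For $\mathcal{W}_2$ on the real line, a natural choice is to push $w$ forward through the quantile map that carries $u$ to $v$. This is an assumption, not necessarily the exact $\Gamma$ used by Song et al. The result has quantile function $F_v^{-1} \circ F_u \circ F_w^{-1}$. The sketch stores distributions as quantile functions on a grid and inverts $F_u^{-1}$ by piecewise-linear interpolation. The grid, the helper names `quantiles` and `gamma_w2`, and the normal test distributions are illustrative.

```python
import numpy as np
from statistics import NormalDist

# Probability levels s at which each distribution is stored via its quantile function F^{-1}(s).
GRID = np.linspace(0.005, 0.995, 199)

def quantiles(dist):
    """Quantile function of a statistics.NormalDist evaluated on GRID."""
    return np.array([dist.inv_cdf(s) for s in GRID])

def gamma_w2(q_u, q_v, q_w):
    """Quantiles of w pushed forward by F_v^{-1} o F_u, i.e. Gamma(u, v, w) in 1D W2."""
    s_star = np.interp(q_w, q_u, GRID)   # F_u at F_w^{-1}(s), inverting q_u piecewise-linearly
    return np.interp(s_star, GRID, q_v)  # F_v^{-1} evaluated at those levels

q_u = quantiles(NormalDist(0.0, 1.0))    # u = N(0, 1^2)
q_v = quantiles(NormalDist(2.0, 1.0))    # v = N(2, 1^2): T_{u->v} shifts by +2
q_v2 = quantiles(NormalDist(0.0, 2.0))   # v2 = N(0, 2^2): T_{u->v2} doubles its input
q_w = quantiles(NormalDist(-1.0, 0.5))   # w = N(-1, 0.5^2), the point being transported

moved = gamma_w2(q_u, q_v, q_w)          # T_{u->v}(w) = N(1, 0.5^2)
two_step = gamma_w2(q_u, q_v2, moved)    # (T_{u->v} ⊕ T_{u->v2})(w) = N(2, 1^2)
```

Applying the two maps in the opposite order would instead send $w$ to $N(0, 1)$, so $\oplus$ is not commutative here. Because the grid only covers levels 0.005 to 0.995, `np.interp` clamps inputs that fall outside the range of `q_u`. A finer or wider grid would be needed in that case.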

4. Estimation via Transport Backfitting Algorithm

Estimation in ADOPT is based on a transport backfitting scheme, analogous to classical additive model backfitting but operating on transport maps and Fréchet means.

Algorithmic steps:

  • Initialization: Compute $\hat\mu_\oplus = \arg\min_{m \in \mathcal{M}} \frac{1}{n} \sum_i d^2(Y_i, m)$. Set $\hat T_j^{(0)}(x_j) = \mathrm{id}$ for all $j$.
  • Iterative update for each coordinate $j$:
    • Form partial transport residuals by undoing, from each observed $Y_i$, the current estimated effects $\hat T_k(X_{ik})$ of the other coordinates $k \neq j$, leaving the effect of $X_{ij}$ relative to the overall mean $\hat\mu_\oplus$.
    • Fit $\hat g_j^{(t)}(x)$ by local Fréchet regression of these residuals on $X_{ij}$.
    • Center (normalize) $\hat g_j^{(t)}$ and use it to update $\hat T_j^{(t)}(x)$ via geodesic transport.
  • Update the fitted responses $\hat Y_i^{(t)}$ and iterate until $\frac{1}{n} \sum_i d^2(\hat Y_i^{(t)}, \hat Y_i^{(t-1)}) < \delta$ for a chosen tolerance $\delta > 0$.

Every $\hat T_j$ is estimated univariately, which mitigates the curse of dimensionality and makes the procedure practical for moderate to large $p$.
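A minimal sketch of the backfitting loop follows, specialized to Euclidean responses $\mathcal{M} = \mathbb{R}^q$. This specialization is an assumption made for illustration. In $\mathbb{R}^q$ each transport map is a translation, and local Fréchet regression with local-constant weights reduces to Nadaraya-Watson smoothing. The Gaussian kernel, bandwidth $h = 0.1$, and the names `nw_smooth` and `backfit_euclid` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def nw_smooth(x, r, h):
    """Nadaraya-Watson (local-constant) fit of residuals r (n x q) on scalar predictor x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)   # w[i, k] = K((x_i - x_k) / h)
    return (w @ r) / w.sum(axis=1, keepdims=True)

def backfit_euclid(X, Y, h=0.1, tol=1e-8, max_iter=100):
    """Transport backfitting in R^q, where T_j(X_ij) is the translation by G[j, i]."""
    n, p = X.shape
    mu = Y.mean(axis=0)                        # sample Fréchet mean in R^q
    G = np.zeros((p, n, Y.shape[1]))           # all T_j start as the identity (zero shift)
    fitted = np.tile(mu, (n, 1))
    for _ in range(max_iter):
        for j in range(p):
            # Partial residuals: undo mu and the current effects of the other coordinates k != j.
            partial = Y - mu - (G.sum(axis=0) - G[j])
            g = nw_smooth(X[:, j], partial, h)
            G[j] = g - g.mean(axis=0)          # center so mu remains the overall mean
        new_fitted = mu + G.sum(axis=0)
        # Stopping rule: (1/n) * sum_i d^2(Yhat_i^(t), Yhat_i^(t-1)) < tol
        converged = np.mean(np.sum((new_fitted - fitted) ** 2, axis=1)) < tol
        fitted = new_fitted
        if converged:
            break
    return mu, G, fitted

# Toy data: two predictors, bivariate response with additive effects plus noise.
rng = np.random.default_rng(1)
n = 300
X = rng.uniform(-1.0, 1.0, size=(n, 2))
f1 = np.sin(np.pi * X[:, 0])
f2 = X[:, 1] ** 2 - 1.0 / 3.0
truth = np.column_stack([2.0 + f1 + f2, f1 - f2])
Y = truth + rng.normal(0.0, 0.1, size=(n, 2))
mu, G, fitted = backfit_euclid(X, Y)
```

Each component `G[j]` is obtained from a one-dimensional smoother in `X[:, j]` alone, which is the univariate structure the paragraph above credits for avoiding the curse of dimensionality.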

5. Theoretical Guarantees

Under the key assumptions below, the transport backfitting estimates converge to the true transport maps:

  • Kernel regularity: smoothness of predictor densities and uniqueness/stability of Fréchet minimizers [(A1)-(A3)].
  • Small perturbation error and Lipschitz property of $\Gamma$ [(A4)-(A5)].
  • Transport maps as Fréchet-type perturbations [(A6)].

Main result: For any anchor $\nu \in \mathcal{M}$, the fitted $\hat T_j(x)$ satisfies

$$d\big(T_j(x)(\nu), \hat T_j(x)(\nu)\big) = o_P(1),$$

uniformly over $x$ in compact subsets of the predictor domain, as $n \to \infty$, $h \to 0$, and $nh \to \infty$, where $h$ is the kernel bandwidth (Song et al., 8 Dec 2025).

6. Applications and Examples

The ADOPT framework accommodates responses including:

  • SPD (symmetric positive-definite) matrix regression using the log-Cholesky metric:

$$d^2_{\text{LC}}(S_1, S_2) = \|\lfloor L_1 \rfloor - \lfloor L_2 \rfloor\|_F^2 + \|\log D(L_1) - \log D(L_2)\|_F^2,$$

where $S = LL^\top$ is the Cholesky decomposition with $L$ lower triangular with positive diagonal, $\lfloor L \rfloor$ is the strictly lower-triangular part of $L$, $D(L)$ is its diagonal part, and the logarithm is taken entrywise on the diagonal.

  • Correlation matrix analysis from fMRI data: $l \times l$ correlation matrices of brain connectivity are regressed on biomedical predictors (e.g., cerebrospinal fluid amyloid-β, diagnostic stage, p-Tau), revealing interpretable patterns in network connectivity.
  • Metrics with explicit geodesics and transports (e.g., the affine-invariant metric for SPD matrices and Wasserstein metrics for probability distributions).
  • The method is intrinsic and does not require embedding or vectorizing the metric space responses.
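The log-Cholesky distance is easy to compute. The strictly lower-triangular and diagonal parts are orthogonal in the Frobenius inner product, so it equals the Frobenius distance between the coordinates $\varphi(S) = \lfloor L \rfloor + \log D(L)$. The numpy sketch below illustrates this. `gamma_lc` is a hypothetical choice of $\Gamma$ that translates in $\varphi$-coordinates. It is a natural candidate because the log-Cholesky metric is flat in these coordinates, where geodesics are straight lines, but it is an assumption rather than the transport used by Song et al.

```python
import numpy as np

def log_cholesky_coords(S):
    """phi(S) = floor(L) + log D(L), where S = L L^T with L lower triangular."""
    L = np.linalg.cholesky(S)                  # lower-triangular factor, positive diagonal
    return np.tril(L, k=-1) + np.diag(np.log(np.diag(L)))

def from_log_cholesky_coords(P):
    """Inverse of phi: rebuild L from its coordinates and return S = L L^T."""
    L = np.tril(P, k=-1) + np.diag(np.exp(np.diag(P)))
    return L @ L.T

def d_lc(S1, S2):
    """Log-Cholesky distance, computed as a Frobenius distance in phi-coordinates."""
    return np.linalg.norm(log_cholesky_coords(S1) - log_cholesky_coords(S2), "fro")

def gamma_lc(U, V, W):
    """Hypothetical Gamma(U, V, W): translate W by phi(V) - phi(U) in phi-coordinates."""
    P = log_cholesky_coords(W) + log_cholesky_coords(V) - log_cholesky_coords(U)
    return from_log_cholesky_coords(P)

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
C = np.array([[4.0, 1.0], [1.0, 2.0]])
moved = gamma_lc(A, B, C)                      # SPD by construction
```

Because the map is a translation in coordinates where the metric is Euclidean, `d_lc(C, moved)` matches `d_lc(A, B)`. The output is always SPD, but a unit diagonal is not preserved, so correlation-matrix responses would need additional handling.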

7. Advantages, Limitations, and Open Problems

ADOPT inherits the interpretability and statistical appeal of additive models while allowing responses in general uniquely geodesic metric spaces.

  • Advantages:
    • Each $T_j$ is interpretable and univariate, allowing visualization and diagnostics.
    • The model is intrinsic to $(\mathcal{M}, d)$, avoiding possibly distorting embeddings.
    • Mitigates the curse of dimensionality by estimating each predictor's effect separately.
  • Challenges:
    • Computational scalability for large nn or complex metric spaces.
    • Bandwidth selection and regularization remain semi-manual.
    • Extension to spaces lacking unique geodesics (e.g., spheres, where antipodal points are joined by infinitely many geodesics) is non-trivial.
    • High-order interaction modeling and dimension-reduction schemes are underexplored.

Further empirical evaluation, theoretical guarantees under weaker conditions, and generalizations to broader classes of metric-valued regression remain substantial avenues of research (Song et al., 8 Dec 2025).
