Additive Optimal Transport Regression

Updated 15 December 2025

Additive optimal transport regression is a framework that replaces linear shifts with compositions of geodesic transport maps to model responses in general metric spaces.
It decomposes the effect of multivariate Euclidean predictors into univariate transport maps applied to the Fréchet mean, addressing the curse of dimensionality.
The ADOPT model is estimated by a transport backfitting algorithm with a consistency guarantee, and has been applied to SPD matrix responses and fMRI connectivity analysis.

Additive optimal transport regression is a model class for analyzing regression problems where the predictor variables are Euclidean and the response lies in a general geodesic metric space. The central innovation is the replacement of additive linear shifts (which require vector space structure) with compositions of optimal geodesic transport maps, thus enabling additive modeling for manifold- or distribution-valued responses without embedding or vectorization. The most developed framework for this approach is ADOPT (Additive Optimal Transport Regression), which systematically extends additive regression concepts through the geometry of geodesic metric spaces and optimal transport (Song et al., 8 Dec 2025).

1. Mathematical Formulation and Problem Setting

Consider observations $(X_i, Y_i)$, $i=1,\dots,n$, where $X_i = (X_{i1}, \dots, X_{ip}) \in \mathbb{R}^p$ are Euclidean predictors and $Y_i$ takes values in a bounded, separable, uniquely geodesic metric space $(\mathcal{M}, d)$. In classical regression the conditional mean $\mathbb{E}[Y\,|\,X=x]$ is well defined, but a general metric space has no vector addition, so the target is instead the conditional Fréchet mean: $\mu_\oplus(x) := \arg\min_{m\in\mathcal M}\; \mathbb E\big[d^2(Y, m)\mid X = x\big].$ The objective is to model $\mu_\oplus(x)$ flexibly for multivariate $(X_1, \dots, X_p)$ while maintaining interpretability and avoiding the curse of dimensionality (Song et al., 8 Dec 2025).
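As a concrete instance of the Fréchet mean, the following minimal sketch assumes the response space is the 2-Wasserstein space of distributions on the real line, with each distribution observed as a sample of the same size. Under that assumption the squared distance between two empirical measures is the mean squared difference of their sorted samples, and the sample Fréchet mean is obtained by averaging order statistics. The function names are illustrative and do not come from the source.

```python
import numpy as np

def wasserstein2_sq(a, b):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples."""
    a, b = np.sort(a), np.sort(b)
    return float(np.mean((a - b) ** 2))

def frechet_mean_1d(samples):
    """Sample Frechet mean in 1-D Wasserstein space (equal-size samples).

    Each row of `samples` is one observed distribution; averaging the
    sorted rows averages the empirical quantile functions.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    return sorted_samples.mean(axis=0)

def frechet_objective(candidate, samples):
    """Average squared distance from `candidate` to every observation."""
    return float(np.mean([wasserstein2_sq(candidate, s) for s in samples]))

# Two toy "distributions", each given by three unsorted atoms.
samples = np.array([[2.0, 0.0, 1.0],
                    [4.0, 6.0, 5.0]])
bary = frechet_mean_1d(samples)
```

In Euclidean space the same minimization returns the arithmetic mean, which is why the Fréchet mean is the natural replacement for $\mathbb{E}[Y\,|\,X=x]$ when no vector addition is available.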

2. Additive Structure via Geodesic Transport

In Euclidean additive models, the regression function is parameterized as $m(x) \approx \beta_0 + \sum_j g_j(x_j)$. In the ADOPT paradigm, addition is replaced by composition of transport maps, each encoding the influence of $x_j$ on the response in the geometry of $(\mathcal{M}, d)$.

Define $\mu_\oplus := \arg\min_{m\in\mathcal M} \mathbb E\big[d^2(Y, m)\big]$ as the (unconditional) Fréchet mean, written without an argument to distinguish it from the conditional Fréchet mean $\mu_\oplus(x)$.
For each predictor $x_j$, introduce a transport-valued function $T_j: x_j \mapsto T_j(x_j)$, where $T_j(x_j)$ is a geodesic transport map on $\mathcal{M}$.
The combined transport is $T_1(x_1)\oplus\cdots\oplus T_p(x_p)$, where $\oplus$ denotes composition of transports, acting on $\mu_\oplus$ as $[T_1(x_1)\oplus\cdots\oplus T_p(x_p)](\mu_\oplus)$.
The ADOPT model is:

$Y = [T_1(X_1)\oplus T_2(X_2)\oplus\cdots\oplus T_p(X_p)\oplus\varepsilon](\mu_\oplus),$

where $\varepsilon$ is a small random perturbation map.

This construction enables interpretability: each $T_j$ explicitly describes how $x_j$ alters the conditional mean via geodesic transport from the Fréchet mean along the geometry of $\mathcal{M}$.
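To see how this generalizes the Euclidean additive model, consider the special case $\mathcal M = \mathbb R^d$ and assume, purely as an illustration and not as a construction taken from the source, that each transport is a translation, $T_j(x_j)(w) = w + g_j(x_j)$. Composing translations adds the shifts:

$[T_1(x_1)\oplus\cdots\oplus T_p(x_p)](\mu_\oplus) = \mu_\oplus + g_1(x_1) + \cdots + g_p(x_p) = \mu_\oplus + \sum_{j=1}^p g_j(x_j),$

which is the classical additive form with $\mu_\oplus$ playing the role of the intercept $\beta_0$.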

3. Optimal Geodesic Transports and Their Composition

In one-dimensional $\mathcal{W}_2$ Wasserstein space, the optimal transport from $P$ to $Q$ is given by quantile mapping. For general geodesic spaces, ADOPT posits the existence of a ternary map $\Gamma: \mathcal M \times \mathcal M \times \mathcal M \to \mathcal M$ such that

$T_{u\to v}(w) = \Gamma(u, v, w),$

which moves $w$ along the unique geodesic from $u$ to $v$. The sum of transports is defined by map composition: $T_{u_1\to v_1} \oplus T_{u_2\to v_2} = T_{u_2\to v_2} \circ T_{u_1\to v_1},$ so the left operand is applied first. Composition of maps is associative, and applying the composed map to the base point $\mu_\oplus$ transports it sequentially according to each coordinate's effect.
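The quantile-mapping case and the composition rule can be sketched as follows. This is a minimal illustration, assuming one-dimensional distributions observed as equal-size samples, so that the empirical optimal transport map sends the $k$-th smallest source atom to the $k$-th smallest target atom. The helper names `ot_map_1d` and `oplus` are hypothetical, and linear interpolation between atoms is a convenience choice rather than part of the source.

```python
import numpy as np

def ot_map_1d(source, target):
    """Empirical quantile-mapping transport between equal-size 1-D samples.

    Matches order statistics of `source` to those of `target` and
    interpolates linearly in between. Inputs outside the source range
    are clamped to the end values by np.interp.
    """
    src = np.sort(np.asarray(source, dtype=float))
    tgt = np.sort(np.asarray(target, dtype=float))

    def transport(w):
        return np.interp(w, src, tgt)

    return transport

def oplus(*maps):
    """T_1 (+) ... (+) T_p: apply the maps left to right, i.e. T_p o ... o T_1."""
    def composed(w):
        for transport in maps:
            w = transport(w)
        return w
    return composed

# Two transports applied in sequence to a base sample.
T1 = ot_map_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
T2 = ot_map_1d([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
base = np.array([0.0, 1.0, 2.0])
moved = oplus(T1, T2)(base)
```

Because `oplus` applies its arguments left to right, swapping `T1` and `T2` generally gives a different result, so the order of the operands is part of the definition.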

4. Estimation via Transport Backfitting Algorithm

Estimation in ADOPT is based on a transport backfitting scheme, analogous to classical additive model backfitting but operating on transport maps and Fréchet means.

Algorithmic steps:

Initialization: Compute $\hat\mu_\oplus = \arg\min_{m \in \mathcal{M}} \frac{1}{n}\sum_i d^2(Y_i, m)$. Set $\hat T_j^{(0)}(x_j) = \mathrm{id}$ for all $j$.
Iterative update for each coordinate $j$:
- Form partial transport residuals by undoing, on each observed $Y_i$, the current fitted effects of the other transports $\hat T_k$, $k \neq j$, relative to the overall mean $\hat\mu_\oplus$.
- Fit $\hat g_j^{(t)}(x)$ by local Fréchet regression of these residuals against $X_{ij}$.
- Center (normalize) $\hat g_j^{(t)}$ and use it to update $\hat T_j^{(t)}(x)$ via geodesic transport.
Update fitted responses $\hat Y_i^{(t)}$ and iterate until $\frac{1}{n}\sum_i d^2(\hat Y_i^{(t)}, \hat Y_i^{(t-1)}) < \varepsilon$, where $\varepsilon$ here is a preset convergence tolerance rather than the perturbation map in the model.
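The steps above can be sketched in the simplest possible setting. The code below is not the ADOPT algorithm itself: it assumes a real-valued response, so that the Fréchet mean is the arithmetic mean, each transport is a translation, and "undoing" a transport is a subtraction. Local Fréchet regression is replaced by a Nadaraya-Watson smoother with a Gaussian kernel, and all names (`nw_smooth`, `backfit`, the bandwidth `h`, the tolerance `tol`) are illustrative.

```python
import numpy as np

def nw_smooth(x, r, h):
    """Nadaraya-Watson smoother of r on x, evaluated at the sample points."""
    z = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * z ** 2)          # Gaussian kernel weights
    return (w @ r) / w.sum(axis=1)

def backfit(X, Y, h=0.05, tol=1e-8, max_iter=100):
    """Euclidean special case of backfitting (translations as transports)."""
    n, p = X.shape
    mu = Y.mean()                      # Frechet mean on the real line
    g = np.zeros((n, p))               # zero shift = identity transport
    fitted = np.full(n, mu)
    for _ in range(max_iter):
        for j in range(p):
            # Partial residual: remove the mean and all other fitted effects.
            partial = Y - mu - g.sum(axis=1) + g[:, j]
            gj = nw_smooth(X[:, j], partial, h)
            g[:, j] = gj - gj.mean()   # centering step
        new_fitted = mu + g.sum(axis=1)
        change = np.mean((new_fitted - fitted) ** 2)
        fitted = new_fitted
        if change < tol:               # stopping rule on fitted responses
            break
    return mu, g, fitted

# Toy data with an additive signal in two predictors.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
signal = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2
Y = signal + 0.1 * rng.normal(size=300)
mu_hat, g_hat, Y_hat = backfit(X, Y, h=0.05)
```

In a general metric space the subtraction and addition in the inner loop would be replaced by inverse and forward transports, and the smoother by local Fréchet regression, but the loop structure (cycle over coordinates, center, check the change in fitted responses) stays the same.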

Every $\hat T_j$ is estimated univariately, which mitigates the curse of dimensionality and makes the procedure practical for moderate to large $p$.

5. Theoretical Guarantees

Under the key assumptions below, the transport backfitting estimates converge to the true transport maps:

Kernel regularity: smoothness of predictor densities and uniqueness/stability of Fréchet minimizers [(A1)-(A3)].
Small perturbation error and Lipschitz property of $\Gamma$ [(A4)-(A5)].
Transport maps as Fréchet-type perturbations [(A6)].

Main result: For any fixed $x$ and anchor $\nu\in\mathcal M$, the fitted $\hat T_j(x)$ satisfies

$d\big(T_j(x)(\nu),\,\hat T_j(x)(\nu)\big) = o_P(1),$

uniformly over compact predictor domains, as $n\to\infty$, $h\to 0$, $nh\to\infty$, where $h$ is the smoothing bandwidth of the local Fréchet regression (Song et al., 8 Dec 2025).

6. Applications and Examples

The ADOPT framework accommodates responses including:

SPD (symmetric positive-definite) matrix regression using the log-Cholesky metric:

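$d^2_{\text{LC}}(S_1, S_2) = \|\lfloor L_1 \rfloor - \lfloor L_2 \rfloor\|_F^2 + \|\log D(L_1) - \log D(L_2)\|_F^2$

where $S_k = L_k L_k^\top$ is the Cholesky decomposition with lower-triangular factor $L_k$, $\lfloor L \rfloor$ is the strictly lower-triangular (off-diagonal) part of $L$, and $D(L)$ is its diagonal part, which enters through its logarithm.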

Correlation matrix analysis from fMRI data: Used to regress $l \times l$ correlation matrices of brain connectivity on biomedical predictors (e.g., cerebrospinal amyloid-β, diagnostic stage, p-Tau), revealing interpretable patterns in network connectivity.
Metrics with explicit geodesics and transports (e.g., the affine-invariant metric for SPD matrices and Wasserstein metrics for probability distributions).
The method is intrinsic and does not require embedding or vectorizing the metric space responses.
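For the SPD case, the log-Cholesky distance can be computed directly from Cholesky factors. The sketch below uses the standard form of the metric, in which the diagonal of the factor enters through its logarithm. The closed-form Fréchet mean (average the strictly lower parts, average the log-diagonals, then exponentiate) is stated here as an assumption about this metric rather than something given in the source, and both function names are illustrative.

```python
import numpy as np

def log_cholesky_dist2(S1, S2):
    """Squared log-Cholesky distance between two SPD matrices."""
    L1 = np.linalg.cholesky(S1)        # lower-triangular factors
    L2 = np.linalg.cholesky(S2)
    off = np.tril(L1, k=-1) - np.tril(L2, k=-1)
    dlog = np.log(np.diag(L1)) - np.log(np.diag(L2))
    return float(np.sum(off ** 2) + np.sum(dlog ** 2))

def log_cholesky_frechet_mean(mats):
    """Frechet mean under the log-Cholesky metric (assumed closed form)."""
    factors = [np.linalg.cholesky(S) for S in mats]
    off_mean = np.mean([np.tril(L, k=-1) for L in factors], axis=0)
    log_diag_mean = np.mean([np.log(np.diag(L)) for L in factors], axis=0)
    L_bar = off_mean + np.diag(np.exp(log_diag_mean))
    return L_bar @ L_bar.T             # map the mean factor back to an SPD matrix

# Two diagonal SPD matrices whose factors differ by one log-unit.
S1 = np.eye(2)
S2 = np.diag([np.exp(2.0), 1.0])
d2 = log_cholesky_dist2(S1, S2)
S_bar = log_cholesky_frechet_mean([S1, S2])
```

Here the mean `S_bar` sits at equal log-Cholesky distance from `S1` and `S2`, which is the behavior expected of a midpoint on the geodesic between them.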

7. Advantages, Limitations, and Open Problems

ADOPT inherits the interpretability and statistical appeal of additive models while allowing responses in uniquely geodesic metric spaces.

Advantages:
- Each $T_j$ is interpretable and univariate, allowing visualization and diagnostics.
- The model is intrinsic to $(\mathcal M, d)$, avoiding possibly distorting embeddings.
- Decoupling predictor effects into univariate fits mitigates the curse of dimensionality.
Challenges:
- Computational scalability for large $n$ or complex metric spaces.
- Bandwidth selection and regularization remain semi-manual.
- Extension to spaces lacking unique geodesics (e.g., spheres) is non-trivial.
- High-order interaction modeling and dimension-reduction schemes are underexplored.

Further empirical evaluation, theoretical guarantees under weaker conditions, and generalizations to broader classes of metric-valued regression remain substantial avenues of research (Song et al., 8 Dec 2025).


References (1)

ADOPT: Additive Optimal Transport Regression (2025)
