Additive Optimal Transport Regression
- Additive optimal transport regression is a framework that replaces linear shifts with compositions of geodesic transport maps to model responses in general metric spaces.
- It decomposes the effect of multivariate Euclidean predictors into univariate transport maps applied to the Fréchet mean, mitigating the curse of dimensionality.
- The ADOPT model ensures estimation consistency via a transport backfitting algorithm and has practical applications in SPD matrices and fMRI connectivity analysis.
Additive optimal transport regression is a model class for analyzing regression problems where the predictor variables are Euclidean and the response lies in a general geodesic metric space. The central innovation is the replacement of additive linear shifts (which require vector space structure) with compositions of optimal geodesic transport maps, thus enabling additive modeling for manifold- or distribution-valued responses without embedding or vectorization. The most developed framework for this approach is ADOPT (Additive Optimal Transport Regression), which systematically extends additive regression concepts through the geometry of geodesic metric spaces and optimal transport (Song et al., 8 Dec 2025).
1. Mathematical Formulation and Problem Setting
Consider observations $(X_i, Y_i)$, $i = 1, \dots, n$, with $X_i \in \mathbb{R}^p$ (Euclidean predictors) and $Y_i$ valued in a bounded, separable, uniquely geodesic metric space $(\mathcal{M}, d)$. Unlike classical regression, where the conditional mean $E[Y \mid X = x]$ is well defined, the lack of vector addition in general metric spaces is overcome by using the Fréchet conditional mean:
$$m_\oplus(x) = \operatorname{argmin}_{\omega \in \mathcal{M}} E\left[d^2(Y, \omega) \mid X = x\right].$$
The objective is to model $m_\oplus(x)$ flexibly for multivariate $x$ while maintaining interpretability and avoiding the curse of dimensionality (Song et al., 8 Dec 2025).
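As a concrete instance of a Fréchet mean, the sketch below uses the one-dimensional 2-Wasserstein space, where the Fréchet mean of a set of distributions is known to be the pointwise average of their quantile functions. The helper names (`quantiles_on_grid`, `wasserstein_frechet_mean`, `wasserstein_dist`) and the Gaussian sample are illustrative assumptions, not part of ADOPT itself.

```python
import numpy as np
from statistics import NormalDist

def quantiles_on_grid(dist, grid):
    """Evaluate the quantile function of a NormalDist on a probability grid."""
    return np.array([dist.inv_cdf(u) for u in grid])

def wasserstein_frechet_mean(quantile_rows):
    """Frechet mean under the 2-Wasserstein metric on the real line:
    the pointwise average of the quantile functions."""
    return np.mean(np.asarray(quantile_rows), axis=0)

def wasserstein_dist(q1, q2):
    """Approximate 2-Wasserstein distance as the root-mean-square gap
    between two quantile functions sampled on the same uniform grid."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))

# Hypothetical sample of three Gaussian "responses".
grid = np.linspace(0.01, 0.99, 99)
sample = [NormalDist(0.0, 1.0), NormalDist(2.0, 3.0), NormalDist(4.0, 2.0)]
Q = [quantiles_on_grid(dist, grid) for dist in sample]

frechet_mean_q = wasserstein_frechet_mean(Q)

# For Gaussians the quantile average is again Gaussian, with the averaged
# mean (2.0) and the averaged standard deviation (2.0).
reference_q = quantiles_on_grid(NormalDist(2.0, 2.0), grid)

# Frechet functional: sum of squared distances to the sample.
def frechet_objective(q):
    return sum(wasserstein_dist(q, qi) ** 2 for qi in Q)
```

The averaged quantile function minimizes `frechet_objective` over all candidates on the grid, which is the sample analogue of the argmin in the definition above.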
2. Additive Structure via Geodesic Transport
In Euclidean additive models, the regression function is parameterized as $m(x) = \mu + \sum_{j=1}^{p} f_j(x_j)$. In the ADOPT paradigm, addition is replaced by composition of transport maps, each encoding the influence of one predictor $x_j$ on the response in the geometry of $\mathcal{M}$.
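- Define $\mu_\oplus = \operatorname{argmin}_{\omega \in \mathcal{M}} E\left[d^2(Y, \omega)\right]$ as the (unconditional) Fréchet mean.
- For each predictor $x_j$, introduce a transport-valued function $g_j$, with $g_j(x_j)$ a geodesic transport map on $\mathcal{M}$.
- The combined transport is given by composition $g_1(x_1) \oplus \cdots \oplus g_p(x_p)$, i.e., $g_p(x_p) \circ \cdots \circ g_1(x_1)$, acting on $\mu_\oplus$ as $\{g_1(x_1) \oplus \cdots \oplus g_p(x_p)\}(\mu_\oplus)$.
- The ADOPT model is:
  $$Y = \varepsilon\big(\{g_1(X_1) \oplus \cdots \oplus g_p(X_p)\}(\mu_\oplus)\big),$$
  where $\varepsilon$ is a small random perturbation map.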
This construction enables interpretability: each component transport $g_j$ explicitly describes how the predictor $x_j$ alters the conditional mean via geodesic transport from the Fréchet mean along the geometry of $\mathcal{M}$.
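A quick sanity check of the construction is the Euclidean special case. Assuming the geodesic transports on the real line reduce to translations, composing them and applying the result to the mean recovers the classical additive model. The component functions `f1` and `f2` below are hypothetical and chosen only for illustration.

```python
import numpy as np

def translation(shift):
    """Transport map on the real line that moves every point by `shift`."""
    return lambda omega: omega + shift

def compose(*maps):
    """Apply maps left to right: compose(T1, T2)(w) == T2(T1(w))."""
    def combined(omega):
        for T in maps:
            omega = T(omega)
        return omega
    return combined

# Hypothetical additive components and overall mean.
f1 = lambda x: np.sin(x)
f2 = lambda x: x ** 2 - 1.0 / 3.0
mu = 1.5
x = (0.3, -0.7)

# Transport formulation: compose one translation per predictor, apply to mu.
combined_transport = compose(translation(f1(x[0])), translation(f2(x[1])))
m_transport = combined_transport(mu)

# Classical additive formulation.
m_additive = mu + f1(x[0]) + f2(x[1])
```

Both formulations give the same value here because translations commute; in a general geodesic space the order of composition may matter.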
3. Optimal Geodesic Transports and Their Composition
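In one-dimensional Wasserstein space, the optimal transport from $\mu_1$ to $\mu_2$ is given by quantile mapping, $T = F_2^{-1} \circ F_1$, where $F_k$ is the distribution function of $\mu_k$. For general geodesic spaces, ADOPT posits the existence of a ternary map $\Gamma : \mathcal{M}^3 \to \mathcal{M}$ such that
$$T_{\alpha \to \beta}(\omega) = \Gamma(\alpha, \beta, \omega),$$
which moves $\omega$ along the unique geodesic from $\alpha$ to $\beta$. The sum of transports is defined by map composition:
$$(T_1 \oplus T_2)(\omega) = T_2\big(T_1(\omega)\big).$$
This composition is associative and produces the effect of sequentially transporting the base point according to each coordinate's effect.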
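The quantile-mapping transport on the real line can be sketched with the standard library alone. The example below builds $T = F_2^{-1} \circ F_1$ for Gaussian distributions and checks that composing two such maps gives the direct map, as expected for monotone maps in one dimension. The function names and the three Gaussians are illustrative assumptions.

```python
from statistics import NormalDist

def quantile_transport(source, target):
    """Monotone optimal transport map T = F_target^{-1} o F_source on the real line."""
    return lambda x: target.inv_cdf(source.cdf(x))

def compose(*maps):
    """Apply maps left to right: compose(T1, T2)(x) == T2(T1(x))."""
    def combined(x):
        for T in maps:
            x = T(x)
        return x
    return combined

# Hypothetical source, intermediate, and target distributions.
mu1 = NormalDist(0.0, 1.0)
mu2 = NormalDist(2.0, 3.0)
mu3 = NormalDist(-1.0, 0.5)

T12 = quantile_transport(mu1, mu2)
T23 = quantile_transport(mu2, mu3)
T13 = quantile_transport(mu1, mu3)

# Sequential transport mu1 -> mu2 -> mu3 versus the direct map mu1 -> mu3.
two_step = compose(T12, T23)
points = [-1.5, -0.5, 0.0, 0.7, 1.2]
gaps = [abs(two_step(x) - T13(x)) for x in points]
```

For Gaussians the map is affine, $T(x) = m_2 + (s_2 / s_1)(x - m_1)$, so `T12` sends the mean of `mu1` to the mean of `mu2`.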
4. Estimation via Transport Backfitting Algorithm
Estimation in ADOPT is based on a transport backfitting scheme, analogous to classical additive model backfitting but operating on transport maps and Fréchet means.
Algorithmic steps:
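- Initialization: Compute the sample Fréchet mean $\hat\mu_\oplus = \operatorname{argmin}_{\omega \in \mathcal{M}} \sum_{i=1}^{n} d^2(Y_i, \omega)$. Set $\hat g_j$ to the identity transport for all $j = 1, \dots, p$.
- Iterative update for each coordinate $j$:
  - Form partial transport-residuals by undoing the effect of the other $\hat g_k$, $k \neq j$, and of the observed $Y_i$ relative to the overall mean.
  - Fit $\hat g_j$ by local Fréchet regression of these residuals against $X_{ij}$.
  - Center (normalize) $\hat g_j$ and use it to update the current fit via geodesic transport.
- Update the fitted responses $\hat Y_i$ and iterate until convergence.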
Every $\hat g_j$ is estimated univariately, which mitigates the curse of dimensionality and makes the procedure practical for moderate to large $p$.
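The loop structure can be illustrated in the simplest setting, real-valued responses, where transports are assumed to be translations and the procedure collapses to classical backfitting. This is a deliberately simplified stand-in, not the ADOPT estimator: local Fréchet regression is replaced by a Nadaraya-Watson smoother, and the function names, bandwidth, and simulated data are all hypothetical.

```python
import numpy as np

def nw_smooth(x, r, h):
    """Nadaraya-Watson estimate of E[r | x] at the sample points
    (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit_translations(X, Y, h=0.15, n_iter=50, tol=1e-8):
    """Backfitting for real-valued Y, with each transport stored as a shift."""
    n, p = X.shape
    mu = Y.mean()                 # Frechet mean of real-valued responses
    shifts = np.zeros((n, p))     # identity transports correspond to zero shifts
    for _ in range(n_iter):
        previous = shifts.copy()
        for j in range(p):
            # Partial residuals: undo the mean and the other coordinates' shifts.
            partial = Y - mu - shifts.sum(axis=1) + shifts[:, j]
            fit = nw_smooth(X[:, j], partial, h)
            shifts[:, j] = fit - fit.mean()   # centre the component
        if np.max(np.abs(shifts - previous)) < tol:
            break
    return mu, shifts

# Simulated data from a hypothetical two-component additive model.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 2))
truth = 1.0 + np.sin(np.pi * X[:, 0]) + (X[:, 1] ** 2 - 1.0 / 3.0)
Y = truth + 0.1 * rng.normal(size=n)

mu_hat, shifts = backfit_translations(X, Y)
fitted = mu_hat + shifts.sum(axis=1)
mse = float(np.mean((fitted - truth) ** 2))
```

Each pass smooths against a single predictor at a time, which is the property that keeps the per-step estimation problem univariate regardless of the number of predictors.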
5. Theoretical Guarantees
Under the key assumptions below, the transport backfitting estimates converge to the true transport maps:
- Kernel regularity: smoothness of predictor densities and uniqueness/stability of Fréchet minimizers [(A1)-(A3)].
- Small perturbation error and a Lipschitz property of the transport maps [(A4)-(A5)].
- Transport maps as Fréchet-type perturbations [(A6)].
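Main result: For any fixed coordinate $j$ and anchor $\omega \in \mathcal{M}$, the fitted $\hat g_j$ satisfies
$$d\big(\hat g_j(x_j)(\omega),\, g_j(x_j)(\omega)\big) \to 0 \quad \text{in probability},$$
uniformly over compact predictor domains, as the sample size $n \to \infty$ and the smoothing bandwidths shrink at a suitable rate (Song et al., 8 Dec 2025).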
6. Applications and Examples
The ADOPT framework accommodates responses including:
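- SPD (symmetric positive-definite) matrix regression using the log-Cholesky metric:
  $$d(P_1, P_2) = \left\{ \|\lfloor L_1 \rfloor - \lfloor L_2 \rfloor\|_F^2 + \|\log D(L_1) - \log D(L_2)\|_F^2 \right\}^{1/2},$$
  where $P_k = L_k L_k^\top$ is the Cholesky decomposition, and $L_k$ is split into diagonal/off-diagonal parts: $D(L_k)$ is the diagonal part and $\lfloor L_k \rfloor$ is the strictly lower-triangular part.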
- Correlation matrix analysis from fMRI data: Used to regress correlation matrices of brain connectivity on biomedical predictors (e.g., cerebrospinal amyloid-β, diagnostic stage, p-Tau), revealing interpretable patterns in network connectivity.
- Metrics with explicit geodesics and transports (e.g., the affine-invariant metric for SPD matrices and Wasserstein metrics for probability distributions).
- The method is intrinsic and does not require embedding or vectorizing the metric space responses.
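For the SPD case, the log-Cholesky metric is convenient because it makes the space flat in suitable coordinates: keep the strictly lower-triangular part of the Cholesky factor and take logs of its diagonal. The sketch below computes the distance and the resulting closed-form Fréchet mean with NumPy. The function names are assumptions made for illustration and are not taken from any ADOPT implementation.

```python
import numpy as np

def to_log_chol(P):
    """Map an SPD matrix to log-Cholesky coordinates: strictly lower part of
    the Cholesky factor, with the log of its diagonal on the diagonal."""
    L = np.linalg.cholesky(P)
    out = np.tril(L, k=-1)
    out[np.diag_indices_from(out)] = np.log(np.diag(L))
    return out

def from_log_chol(A):
    """Inverse map: rebuild the Cholesky factor and return L @ L.T."""
    L = np.tril(A, k=-1)
    L[np.diag_indices_from(L)] = np.exp(np.diag(A))
    return L @ L.T

def log_chol_dist(P1, P2):
    """Log-Cholesky distance: Frobenius norm in log-Cholesky coordinates."""
    return float(np.linalg.norm(to_log_chol(P1) - to_log_chol(P2), ord="fro"))

def log_chol_frechet_mean(mats):
    """Frechet mean under the log-Cholesky metric: average the coordinates."""
    return from_log_chol(np.mean([to_log_chol(P) for P in mats], axis=0))

# Small worked example with two diagonal SPD matrices.
P_a = np.eye(2)
P_b = np.exp(2.0) * np.eye(2)
dist_ab = log_chol_dist(P_a, P_b)
mean_ab = log_chol_frechet_mean([P_a, P_b])

# Round trip on a random SPD matrix.
rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3))
P_rand = B @ B.T + 3.0 * np.eye(3)
P_back = from_log_chol(to_log_chol(P_rand))
```

In this example the Cholesky factors are the identity and $e \cdot I$, so the distance is $\sqrt{2}$ and the Fréchet mean is $e \cdot I$, which differs from the arithmetic average of the two matrices.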
7. Advantages, Limitations, and Open Problems
ADOPT inherits the interpretability and statistical appeal of additive models while allowing responses in arbitrary geodesic metric spaces.
- Advantages:
  - Each component transport $g_j$ is interpretable and univariate, allowing visualization and diagnostics.
  - The model is intrinsic to $\mathcal{M}$, avoiding possibly distorting embeddings.
  - Efficiently mitigates dimensionality issues by decoupling predictor effects.
- Challenges:
  - Computational scalability for large or complex metric spaces.
  - Bandwidth selection and regularization remain semi-manual.
  - Extension to spaces lacking unique geodesics (e.g., spheres) is non-trivial.
  - High-order interaction modeling and dimension-reduction schemes are underexplored.
Further empirical evaluation, theoretical guarantees under weaker conditions, and generalizations to broader classes of metric-valued regression remain substantial avenues of research (Song et al., 8 Dec 2025).