- The paper introduces a two-phase method that first learns regularized OT plans via stochastic dual optimization and then estimates a deterministic Monge map using deep neural networks.
- It provides theoretical guarantees by proving convergence for the regularized OT estimator and consistency for the deep network mapping estimator.
- Empirical results in domain adaptation and generative modeling demonstrate improved scalability and mapping fidelity in large-scale settings.
Overview
This work tackles the classical problem of transferring mass between probability distributions by formulating it as an optimal transport (OT) problem, and extends the discussion to optimal mapping estimation. The approach proceeds in two phases: an OT plan learning phase built on a stochastic dual formulation, and a subsequent phase that estimates a Monge map with a deep neural network. Combining these components not only yields efficient computation in large-scale settings but also provides a principled mechanism for generalizing the mapping beyond the observed data. The methodology comes with robust theoretical guarantees in the form of stability results, and delivers promising empirical outcomes in applications such as domain adaptation and generative modeling.
Optimal Transport Plan Learning
The initial stage involves the formulation and numerical solution of a regularized OT problem. In contrast to classical OT formulations that may suffer from computational intractability in high-dimensional or large-sample scenarios, the paper employs a stochastic dual approach. Key technical points include:
- Stochastic Dual Optimization:
The method optimizes the dual of the regularized OT problem with stochastic gradient methods. This significantly reduces the computational burden of handling large empirical measures.
- Regularization:
The OT problem is regularized to ensure smoothness of the resulting transport plan. This regularization not only aids numerical stability but also facilitates the derivation of theoretical convergence properties. The regularized OT problem can be expressed as:
$$\min_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, d\pi(x, y) + \epsilon\, R(\pi),$$
where R(π) is a regularization term (commonly an entropic regularizer) and ϵ>0 controls the trade-off between fidelity and smoothness.
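When R is the entropic regularizer, i.e., the relative entropy of π with respect to μ⊗ν, this problem admits an unconstrained dual over a pair of potential functions u and v; up to an additive constant it reads

$$\max_{u,\, v}\; \int u\, d\mu + \int v\, d\nu \;-\; \epsilon \int \exp\!\left(\frac{u(x) + v(y) - c(x, y)}{\epsilon}\right) d\mu(x)\, d\nu(y).$$

Every term is an expectation under μ, ν, or μ⊗ν, which is precisely what permits unbiased mini-batch gradient estimates and hence stochastic optimization.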
Empirical results indicate that the stochastic dual approach scales substantially better than previous techniques, particularly in settings with millions of samples. This scalability is achieved by processing mini-batches instead of the full pairwise cost and coupling matrices, enabling efficient updates and memory utilization.
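To make the procedure concrete, here is a minimal PyTorch sketch of the stochastic dual optimization for the entropic case on toy 2-D Gaussians. The potential architectures, batch size, learning rate, and ϵ are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: stochastic dual optimization of entropy-regularized OT.
# Toy 2-D Gaussians stand in for the source/target measures.
import torch
import torch.nn as nn

eps = 0.1  # entropic regularization strength (assumed value)

def potential(dim=2):
    # Small MLP parameterizing one dual potential on R^dim.
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

u_net, v_net = potential(), potential()
opt = torch.optim.Adam(list(u_net.parameters()) + list(v_net.parameters()), lr=1e-3)

def cost(x, y):
    # Pairwise squared Euclidean cost between two mini-batches.
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

for step in range(5000):
    x = torch.randn(128, 2)        # mini-batch from the source measure mu
    y = torch.randn(128, 2) + 3.0  # mini-batch from the target measure nu
    u = u_net(x)                   # shape (B, 1)
    v = v_net(y).T                 # shape (1, B); u + v broadcasts to (B, B)
    # Negative dual objective: -(E[u] + E[v] - eps * E[exp((u + v - c)/eps)]);
    # the exponent is clamped for numerical safety in early iterations.
    expo = torch.clamp((u + v - cost(x, y)) / eps, max=30.0)
    loss = -(u.mean() + v.mean() - eps * torch.exp(expo).mean())
    opt.zero_grad(); loss.backward(); opt.step()
```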
Monge Map Estimation
After obtaining the optimal transport plan, the focus shifts to estimating a deterministic mapping, i.e., the Monge map. The key elements of this phase are:
- Barycentric Projection:
The optimal plan, which inherently permits one-to-many mappings due to its probabilistic nature, is aggregated through a barycentric projection. This projection computes a deterministic counterpart that approximates the Monge map (see the sketch after this list).
- Deep Neural Network Parameterization:
A deep neural network is subsequently trained to learn this barycentrically projected mapping. Architecturally, the network serves as a nonlinear regressor that, given an input from the source distribution, produces its corresponding output in the target distribution. The deep parameterization grants the network the ability to generalize beyond the support of the input measure.
- Generalization Beyond Observed Support:
The learned mapping is not limited to the sampled data points, thereby enabling extrapolation to unseen regions — a property critically beneficial in scenarios like generative modeling. Training is typically performed by minimizing a loss function that ensures fidelity to the barycentric projection computed from the OT plan.
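Continuing the dual sketch above (and reusing u_net, v_net, cost, and eps from it), the following is a hedged sketch of this fitting step: f_net minimizes a squared error weighted by the density of the learned plan, and the minimizer of that loss is exactly the barycentric projection x ↦ E_π[Y | X = x].

```python
# Sketch of Monge-map fitting via the barycentric projection (continues the
# dual sketch above; f_net's architecture is again an illustrative choice).
f_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
f_opt = torch.optim.Adam(f_net.parameters(), lr=1e-3)

for step in range(5000):
    x = torch.randn(128, 2)        # source mini-batch
    y = torch.randn(128, 2) + 3.0  # target mini-batch
    with torch.no_grad():
        # Density of the learned regularized plan w.r.t. mu x nu.
        h = torch.exp(torch.clamp((u_net(x) + v_net(y).T - cost(x, y)) / eps,
                                  max=30.0))
    # Plan-weighted squared error: its minimizer over f is the barycentric
    # projection of the plan, i.e., the deterministic map we are after.
    sq_err = ((f_net(x)[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    loss = (h * sq_err).mean()
    f_opt.zero_grad(); loss.backward(); f_opt.step()
```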
Theoretical Analysis and Stability
An important contribution of the work lies in its rigorous theoretical analysis. Two key stability results are established:
- Convergence of Regularized OT:
The paper proves that the stochastic dual estimator converges to the true optimal transport plan between the underlying continuous measures as the sample size increases and the regularization parameter is suitably tuned. This result provides a theoretical basis for the stochastic dual approach, guaranteeing convergence under regularity conditions.
- Consistency of the Monge Map Estimator:
Similarly, it is shown that the deep neural network estimator, which approximates the barycentric projection, converges to the exact Monge map in the limit of infinite data and appropriate network capacity. Such consistency results underpin the practical deployment of the mapping estimator in various applications.
Both the convergence and consistency results are pivotal for validating the use of the proposed techniques in large-scale and high-dimensional settings where classical methods may not be feasible.
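Schematically, and in notation of our own rather than the paper's, the two guarantees can be summarized as

$$\hat{\pi}_{\epsilon, n} \xrightarrow[\; n \to \infty,\ \epsilon \to 0 \;]{} \pi^\star, \qquad \hat{f}_\theta(x) \approx \mathbb{E}_{\hat{\pi}_{\epsilon, n}}\!\left[\, Y \mid X = x \,\right] \longrightarrow f^\star(x),$$

where π⋆ is the unregularized optimal plan and f⋆ the Monge map (when it exists).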
Scalability and Practical Considerations
From a practical standpoint, the scalability of the approach is ensured by several key design choices:
- Stochastic Approximation:
The use of stochastic (mini-batch) optimization procedures in the dual formulation permits parallel processing and incremental updates, making it amenable to hardware acceleration on GPUs or TPU clusters.
- Entropic Regularization:
The inclusion of an entropic regularizer not only supports numerical stability but also enables efficient Sinkhorn-type iterations, adapted here to the stochastic setting (a classical log-domain variant is sketched after this list). This significantly reduces iteration complexity, which is crucial for applications requiring real-time or near-real-time computation.
- Deep Learning Best Practices:
Standard techniques from deep learning (e.g., weight regularization, dropout, batch normalization) can be naturally incorporated into the parameterization step. However, careful tuning of the network architecture and hyperparameters is essential to balance model capacity against overfitting, especially when generalizing beyond the empirical support.
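For reference, here is a minimal log-domain Sinkhorn sketch on a single discrete mini-batch with uniform weights; this is the classical batch iteration the text alludes to, not the paper's stochastic variant.

```python
# Minimal log-domain Sinkhorn sketch for one discrete batch with uniform
# weights; the classical iteration, not the paper's stochastic adaptation.
import torch

def sinkhorn_plan(C, eps=0.1, iters=200):
    # C: (n, m) cost matrix; returns an (n, m) coupling with uniform marginals.
    n, m = C.shape
    log_a = -torch.log(torch.tensor(float(n)))  # log(1/n)
    log_b = -torch.log(torch.tensor(float(m)))  # log(1/m)
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(iters):
        # Alternate closed-form maximizations of the entropic dual potentials.
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b, dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a, dim=0)
    # Coupling: pi_ij = exp((f_i + g_j - C_ij)/eps) * (1/n) * (1/m).
    return torch.exp((f[:, None] + g[None, :] - C) / eps) / (n * m)
```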
Applications in Domain Adaptation and Generative Modeling
The versatility of the proposed method is showcased through its two primary applications:
In domain adaptation, the optimal mapping serves as a bridge that aligns source and target distributions, facilitating downstream tasks such as classification or segmentation in a new domain. The learned mapping can be integrated into domain adaptation pipelines, effectively reducing distributional discrepancies and improving target-domain performance.
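A hypothetical usage pattern: assuming the fitted map f_net from the sketches above, a labeled source set (Xs, ys), and unlabeled target features Xt (all names are ours), source samples are transported into the target domain before training a standard classifier.

```python
# Hypothetical domain-adaptation usage: transport labeled source features to
# the target domain with the fitted map, then train any classifier there.
# Xs, ys, Xt are assumed NumPy arrays; f_net comes from the sketches above.
import torch
from sklearn.linear_model import LogisticRegression

Xs_t = f_net(torch.as_tensor(Xs, dtype=torch.float32)).detach().numpy()
clf = LogisticRegression(max_iter=1000).fit(Xs_t, ys)  # labels travel with samples
target_preds = clf.predict(Xt)  # evaluate directly in the target domain
```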
For generative modeling, the approach is used to learn a mapping from a latent space (commonly a Gaussian distribution) to the data space. This mapping allows for the generation of high-fidelity samples that capture the true data distribution. The deterministic mapping provided by the Monge map ensures that generated samples meaningfully represent the learned manifold and not merely noisy approximations.
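In that setting, sampling reduces to a single forward pass. A hypothetical snippet, assuming f_net was fit as above with a standard Gaussian as the source measure:

```python
# Hypothetical sampling step: push Gaussian latents through the learned map.
z = torch.randn(64, 2)   # latent mini-batch from the source (latent) measure
samples = f_net(z)       # deterministic Monge-map push-forward into data space
```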
Empirical results in the paper substantiate the method's ability to handle datasets of significant size while maintaining mapping fidelity. Numerical experiments demonstrate that the proposed approach yields both qualitative and quantitative improvements over baseline methods.
Conclusion
The "Large-Scale Optimal Transport and Mapping Estimation" framework presents a well-grounded, two-step methodology for optimizing transport plans and estimating robust Monge maps using a combination of stochastic optimization and deep learning. The method stands out due to its scalability, proven convergence properties, and wide applicability in high-dimensional mapping tasks. Its technical innovations pave the way for deploying OT-based techniques in large-scale, real-world problems such as domain adaptation and generative modeling, making it a valuable addition to the computational optimal transport literature.