
Monge-Kantorovich Transport Map

Updated 23 November 2025
  • The Monge-Kantorovich transport map is a formulation that computes optimal mass-preserving mappings between probability distributions by minimizing transport costs.
  • It is applied in image processing for tasks like color harmonization, ensuring smooth integration of foreground and background colors through deterministic transformations.
  • Implementations such as MKL-Harmonizer use linear transformations based on distribution moments, achieving robust, real-time performance in augmented reality.

The Monge-Kantorovich transport map denotes the optimal mass-preserving mapping between two probability distributions that minimizes a specified transport cost. In the context of image processing, such as color harmonization, this concept provides a mathematical framework for aligning the color distributions of different image regions. The Monge-Kantorovich transport map is central to constructing deterministic transformations that push a source distribution onto a target while minimizing quadratic costs, with critical application in efficient, real-time harmonization algorithms for augmented reality and image compositing (Larchenko et al., 16 Nov 2025).

1. Optimal Transport and the Monge-Kantorovich Problem

Given a composite image constructed by pasting a foreground object with color distribution $\pi_0$ into a background scene, the objective is to recolor the foreground so its pixel values match the statistical properties of the background, achieving coherent visual integration. The deterministic Monge transport map $T_{im}: \mathbb{R}^3 \rightarrow \mathbb{R}^3$ must satisfy the mass-preservation constraint

$$\pi_{0}(x) = \pi_{1}(T_{im}(x))\, \left|\det J_{T_{im}}(x)\right|,$$

where $J_{T_{im}}$ is the Jacobian matrix of $T_{im}$ and $\pi_1$ is the target distribution. Among all such maps, the Monge formulation selects the one minimizing the expected squared Euclidean distance:

$$T_{im}^{*} = \arg\min_{T} \int \|T(x)-x\|^{2}\, \pi_{0}(x)\, dx.$$

Solvability and uniqueness under mild regularity conditions are established in the foundational work of Villani (2009).
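
For intuition, in one dimension with Gaussian source and target the optimal map is the affine $T(x) = \mu_1 + (\sigma_1/\sigma_0)(x - \mu_0)$, and the mass-preservation constraint can be checked numerically. A minimal sketch, with arbitrary illustrative parameter values (not taken from the paper):

```python
# Verify pi_0(x) = pi_1(T(x)) |T'(x)| for the 1D Gaussian Monge map.
import numpy as np
from scipy.stats import norm

mu0, s0 = 0.2, 0.10   # source N(mu0, s0^2), e.g. one foreground color channel
mu1, s1 = 0.6, 0.25   # target N(mu1, s1^2), e.g. the background channel

T = lambda x: mu1 + (s1 / s0) * (x - mu0)   # optimal Monge map
dT = s1 / s0                                # its constant Jacobian

x = np.linspace(-0.5, 1.0, 1000)
lhs = norm.pdf(x, mu0, s0)                  # pi_0(x)
rhs = norm.pdf(T(x), mu1, s1) * abs(dT)     # pi_1(T(x)) |det J_T(x)|
assert np.allclose(lhs, rhs)                # constraint holds pointwise
```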

2. The Monge–Kantorovich Linear Map

When both source and target color distributions are modeled as Gaussian measures, i.e., $\mathcal{N}(\mu_0, \Sigma_0)$ for the foreground and $\mathcal{N}(\mu_1, \Sigma_1)$ for the background, the quadratic optimal transport admits a closed-form, linear solution:

$$T^{*}(x) = \mu_{1} + A(x-\mu_{0}),$$

where

$$A = \Sigma_{0}^{-1/2} \left(\Sigma_{0}^{1/2} \Sigma_{1} \Sigma_{0}^{1/2}\right)^{1/2} \Sigma_{0}^{-1/2}.$$

Hence, the Monge-Kantorovich linear map ("MKL", Editor's term) reduces optimal color harmonization to a global affine transformation estimated solely from the first and second moments of the source and target. To ensure values remain within the allowable color cube $[0,1]^3$, outputs are clipped; this clipping is equivalent to a Euclidean projection onto the cube, as formally proven (Lemma 1, Larchenko et al., 16 Nov 2025).
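
A minimal sketch of this closed-form map, assuming the moments are estimated empirically from foreground and background pixel samples; the function names and the matrix square root via SciPy are illustrative choices, not the released implementation:

```python
# Closed-form Monge-Kantorovich linear map between two Gaussian color models.
import numpy as np
from scipy.linalg import sqrtm

def mkl_map(src, tgt):
    """Estimate (A, S) of T(x) = A x + S from (N,3) source/target RGB samples."""
    mu0, mu1 = src.mean(0), tgt.mean(0)
    S0, S1 = np.cov(src, rowvar=False), np.cov(tgt, rowvar=False)
    S0h = sqrtm(S0).real                  # Sigma_0^{1/2}
    S0h_inv = np.linalg.inv(S0h)          # Sigma_0^{-1/2}
    A = S0h_inv @ sqrtm(S0h @ S1 @ S0h).real @ S0h_inv
    return A, mu1 - A @ mu0               # shift so that T(x) = A x + S

def harmonize(fg, A, S):
    """Apply T per pixel and clip to the color cube [0,1]^3."""
    out = fg.reshape(-1, 3) @ A.T + S
    return np.clip(out, 0.0, 1.0).reshape(fg.shape)

rng = np.random.default_rng(0)
fg, bg = rng.random((64, 64, 3)), rng.random((64, 64, 3))   # toy images
A, S = mkl_map(fg.reshape(-1, 3), bg.reshape(-1, 3))
recolored = harmonize(fg, A, S)
```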

3. Architecture: MKL-Harmonizer Implementation

MKL-Harmonizer introduces a neural encoder to predict the optimal affine mapping in real time, targeting mobile and edge devices. The encoder employs EfficientNet-B0 (5.3M parameters, 0.39 GFLOPs for $224 \times 224$ input) as its backbone. The model ingests a $256 \times 256$ RGB image concatenated with a binary mask (foreground indicator channel), leveraging a modified convolutional stem and a series of MBConv blocks with Swish activations. The global feature representation is reduced via pooling and a fully-connected head to a 12-dimensional output, parameterizing either $[\mu_1, \Sigma_1]$ (mean, covariances) or the direct mapping $[A, S]$ (affine matrix, shift). Empirical results indicate that direct $[A, S]$ regression is more robust in practice.
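
A sketch of such an encoder, assuming torchvision's EfficientNet-B0 as the backbone; the stem surgery for the fourth (mask) channel and the 9 + 3 split of the $[A, S]$ head are plausible reconstructions, not the authors' code:

```python
# EfficientNet-B0 encoder adapted to RGB + mask input with a 12-dim head.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

def build_encoder() -> nn.Module:
    net = efficientnet_b0()
    stem = net.features[0][0]             # first Conv2d of the stem
    # Widen the stem from 3 to 4 input channels (RGB + binary mask).
    net.features[0][0] = nn.Conv2d(
        4, stem.out_channels,
        kernel_size=stem.kernel_size, stride=stem.stride,
        padding=stem.padding, bias=False,
    )
    # Replace the 1000-way classifier with a 12-dim regression head.
    net.classifier[1] = nn.Linear(net.classifier[1].in_features, 12)
    return net

encoder = build_encoder()
rgb = torch.rand(1, 3, 256, 256)                       # composite image
mask = torch.randint(0, 2, (1, 1, 256, 256)).float()   # foreground indicator
params = encoder(torch.cat([rgb, mask], dim=1))        # shape (1, 12)
A, S = params[:, :9].reshape(-1, 3, 3), params[:, 9:]  # affine matrix, shift
```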

4. Training Procedures and Loss Functions

Supervised training is conducted on the iHarmony4-clean dataset, a refinement of iHarmony4 with artifact suppression via unmasked region replacement. The network is optimized using Adam, batch size 64, over 210 epochs with a decaying learning rate. The loss function combines:

  • Label loss: $\|\mathrm{Model}(im) - [A,S]\|_1$ (L1 norm on the MKL parameters)
  • Content loss: $\|M \odot X_{0} - M \odot (A'X_{0} + S')\|_1$ (per-pixel L1 after MKL filtering, within the mask)
  • Total loss: $L_{\mathrm{labels}} + \alpha L_{\mathrm{content}}$ with $\alpha = 10$

This combination ensures both parameter regression accuracy and perceptual fidelity in the harmonized object. Regularization via $L_1$ promotes sharper, more stable solutions, while the per-pixel loss mitigates the risk of collapse to the identity mapping.
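
A minimal sketch of the combined objective, assuming the content term compares the composite filtered by the predicted parameters against the composite filtered by the ground-truth parameters inside the mask; this pairing and all names below are illustrative assumptions:

```python
# Combined label + masked-content loss for MKL parameter regression.
import torch
import torch.nn.functional as F

def total_loss(pred, A_gt, S_gt, x0, mask, alpha=10.0):
    """pred: (B,12) encoder output; x0: (B,3,H,W) composite; mask: (B,1,H,W)."""
    A, S = pred[:, :9].reshape(-1, 3, 3), pred[:, 9:]
    # Label loss: L1 on the regressed MKL parameters.
    l_labels = F.l1_loss(A, A_gt) + F.l1_loss(S, S_gt)

    def mkl_filter(a, s):
        # Per-pixel affine map T(x) = A x + S, clipped to the color cube.
        y = torch.bmm(a, x0.flatten(2)) + s.unsqueeze(-1)
        return y.view_as(x0).clamp(0.0, 1.0)

    # Content loss: per-pixel L1 after MKL filtering, restricted to the mask.
    l_content = F.l1_loss(mask * mkl_filter(A, S),
                          mask * mkl_filter(A_gt, S_gt))
    return l_labels + alpha * l_content
```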

5. Inference Pipeline and Computational Complexity

The MKL-Harmonizer is optimized for edge deployment, with the full model (EfficientNet-B0 backbone plus head) requiring approximately 5.3M parameters and 0.4 GFLOPs per $256 \times 256$ image. The on-device workflow is as follows:

  1. Render RGB frame and binary mask to memory.
  2. Encoder predicts 12 MKL parameters in a single forward pass.
  3. Compute the per-pixel color transformation $T(x) = Ax + S$ with clipping, efficiently implemented on the GPU (see the sketch after this list).

Performance metrics include:

  • RTX 4060Ti: 175 it/s ($256^2$), 167 it/s ($512^2$), 137 it/s ($1024 \times 2048$), 41 it/s ($4096^2$)
  • Mobile (Pixel 4a/7): 12–15 fps at $1080 \times 2204$, with potential for doubling via zero-copy optimizations.
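
The application step is a single matrix-vector product plus shift per pixel; a minimal GPU sketch, with the $(3, H, W)$ frame layout and names as illustrative assumptions:

```python
# Apply T(x) = A x + S to every pixel and clip to the color cube [0,1]^3.
import torch

@torch.no_grad()
def apply_mkl(frame, A, S):
    """frame: (3, H, W) in [0,1]; A: (3, 3); S: (3,)."""
    out = torch.einsum('ij,jhw->ihw', A, frame) + S.view(3, 1, 1)
    return out.clamp_(0.0, 1.0)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
frame = torch.rand(3, 1080, 2204, device=device)        # example AR frame
out = apply_mkl(frame, torch.eye(3, device=device),
                torch.zeros(3, device=device))          # identity-map demo
```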

6. Quantitative Benchmarks and User Evaluation

Table: Quantitative Results (iHarmony4-clean, $256^2$)

| Method | MSE | PSNR (dB) | foreground-MSE |
|---|---|---|---|
| Ideal Linear OT | 7.6 | 43.6 | 45.9 |
| PCT-Net | 29.1 | 38.0 | 201 |
| Harmonizer | 40.1 | 36.6 | 258 |
| MKL-Harmonizer | 65.0 | 34.1 | 438 |
| INR-Harmonization | 67.2 | 35.3 | 392 |
| Unharmonized | 182 | 31.0 | 984 |

User studies conducted on real ARCore composites (327 images) using a four-way forced-choice protocol found that MKL-Harmonizer achieved the highest mean opinion score (MOS) among tested algorithms. In the speed vs. perceptual quality trade-off, MKL-Harmonizer demonstrated a leading position with high MOS and maximal processing speed (Larchenko et al., 16 Nov 2025).

7. Observations, Biases, and Data Contributions

Key experimental findings demonstrate the strength of linear MKL filters: Ideal Linear OT closely approaches its deep counterparts on standard benchmarks, and $L_1$ losses on the affine filter parameters yield superior sharpness and stability. MKL-Harmonizer avoids overfitting to mask leakage ("exposure bias") due to its global filter architecture, in contrast to pixelwise encoder–decoder networks, which are susceptible to artifact generation at high resolution (e.g., upsampling striping, JPEG blocks).

A new AR dataset of 327 real-world composite images, annotated with binary masks and spanning diverse lighting, objects, and conditions, is provided. Data gathering uses a minimally modified ARCore pipeline to capture the camera frame, the rendered object, and a precise mask. Full source code, data, and capture tools are released at https://github.com/maria-larchenko/mkl-harmonizer, providing a platform for future AR harmonization studies free of synthetic-dataset biases (Larchenko et al., 16 Nov 2025).
