- The paper introduces an implicit regression framework that minimizes optimal transport cost to capture complex noise distributions without explicit parametric assumptions.
- It employs a linear assignment formulation to compute tractable gradients, effectively bypassing the instability of minimax optimization seen in CGAN methods.
- Experimental results on synthetic and real datasets show significant improvements in NLPD, MAE, and MSE, with a sparse variant reducing computation time by up to 10×.
The paper introduces a methodology for regression that circumvents the limitations of explicit noise modeling and minimax optimization frequently encountered in recent Conditional GAN (CGAN) approaches. The framework models the stochastic regression problem defined by
$$y = f(x, z),$$
where $x \in \mathbb{R}^n$ represents the input, $y \in \mathbb{R}^m$ the output, and $z \in \mathbb{R}^k$ the latent noise. Rather than assuming additive Gaussian or any other parametric noise structure, the approach implicitly models the noise by minimizing an optimal transport cost between the empirical distributions of the true data and the model predictions.
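As a concrete illustration, here is a minimal sketch of such an implicit model; the PyTorch framework, the `ImplicitRegressor` name, the feed-forward architecture, and the standard-normal noise are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ImplicitRegressor(nn.Module):
    """Implicit model y = f(x, z): latent noise z is concatenated with x, so the
    output distribution is shaped by the network rather than assumed in advance."""
    def __init__(self, n_in: int, m_out: int, k_noise: int, hidden: int = 64):
        super().__init__()
        self.k_noise = k_noise
        self.net = nn.Sequential(
            nn.Linear(n_in + k_noise, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, m_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Draw fresh latent noise on every forward pass (standard normal here,
        # a modelling choice); each call therefore yields a new sample of y.
        z = torch.randn(x.shape[0], self.k_noise, device=x.device)
        return self.net(torch.cat([x, z], dim=1))
```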
Core Methodology
- Optimal Transport Cost Minimization:
The key formulation estimates the distance between the true conditional distribution $p(y \mid x)$ and the model's predicted distribution via a primal formulation of the Wasserstein distance. By explicitly computing the optimal transport cost over sample pairs drawn from these distributions, the method bypasses the instabilities associated with the minimax optimization typical of GAN-based frameworks. The cost function is a weighted $L^p$ distance between paired samples:
$$c(a, b) = \sum_{i=1}^{n} |x_{a,i} - x_{b,i}|^p + \lambda \sum_{i=1}^{m} |y_{a,i} - y_{b,i}|^p,$$
where $\lambda$ controls the relative contribution of the input and output spaces.
- $x_{a,i}$ and $x_{b,i}$ denote the $i$-th input components of samples $a$ and $b$.
- $y_{a,i}$ and $y_{b,i}$ denote the corresponding output components.
- $p \ge 1$ is the norm degree.
- Linear Assignment Problem:
The optimal transport problem is cast as a linear assignment problem: the optimal transport plan is a permutation matrix mapping samples from the true distribution to samples from the predicted distribution. This formulation renders the gradient with respect to the model parameters $\theta$ tractable:
$$\nabla_\theta C^* \approx \sum_{a \in R} \sum_{b \in F} M^*_{a,b} \, \nabla_\theta c(a, b),$$
where $M^*$ is the optimal transport plan linking the set of real samples $R$ to the set of generated samples $F$. Because the plan is held fixed while the matched costs are differentiated, this avoids the oscillatory behavior and vanishing gradients of minimax formulations (a minimal sketch of such a training step follows this list).
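Below is a minimal sketch of one training step under this formulation, assuming a PyTorch model like the one sketched earlier and SciPy's Hungarian solver for the assignment; `transport_loss` is a hypothetical helper, and the paper's exact solver, cost weighting, and batching may differ.

```python
import numpy as np
import torch
from scipy.optimize import linear_sum_assignment

def transport_loss(model, x, y, lam=1.0, p=2):
    """Mini-batch loss: optimally assign real pairs (x, y) to generated pairs (x, y_fake)."""
    y_fake = model(x)                                    # one generated output per input
    x_np = x.detach().cpu().numpy()
    y_np = y.detach().cpu().numpy()
    yf_np = y_fake.detach().cpu().numpy()
    # Cost matrix c(a, b): real sample a on the rows, generated sample b on the columns,
    # combining input-space distance with the lambda-weighted output-space distance.
    cost = (np.abs(x_np[:, None, :] - x_np[None, :, :]) ** p).sum(-1) \
         + lam * (np.abs(y_np[:, None, :] - yf_np[None, :, :]) ** p).sum(-1)
    row, col = linear_sum_assignment(cost)               # optimal permutation M*
    row_t, col_t = torch.as_tensor(row), torch.as_tensor(col)
    # The plan M* is held fixed; only the output distances of the matched pairs are
    # differentiated (the x-part of the cost does not depend on the model parameters).
    return lam * (torch.abs(y[row_t] - y_fake[col_t]) ** p).sum(dim=1).mean()
```

Calling `backward()` on this quantity propagates gradients only through the generated samples of the matched pairs, mirroring the fixed-plan gradient above.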
Computational Considerations
- Dense vs. Sparse Assignment:
Two variants are proposed:
- RE-DLA (REgression via Dense Linear Assignment):
Uses the full cost matrix computed over all samples, ensuring precise matching at the expense of higher computational cost.
- RE-SLA (REgression via Sparse Linear Assignment):
Introduces a sparsification strategy that restricts the assignment to a fixed number of nearest neighbors in the input (x) space. This approximation significantly reduces computational overhead by cutting down both the cost-matrix computation and the linear assignment solver's runtime. In experiments, RE-SLA demonstrated up to a 10× reduction in transport-plan computation time for larger sample sizes, making it attractive for larger mini-batches and complex distributions (see the sketch below).
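A rough sketch of the sparsification step under stated assumptions: the `sparse_assignment` helper, the scikit-learn k-NN search, and SciPy's sparse bipartite matcher are illustrative choices, not necessarily the paper's implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import min_weight_full_bipartite_matching
from sklearn.neighbors import NearestNeighbors

def sparse_assignment(x, cost_dense, k=10):
    """Keep only the k nearest neighbours in x-space as admissible matches,
    then solve the assignment on the resulting sparse cost matrix."""
    n = len(x)
    nbrs = NearestNeighbors(n_neighbors=k).fit(x)
    _, idx = nbrs.kneighbors(x)                  # idx[a]: k candidate partners for sample a
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Shift stored costs to be strictly positive: the sparse solver treats stored
    # entries as graph edges, so an explicit zero would silently drop a valid match.
    data = cost_dense[rows, cols] + 1e-9
    sparse_cost = csr_matrix((data, (rows, cols)), shape=(n, n))
    # Raises ValueError when the restricted graph admits no perfect matching;
    # increasing k restores feasibility at the price of more computation.
    return min_weight_full_bipartite_matching(sparse_cost)
```

Both the cost computation and the solver then operate on roughly $kN$ stored entries rather than $N^2$, which is the source of the reported savings.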
Experimental Evaluation
The method was evaluated on several synthetic datasets characterized by different noise regimes:
- A sinusoidal function with additive Gaussian noise, on which standard methods such as Gaussian Process Regression (GPR) suffice; the implicit noise-modeling methods remained competitive.
- A linear function with additive exponential noise where the asymmetric noise structure was effectively captured. Notably, both RE-DLA and RE-SLA achieved significantly lower Negative Log Predictive Density (NLPD) values compared to CGAN.
- Heteroscedastic noise scenarios (noise magnitude dependent on x) and multi-modal output distributions, where the proposed method generated sample clouds that more accurately tracked the true conditional distributions, yielding statistically significant improvements in NLPD, Mean Absolute Error (MAE), and Mean Squared Error (MSE) over baseline methods.
The approach was also benchmarked on multiple real datasets, including ones from standard repositories, where it consistently outperformed or matched state-of-the-art methods such as CGAN, GPR, Deep Neural Networks (DNNs), and eXtreme Gradient Boosting (XGBoost). The gains in NLPD were particularly notable, underscoring the method's capacity to capture the complexities of the true conditional distributions without resorting to explicit parametric assumptions.
Rigorous statistical comparisons using Friedman tests with Nemenyi post-hoc analysis confirmed that the improvements achieved by both RE-DLA and RE-SLA over more classical approaches were statistically significant across multiple performance metrics.
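For context, NLPD for a sample-based model has to be estimated from generated draws; one common recipe (an assumption here, not necessarily the paper's exact evaluation protocol) is a per-test-point kernel density estimate:

```python
import numpy as np
from scipy.stats import gaussian_kde

def nlpd_from_samples(y_true, y_samples):
    """Estimate NLPD for scalar outputs: fit a KDE to the model's draws at each
    test input and evaluate the observed output under that density.

    y_true:    observed outputs, one per test point.
    y_samples: list of 1-D arrays; y_samples[i] holds draws y = f(x_i, z)."""
    log_p = [
        np.log(gaussian_kde(draws)(np.atleast_1d(obs))[0] + 1e-300)
        for obs, draws in zip(y_true, y_samples)
    ]
    return -float(np.mean(log_p))
```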
Conclusion
The proposed implicit regression framework presents a principled alternative to conventional noise assumptions by leveraging optimal transport cost minimization. With a formulation that sidesteps the difficulties associated with minimax optimization and allows for robust gradients, the framework achieves strong numerical performance across synthetic and real-world tasks. The sparse variant, in particular, provides a scalable solution without significant loss in predictive performance, making it an effective tool for regression tasks in settings with complex, non-standard noise distributions.
The methodology holds promise for extension to multi-dimensional output spaces and further enhancements, such as incorporating automatic relevance determination for each component of the inputs and outputs.