
Tsallis Regularized Optimal Transport and Ecological Inference (1609.04495v1)

Published 15 Sep 2016 in cs.LG

Abstract: Optimal transport is a powerful framework for computing distances between probability distributions. We unify the two main approaches to optimal transport, namely Monge-Kantorovitch and Sinkhorn-Cuturi, into what we define as Tsallis regularized optimal transport (TROT). TROT interpolates a rich family of distortions from Wasserstein to Kullback-Leibler, encompassing as well Pearson, Neyman and Hellinger divergences, to name a few. We show that metric properties known for Sinkhorn-Cuturi generalize to TROT, and provide efficient algorithms for finding the optimal transportation plan with formal convergence proofs. We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences. TROT provides a convenient framework for ecological inference by allowing to compute the joint distribution (that is, the optimal transportation plan itself) when side information is available, which is, e.g., typically what census represents in political science. Experiments on data from the 2012 US presidential elections display the potential of TROT in delivering a faithful reconstruction of the joint distribution of ethnic groups and voter preferences.

Citations (49)

Summary

  • The paper introduces Tsallis Regularized Optimal Transport (TROT), unifying classical OT formulations with entropy-based divergence measures.
  • It proposes efficient algorithmic solutions, including a second-order row iteration method and a KL projected-gradient approach, to optimize TROT for different values of the parameter q.
  • An application to ecological inference demonstrates improved joint distribution estimation, validated empirically on 2012 US presidential election data.

Tsallis Regularized Optimal Transport and Ecological Inference: A Comprehensive Analysis

The paper, titled "Tsallis Regularized Optimal Transport and Ecological Inference," presents an innovative framework combining Tsallis entropy with optimal transport (OT), offering a unified approach that extends the Monge-Kantorovitch and Sinkhorn-Cuturi methods. The resulting Tsallis regularized optimal transport (TROT) interpolates between the Wasserstein distance and various divergence measures, such as the Kullback-Leibler, Pearson, Neyman, and Hellinger divergences. The authors demonstrate that TROT inherits the metric properties of Sinkhorn-Cuturi regularization and propose efficient algorithms with convergence proofs for solving TROT problems.
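Concretely, using the standard definition of Tsallis entropy (the exact normalization in the paper may differ slightly), the regularizer and the TROT objective can be sketched as:

```latex
H_q(P) \;=\; \frac{1}{q-1}\Bigl(1 - \sum_{i,j} p_{ij}^{\,q}\Bigr),
\qquad
\min_{P \,\in\, U(r,c)} \;\langle P, M\rangle \;-\; \frac{1}{\lambda}\, H_q(P),
```

where U(r, c) is the transportation polytope of joint distributions with marginals r and c, M is the cost matrix, and λ > 0 controls the regularization strength. As q → 1, H_q recovers the Shannon entropy and TROT reduces to Sinkhorn-Cuturi entropic OT; other values of q yield the Pearson-, Neyman-, and Hellinger-type divergences mentioned above.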

Key Contributions

  1. Unification of OT Paradigms: The authors integrate Tsallis entropies into the OT framework, bridging the computationally intensive Monge-Kantorovitch formulation and the more tractable entropy-regularized Sinkhorn-Cuturi algorithm. This integration not only enriches OT's flexibility but also opens new directions for research and application, including ecological inference.
  2. Efficient Algorithmic Solutions: Two algorithms are proposed to optimize TROT: a second-order row iteration approach for q ∈ (0, 1) and a KL projected-gradient method for q ≥ 1. These algorithms address the computational challenges TROT poses, such as the loss of Lipschitz smoothness for some values of q and the need for scalable solutions.
  3. Application to Ecological Inference: The paper marks the first use of OT in ecological inference, that is, reconstructing joint distributions from given marginals, a problem of broad interest in political and social science. The framework estimates the joint distribution (the transport plan itself) when side information, such as census data, is available.
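In the q → 1 limit, TROT reduces to classical entropy-regularized OT, which is solved by Sinkhorn-Knopp matrix scaling. The following is a minimal sketch of that special case (not the paper's second-order or projected-gradient solvers for general q); variable names and the fixed iteration count are illustrative choices:

```python
import numpy as np

def sinkhorn(M, r, c, lam=10.0, n_iter=200):
    """Sinkhorn-Knopp scaling for the q -> 1 (Shannon/KL) limit of TROT.

    M      : (m, n) cost matrix
    r, c   : source and target marginal distributions (sum to 1)
    lam    : regularization strength (larger = closer to unregularized OT)
    Returns the (m, n) transport plan P with marginals approximately r, c.
    """
    K = np.exp(-lam * M)              # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iter):
        v = c / (K.T @ u)             # scale to match column marginals
        u = r / (K @ v)               # scale to match row marginals
    return u[:, None] * K * v[None, :]  # P = diag(u) K diag(v)
```

In the ecological-inference setting, r and c would be the observed marginals (e.g., ethnic-group shares and vote shares per county) and the returned plan is the inferred joint distribution.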

Numerical Results and Experimentation

The paper provides empirical evidence through experiments on data from the 2012 US presidential elections. A variety of cost matrices are constructed to showcase TROT's ability to reconstruct joint distributions accurately compared to traditional methods and simple aggregation baselines such as the Florida-average. Notably, TROT with suitable parameter settings significantly improves on these baselines, reducing the average KL divergence and absolute error between inferred and true distributions.
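The average-KL evaluation metric can be sketched as follows; the function name, the per-unit averaging, and the epsilon smoothing are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def avg_kl(true_joints, inferred_joints, eps=1e-12):
    """Average KL(true || inferred) over a collection of joint distributions,
    e.g., one (ethnicity x vote) table per county."""
    kls = []
    for P, Q in zip(true_joints, inferred_joints):
        P = P / P.sum()                     # normalize to probability tables
        Q = Q / Q.sum()
        kls.append(np.sum(P * np.log((P + eps) / (Q + eps))))
    return float(np.mean(kls))
```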

Implications and Future Directions

The implications of this research extend both theoretically and practically:

  • Theoretical Expansion: By exploring the use of Tsallis entropies within the OT framework, this paper opens a theoretical avenue for understanding the interplay between various entropy measures and transport distances. This insight could inspire further studies on geometric properties of such interpolated divergences, potentially leading to novel metrics in probability spaces.
  • Practical Applications: The introduction of TROT to ecological inference suggests a significant impact on how aggregate data is used for policymaking and socio-political analysis. With increased access to auxiliary information (census data, polls, etc.), TROT could significantly improve the precision of inferred distributions, supporting data-driven decision-making in various fields.

Future research might focus on extending these techniques to real-time applications and integrating machine learning models to dynamically estimate cost matrices. Additionally, exploring TROT's performance in high-dimensional settings and its application to other domains, such as econometrics or epidemiology, could further enhance its utility.

In conclusion, this paper offers a well-founded expansion of the OT field, enriched by Tsallis entropic regularization, providing substantial contributions to both the theoretical landscape and practical applications in data science and beyond.