- The paper introduces the Gumbel-Sinkhorn method, a continuous relaxation technique for differentiable permutation learning.
- The authors establish a theoretical link between permutation matrices and doubly stochastic matrices via the Sinkhorn operator.
- Empirical results demonstrate the approach's effectiveness in tasks such as number sorting, jigsaw puzzle solving, and neural signal matching.
Learning Latent Permutations with Gumbel-Sinkhorn Networks: An Overview
The paper "Learning Latent Permutations with Gumbel-Sinkhorn Networks" presents an approach to learning models involving latent permutations by exploiting continuous relaxations through the Sinkhorn operator. The authors aim to address the intractability of marginalizing over permutations by extending the Gumbel-Softmax methodology suited for discrete latent variables to permutations, resulting in the Gumbel-Sinkhorn approach.
Key Contributions
- Introduction of the Gumbel-Sinkhorn Method: The authors build on the Gumbel-Softmax trick for categorical distributions by proposing the Gumbel-Sinkhorn method for permutation matrices. The continuous Sinkhorn operator, which plays a role analogous to the softmax, allows end-to-end differentiable models in which permutations appear as latent variables. This sidesteps the non-differentiability of the discrete maximum-weight matching problem by replacing its solution with the Sinkhorn operator's continuous relaxation (a minimal sketch appears after this list).
- Theoretical Foundations: The paper establishes a formal connection between permutations and doubly stochastic matrices through the Birkhoff polytope, justifying the continuous approximation provided by the Sinkhorn operator. The authors prove that the Sinkhorn operator, applied to a temperature-scaled input, converges to the matching (permutation) operator as the temperature approaches zero, so the relaxation can be made arbitrarily tight while keeping computation differentiable.
- Applicability Across Diverse Tasks: The methodology's efficacy is demonstrated on tasks that inherently involve permutations: sorting numeric sequences, solving jigsaw puzzles, and matching neural signals across C. elegans worms. On these tasks, Gumbel-Sinkhorn networks outperform competing baselines, showing that combinatorial structure in data can be handled within standard differentiable pipelines.
- Probabilistic Inference: The Gumbel-Matching and Gumbel-Sinkhorn distributions provide a framework for approximating posterior distributions over permutations, which is essential for learning in latent variable models. This complements existing methods by showing that perturbation-based (Gumbel) constructions extend naturally from categorical variables to distributions over permutations.
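A minimal NumPy sketch of the construction described above (the function names, the fixed number of Sinkhorn iterations, and the log-space normalization are illustrative choices, not the paper's reference implementation):

```python
import numpy as np

def sinkhorn(log_alpha, n_iters=20):
    """Approximate the Sinkhorn operator: alternately normalize the rows
    and columns of exp(log_alpha), carried out in log-space for stability."""
    for _ in range(n_iters):
        log_alpha = log_alpha - np.logaddexp.reduce(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - np.logaddexp.reduce(log_alpha, axis=0, keepdims=True)
    return np.exp(log_alpha)

def gumbel_sinkhorn(log_alpha, tau=1.0, n_iters=20, rng=None):
    """Perturb log_alpha with i.i.d. Gumbel noise, scale by the temperature
    tau, and apply the Sinkhorn operator to obtain a relaxed permutation."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(low=1e-20, high=1.0, size=log_alpha.shape)
    gumbel = -np.log(-np.log(u))
    return sinkhorn((log_alpha + gumbel) / tau, n_iters)
```

As tau decreases, samples concentrate near permutation matrices; in practice the temperature is treated as a hyperparameter trading off tightness of the relaxation against gradient quality.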
Numerical Results and Claims
The paper reports strong empirical results across domains. In number sorting, the proposed networks achieve high accuracy even for large input sizes, outperforming previously reported results obtained with more complex recurrent models. In the jigsaw puzzle task, the network solves puzzles of up to 6x6 pieces on MNIST, CelebA, and ImageNet. These results indicate that the approach generalizes across diverse data distributions and task structures.
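Scoring these tasks requires a discrete permutation at test time. A common way to round the network's doubly stochastic output to a hard permutation is to solve a linear assignment (matching) problem; the sketch below uses SciPy's solver, and the variable names are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def round_to_permutation(soft_perm):
    """Recover a hard permutation from a (near-)doubly-stochastic matrix
    by solving a maximum-weight linear assignment problem."""
    rows, cols = linear_sum_assignment(-soft_perm)  # negate to maximize total weight
    hard = np.zeros_like(soft_perm)
    hard[rows, cols] = 1.0
    return hard

# Hypothetical usage: reorder a scrambled sequence with the recovered permutation.
# x_sorted = round_to_permutation(soft_perm) @ x_scrambled
```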
Implications and Future Directions
These methods have implications in both theoretical and practical domains:
- Differentiable relaxations of permutation matrices open new options in neural network architecture design, with potential impact in fields that require structural alignment, such as bioinformatics, computer vision, and natural language processing.
- Future work could extend the methodology to other combinatorial optimization problems that pose challenges similar to those of permutations.
- More scalable inference techniques would open avenues for training complex models over high-dimensional combinatorial latent spaces.
In conclusion, "Learning Latent Permutations with Gumbel-Sinkhorn Networks" makes a substantial contribution to learning with latent permutations in neural networks. The work sets the stage for future models that balance combinatorial structure against differentiable computation.