- The paper introduces I-MLE, which efficiently computes gradients for discrete exponential family distributions, enabling end-to-end neural network training.
- It leverages a novel Sum-of-Gamma noise distribution for effective perturb-and-MAP sampling, ensuring accurate gradient estimation.
- The framework demonstrates competitive or superior performance on combinatorial optimization tasks while removing the need for problem-specific smooth relaxations.
Summary of Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
The paper "Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions" introduces a framework named Implicit Maximum Likelihood Estimation (I-MLE). This framework addresses the challenge of integrating discrete probability distributions with neural networks and other combinatorial optimization components. Traditional approaches often require problem-specific smooth relaxations, which are cumbersome and reduce general applicability. I-MLE advances this field by proposing a generalized, end-to-end learning strategy that efficiently computes gradients for models incorporating discrete elements.
Key Contributions:
- General-Purpose Gradient Estimator: I-MLE computes gradients with respect to the parameters of discrete exponential family distributions by constructing a target distribution whose parameters are shifted toward lower loss; the gradient estimate is then the difference between samples from the current and target distributions (see the sketch after this list). This is especially valuable for complex distributions where exact marginal computations are intractable or expensive.
- Noise Distributions for Perturb-and-MAP: The authors introduce a novel family of noise distributions, termed Sum-of-Gamma distributions, tailored to perturb-and-MAP sampling. Their key property is that the sum of k independent Sum-of-Gamma samples follows a Gumbel distribution, so the total perturbation of a state with k active variables (e.g., a k-subset) behaves like Gumbel noise on the induced distribution over states, yielding more accurate samples and gradient estimates (a sampler sketch follows this list).
- Application to Combinatorial Optimization: The framework applies to black-box combinatorial solvers, demonstrating I-MLE's ability to differentiate through combinatorial optimization problems such as integer linear programs. Because the solver is queried only as a MAP oracle, gradients can be computed without problem-specific continuous relaxations.
- Experimental Validation: Extensive empirical experiments confirm that I-MLE is competitive with and occasionally surpasses existing methods, such as score-function and straight-through estimators, particularly in modeling tasks requiring discrete outputs during both training and inference phases.
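To make the estimator concrete, below is a minimal, single-sample sketch in PyTorch. It is illustrative rather than the authors' reference implementation: `map_solver` stands for any black-box MAP oracle (a top-k selector, a shortest-path routine, an ILP solver), `noise` for a perturbation sample, and `lam` for the strength with which the target distribution is pulled toward lower loss.

```python
import torch

class IMLE(torch.autograd.Function):
    """Single-sample I-MLE sketch: forward is perturb-and-MAP, backward builds a
    target distribution by shifting the parameters against the loss gradient."""

    @staticmethod
    def forward(ctx, theta, noise, lam, map_solver):
        # Perturb-and-MAP: a discrete state obtained from the perturbed parameters.
        z = map_solver(theta + noise)
        ctx.save_for_backward(theta, noise, z)
        ctx.lam = lam
        ctx.map_solver = map_solver
        return z

    @staticmethod
    def backward(ctx, grad_output):
        theta, noise, z = ctx.saved_tensors
        # Perturbation-based implicit differentiation: move the parameters a step
        # against the incoming loss gradient to define the target distribution ...
        theta_prime = theta - ctx.lam * grad_output
        # ... and take a second MAP sample from it under the same noise.
        z_prime = ctx.map_solver(theta_prime + noise)
        # Gradient estimate: the difference of the two MAP states
        # (any 1/lam scaling can be folded into the learning rate).
        return z - z_prime, None, None, None


def topk_map(theta, k=10):
    """Example MAP oracle for a k-subset distribution: indicator of the k largest weights."""
    z = torch.zeros_like(theta)
    return z.scatter(-1, torch.topk(theta, k, dim=-1).indices, 1.0)
```

A forward/backward pass then looks like `z = IMLE.apply(theta, noise, 10.0, topk_map)` followed by an ordinary loss on `z`. The solver is never differentiated through; it is simply called a second time on the shifted parameters, which is what lets arbitrary black-box combinatorial solvers slot in.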
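The Sum-of-Gamma construction is likewise short enough to sketch directly from its definition: a sample of SoG(k, tau, s) is (tau/k) times the sum over i = 1..s of Gamma(1/k, k/i) variates minus log s, and the sum of k independent samples approaches a Gumbel distribution as s grows. A possible PyTorch version (the function name and defaults are ours) could be:

```python
import math
import torch

def sum_of_gamma_noise(shape, k, s=10, tau=1.0):
    """Sample Sum-of-Gamma noise SoG(k, tau, s).

    The sum of k independent samples approximates Gumbel(0, tau), which is the
    property perturb-and-MAP needs when every discrete state activates exactly
    k variables (e.g., k-subset selection).
    """
    noise = torch.zeros(shape)
    for i in range(1, s + 1):
        # Gamma with shape 1/k and scale k/i (torch parameterizes by rate = i/k).
        noise = noise + torch.distributions.Gamma(1.0 / k, i / k).sample(shape)
    return (tau / k) * (noise - math.log(s))
```

Plugging this in as the `noise` argument above, in place of independent Gumbel noise on each weight, is what the paper reports as giving better-behaved samples and gradients on structured state spaces.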
Implications and Future Directions:
The framework proposed in this paper has significant implications for the design and training of neural network architectures that include discrete and symbolic reasoning components. This is particularly relevant for applications in decision-making, explanation generation, and other areas where symbolic representation intersects with neural processing.
One theoretically appealing aspect is I-MLE's construction of the target distribution through perturbation-based implicit differentiation (stated compactly below), which bypasses the need to derive complex problem-specific relaxations. Its grounding in maximum-likelihood estimation and perturb-and-MAP sampling also gives the technique broad scope, making it attractive for decision-making systems that must reason over discrete structures.
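In notation that summarizes the paper's description (exact scaling conventions may vary):

```latex
\hat{z} = \mathrm{MAP}(\theta + \epsilon), \qquad
\theta' = \theta - \lambda\, \nabla_{z} L(\hat{z}), \qquad
\nabla_{\theta} L \approx \mathbb{E}_{\epsilon}\big[\mathrm{MAP}(\theta + \epsilon) - \mathrm{MAP}(\theta' + \epsilon)\big]
```

with any 1/λ scaling absorbed into the learning rate.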
In practice, I-MLE benefits from its flexibility and its ease of integration into existing deep learning pipelines, as evidenced by its performance across a range of experimental benchmarks, including relational reasoning and variational autoencoders with discrete latent variables.
In conclusion, I-MLE offers a promising direction for future work, suggesting avenues for research into adaptive strategies for tuning its hyperparameters and into extending the noise-distribution family to broader classes of problems. By combining maximum-likelihood learning with implicit, perturbation-based differentiation, the framework is well positioned as a tool for advancing neural-symbolic computation.