A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization (2406.01661v2)

Published 3 Jun 2024 in cs.LG, cs.AI, cs.DM, and stat.ML

Abstract: Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. Our approach is conceptually based on a loss that upper bounds the reverse Kullback-Leibler divergence and evades the requirement of exact sample likelihoods. We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.

Summary

  • The paper presents DiffUCO, a novel diffusion model framework that enables unsupervised combinatorial optimization by bounding the reverse KL divergence.
  • It employs a forward diffusion process to add noise and a neural network-based reverse process to progressively denoise, effectively solving discrete optimization problems.
  • Empirical results on tasks like Maximum Independent Set, Minimum Dominating Set, MaxClique, and MaxCut demonstrate competitive performance against state-of-the-art methods.

A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization

Overview

The paper "A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization" by Sebastian Sanokowski, Sepp Hochreiter, and Sebastian Lehner, addresses the challenge of sampling from intractable distributions over discrete sets without leveraging training data. This problem is critical in fields such as Combinatorial Optimization (CO), prediction of molecular structures, lattice models in physics, and Monte Carlo integration. Traditional approaches in CO have predominantly utilized generative models dependent on exact sample likelihoods. This paper breaks free from that restriction, proposing a novel approach that employs diffusion models, which are highly expressive latent variable models.

Methodology

The authors introduce a diffusion-based framework named Diffusion for Unsupervised Combinatorial Optimization (DiffUCO). At its core is a loss function that upper bounds the reverse Kullback-Leibler (KL) divergence, which circumvents the need for exact sample likelihoods. The forward diffusion process gradually corrupts solution candidates toward a simple stationary distribution by adding noise at each step, while the reverse process, modeled by a neural network conditioned on the problem instance, removes this noise step by step to generate solutions.
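As a concrete illustration, a bit-flip (Bernoulli) noise kernel is a common choice for discrete diffusion over binary solution vectors. The sketch below is a minimal version under that assumption; the paper's actual noise distribution, schedule, and architecture may differ, and `flip_probs` and the `model` signature are hypothetical placeholders.

```python
import torch

# Minimal sketch of a bit-flip (Bernoulli) discrete diffusion over binary
# solution vectors. `flip_probs` and the `model` signature are hypothetical
# placeholders, not the paper's exact choices.

T = 10                                      # number of diffusion steps
flip_probs = torch.linspace(0.05, 0.5, T)   # assumed per-step flip schedule

def forward_noise_step(x_t: torch.Tensor, t: int) -> torch.Tensor:
    """Forward kernel r(x_{t+1} | x_t): independently flip each bit.

    Repeated application drives any solution toward the uniform
    (stationary) distribution over {0, 1}^n."""
    flips = torch.bernoulli(torch.full_like(x_t, flip_probs[t].item()))
    return (x_t + flips) % 2

def reverse_denoise_step(model, x_t: torch.Tensor, t: int, instance):
    """Reverse kernel q_theta(x_{t-1} | x_t): a network conditioned on the
    problem instance (e.g. a GNN over the graph) outputs per-variable
    Bernoulli probabilities, from which the less-noisy state is sampled."""
    probs = model(x_t, t, instance)         # values in (0, 1), one per node
    return torch.bernoulli(probs), probs
```

Running the forward process for enough steps washes out the initial solution entirely, so the reverse network must learn to reintroduce problem-specific structure at every denoising step.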

DiffUCO is trained by minimizing the KL divergence between the joint distributions of the reverse and forward processes, which, by the chain rule of the KL divergence, upper-bounds the reverse KL divergence between the model's marginal distribution and the target distribution. The resulting loss combines the energy function of the CO problem, a coupling term between the forward and reverse diffusion processes, and an entropy regularizer.
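In schematic form, the bound follows from applying the chain rule of the KL divergence to the full diffusion trajectories. The notation below is reconstructed from standard discrete-diffusion variational bounds and may differ in details from the paper: the target is $p(x_0) \propto \exp(-E(x_0)/\mathcal{T})$ with temperature $\mathcal{T}$, $r$ is the fixed forward noising kernel, and $q_\theta$ is the learned reverse process, with joints $p(x_{0:T}) = p(x_0)\prod_{t=1}^{T} r(x_t \mid x_{t-1})$ and $q_\theta(x_{0:T}) = q(x_T)\prod_{t=1}^{T} q_\theta(x_{t-1} \mid x_t)$:

$$
D_{\mathrm{KL}}\big(q_\theta(X_0)\,\|\,p(X_0)\big)
\;\le\;
D_{\mathrm{KL}}\big(q_\theta(X_{0:T})\,\|\,p(X_{0:T})\big)
= \mathbb{E}_{q_\theta}\!\left[\frac{E(X_0)}{\mathcal{T}}\right]
- \mathcal{H}\big(q_\theta(X_{0:T})\big)
- \mathbb{E}_{q_\theta}\!\left[\sum_{t=1}^{T}\log r(X_t \mid X_{t-1})\right]
+ \log Z.
$$

The three $\theta$-dependent terms are exactly the energy term, the entropy regularizer, and the forward-reverse coupling described above, while the normalization constant $\log Z$ does not depend on $\theta$ and can be dropped during training.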

Numerical Results

Experimental validation shows compelling results against state-of-the-art methods on various benchmark problems:

  • Maximum Independent Set (MIS) on RB datasets:
    • DiffUCO with Conditional Expectation (CE) decoding (sketched after this list) yields average independent set sizes of 19.24 ± 0.05 on RB-small and 38.87 ± 0.13 on RB-large, outperforming previous unsupervised learning methods.
  • Minimum Dominating Set (MDS) on BA datasets:
    • DiffUCO achieves set sizes of 28.20 ± 0.09 on BA-small and 106.61 ± 0.31 on BA-large, showcasing superior performance.
  • Maximum Clique (MaxCl) and Maximum Cut (MaxCut):
    • On RB-small for MaxCl, DiffUCO CE-ST achieves a set size of 16.30 ± 0.08, proving competitive against other methods.
    • For MaxCut on large BA graphs, DiffUCO with CE even surpasses the performance of Gurobi within a stringent time limit, achieving cut sizes of 2947.53 ± 1.49.
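For readers unfamiliar with the decoding step referenced above, conditional-expectation (CE) decoding derandomizes a vector of Bernoulli marginals into a binary solution by fixing one variable at a time. The sketch below is a generic version in the spirit of the scheme popularized by Karalias and Loukas; the MIS energy, penalty weight `lam`, and the variable-fixing order are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

# Generic sketch of conditional-expectation (CE) decoding. The MIS energy,
# penalty weight `lam`, and fixing order are illustrative assumptions.

def mis_energy(x: torch.Tensor, edges: torch.Tensor, lam: float = 1.01):
    """Maximum Independent Set energy: reward selected nodes, penalize
    selected edges. Multilinear in x, so evaluating it at Bernoulli
    probabilities gives the expected energy under independent sampling."""
    i, j = edges  # edges: (2, n_edges) LongTensor
    return -x.sum() + lam * (x[i] * x[j]).sum()

def ce_decode(probs: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    """Fix one variable at a time to whichever binary value minimizes the
    expected energy, conditioned on all previously fixed variables."""
    x = probs.clone()
    for k in range(x.numel()):
        x0, x1 = x.clone(), x.clone()
        x0[k], x1[k] = 0.0, 1.0
        x[k] = 1.0 if mis_energy(x1, edges) <= mis_energy(x0, edges) else 0.0
    return x  # binary vector with energy <= expected energy of `probs`
```

Because the energy is multilinear in the Bernoulli probabilities, each greedy choice can only keep the expected energy the same or lower, so the final binary solution is guaranteed to be at least as good as the expected soft energy of the initial marginals.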

Practical and Theoretical Implications

The approach presents several noteworthy implications:

  1. Expressivity and Flexibility: By leveraging diffusion models, DiffUCO attains an expressivity that exact-likelihood models such as mean-field models cannot match, while sidestepping the costly sequential sampling of autoregressive models.
  2. Scalability: The method demonstrates scalability in both problem size and difficulty, maintaining competitive performance even on hard combinatorial problems like those generated by the RB model.
  3. Unsupervised Learning Paradigm: DiffUCO proves the viability of unsupervised learning paradigms in CO, reducing the dependency on labeled data which is often expensive or impractical to obtain.
  4. Generalization Capability: The approach generalizes well across different types of CO problems, indicating broad applicability in various scientific and industrial fields.

Future Directions

The paper highlights promising directions for further research:

  1. Efficient Training and Inference: While the results are robust, exploring lighter-weight diffusion processes or hybrid schemes could reduce computational costs.
  2. Broader Applicability: Extending the framework to continuous optimization problems and integrating with hybrid classical-quantum solvers could widen its scope.
  3. Enhanced Variational Methods: Further refining variational bounds and exploring alternative divergence measures within the diffusion model framework might yield even better performance.

Conclusion

The introduced diffusion model framework for unsupervised neural combinatorial optimization marks a significant advancement in the CO landscape. DiffUCO not only enhances solution quality and computational efficiency but also broadens the applicability of unsupervised learning methods in optimization problems. The results, grounded in comprehensive empirical validation, demonstrate its potential as a robust tool for tackling complex combinatorial challenges without the crutch of exact sample likelihoods. This framework is poised to inspire further innovations in the field of neural probabilistic optimization and beyond.