Reinforcement Learning for Combinatorial Optimization: A Survey (2003.03600v3)

Published 7 Mar 2020 in cs.LG, math.CO, math.OC, and stat.ML

Abstract: Many traditional algorithms for solving combinatorial optimization problems involve using hand-crafted heuristics that sequentially construct a solution. Such heuristics are designed by domain experts and may often be suboptimal due to the hard nature of the problems. Reinforcement learning (RL) proposes a good alternative to automate the search of these heuristics by training an agent in a supervised or self-supervised manner. In this survey, we explore the recent advancements of applying RL frameworks to hard combinatorial problems. Our survey provides the necessary background for operations research and machine learning communities and showcases the works that are moving the field forward. We juxtapose recently proposed RL methods, laying out the timeline of the improvements for each problem, as well as we make a comparison with traditional algorithms, indicating that RL models can become a promising direction for solving combinatorial problems.

Authors (4)
  1. Nina Mazyavkina (1 paper)
  2. Sergey Sviridov (1 paper)
  3. Sergei Ivanov (44 papers)
  4. Evgeny Burnaev (189 papers)
Citations (515)

Summary

Reinforcement Learning for Combinatorial Optimization

The paper "Reinforcement Learning for Combinatorial Optimization: A Survey" by Mazyavkina et al. systematically reviews the application of reinforcement learning (RL) methodologies to combinatorial optimization (CO) problems. This survey endeavors to bridge gaps between operations research and machine learning communities by exploring the intersection where RL models are applied to hard combinatorial problems.

Context and Objectives

Combinatorial optimization problems involve finding an optimal object from a finite set of objects and are often NP-hard. Traditional approaches rely on heuristics curated by experts, which may not always yield optimal solutions due to the complex nature of these problems. The surveyed paper focuses on leveraging RL to automate the generation of such heuristics and on demonstrating how RL can potentially outperform traditional methods.
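
As a point of reference, a combinatorial optimization problem can be written generically as minimizing a cost function over a finite feasible set; the notation below is a standard convention rather than the paper's own, with TSP as the running example.

```latex
% Generic combinatorial optimization problem, with TSP as an example.
% The notation is illustrative and not taken verbatim from the survey.
\min_{x \in \mathcal{X}} \; c(x), \qquad |\mathcal{X}| < \infty,
\qquad \text{e.g. TSP:} \quad
\min_{\pi \in S_n} \; \sum_{i=1}^{n} d_{\pi(i),\,\pi(i+1)}, \quad \pi(n+1) := \pi(1).
```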

Key Highlights

The survey offers a comprehensive overview of diverse RL techniques applied to several canonical CO problems: the Traveling Salesman Problem (TSP), Maximum Cut (Max-Cut), Bin Packing Problem (BPP), Minimum Vertex Cover (MVC), and Maximum Independent Set (MIS). It categorizes RL approaches across different paradigms, such as policy-based methods, value-based methods, and model-based methods like Monte Carlo Tree Search (MCTS).
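
Most constructive approaches covered by the survey cast the problem as a Markov decision process. The sketch below uses illustrative notation (the paper's exact formalization may differ): a state is the instance together with a partial solution, an action appends a feasible element, and the per-step rewards telescope to the negative cost of the completed solution.

```latex
% Constructive CO as an MDP (illustrative notation).
\begin{aligned}
s_t &= (G,\ x_{1:t}), \qquad a_t \in \mathcal{A}(s_t) \subseteq V \setminus x_{1:t},\\
r(s_t, a_t) &= -\bigl(c(x_{1:t+1}) - c(x_{1:t})\bigr), \qquad x_{1:t+1} = x_{1:t} \cup \{a_t\},\\
\sum_{t=0}^{T-1} r(s_t, a_t) &= -\,c(x_{1:T})
\quad\Rightarrow\quad
\max_{\pi}\ \mathbb{E}_{\pi}\Bigl[\textstyle\sum_t r_t\Bigr]
\;\equiv\;
\min_{\pi}\ \mathbb{E}_{\pi}\bigl[c(x_{1:T})\bigr].
\end{aligned}
```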

Reinforcement Learning Approaches

The survey dives into critical RL algorithms employed in CO:

  1. Value-Based Methods: Algorithms such as Deep Q-Networks (DQN) that approximate the action-value function and are trained with Q-learning.
  2. Policy-Based Methods: Direct optimization of a parameterized policy, using techniques such as REINFORCE or actor-critic methods to maximize expected reward (a minimal sketch follows this list).
  3. Model-Based Methods: Approaches such as MCTS and AlphaZero, which use known or learned transition models for planning and decision-making.
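
To make the policy-based family concrete, here is a minimal REINFORCE sketch for constructive TSP. The learnable logit matrix over (current city, next city) pairs stands in for the neural encoders used in the surveyed works, and all names and hyperparameters are illustrative assumptions, not the survey's implementation.

```python
import numpy as np

# Minimal REINFORCE sketch for tour construction on a tiny random TSP
# instance. The "policy" is a learnable logit matrix theta[current, next];
# published methods replace it with a neural encoder over the instance.

rng = np.random.default_rng(0)
n = 8                                    # number of cities
coords = rng.random((n, 2))              # random city coordinates
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

theta = np.zeros((n, n))                 # policy logits

def rollout(theta):
    """Sample a tour city by city and accumulate log-prob gradients."""
    tour, visited = [0], {0}
    grad = np.zeros_like(theta)
    while len(tour) < n:
        cur = tour[-1]
        mask = np.array([c not in visited for c in range(n)])
        logits = np.where(mask, theta[cur], -np.inf)     # forbid visited cities
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        nxt = rng.choice(n, p=probs)
        grad[cur] -= probs                               # gradient of log-softmax
        grad[cur, nxt] += 1.0
        tour.append(int(nxt))
        visited.add(int(nxt))
    length = sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))
    return length, grad

baseline, lr = None, 0.1
for step in range(2000):
    length, grad = rollout(theta)
    baseline = length if baseline is None else 0.9 * baseline + 0.1 * length
    # REINFORCE update: make shorter-than-baseline tours more probable.
    theta += lr * (baseline - length) * grad
print("sampled tour length after training:", rollout(theta)[0])
```

Surveyed methods such as Kool et al. swap the logit table for an attention-based encoder-decoder and use stronger baselines (e.g. a greedy rollout), but the underlying gradient estimator is the same.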

The survey also examines the neural architectures used to encode problem states, ranging from recurrent neural networks and attention mechanisms to graph neural networks (GNNs), demonstrating the adaptability of RL to structured data.
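
As an illustration of graph-state encoding, below is a minimal message-passing sketch in the spirit of structure2vec-style GNN encoders; the layer sizes, ReLU update rule, and sum pooling are simplifying assumptions, not any specific paper's architecture.

```python
import numpy as np

# One simplified message-passing (GNN-style) encoder for a graph CO state.
rng = np.random.default_rng(1)
n, d = 6, 16                               # nodes, embedding dimension
adj = rng.integers(0, 2, size=(n, n))
adj = np.triu(adj, 1); adj = adj + adj.T   # random undirected graph
x = rng.random((n, 1))                     # per-node input feature (e.g. partial-solution flag)

W1 = rng.normal(scale=0.1, size=(1, d))    # feature projection
W2 = rng.normal(scale=0.1, size=(d, d))    # neighbor-aggregation projection

h = np.zeros((n, d))                       # node embeddings
for _ in range(3):                         # a few rounds of message passing
    h = np.maximum(0.0, x @ W1 + adj @ h @ W2)   # ReLU(own feature + aggregated neighbors)

graph_embedding = h.sum(axis=0)            # pooled representation of the whole state
print(graph_embedding.shape)               # (16,)
```

The pooled embedding, together with the per-node embeddings, is what a Q-network or policy head would consume when scoring which node to add to the solution next.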

Numerical Results and Insights

The survey compares several RL approaches to traditional heuristics using benchmarks such as average tour length for TSP and CVRP. Methods such as those proposed by Kool et al., built on transformer models, demonstrate performance competitive with or superior to classic approaches in both solution quality and runtime. Works like the Active Search method of Bello et al. show the potential for RL models to generalize across diverse instances and graph sizes.

Theoretical and Practical Implications

By demonstrating RL's capability to solve CO problems, the survey highlights substantial implications:

  • Practical: Enhancing industrial process optimization, routing, and scheduling via RL-based models can result in significant efficiency gains and cost reductions.
  • Theoretical: The potential crossover of RL with combinatorial optimization opens avenues for hybrid models that combine RL with existing exact methods, providing enhanced solvers for NP-hard problems.

Future Directions

The survey proposes several future research directions:

  1. Generalization Across Problems: Developing RL models capable of adapting to different CO problems, reducing the need for problem-specific adjustments.
  2. Improving Solution Quality and Computational Performance: Ensuring RL solutions remain high-quality and computationally efficient, especially on larger problem instances.
  3. Joint Exploration of Methods: Integrating RL with existing solver strategies, such as branch-and-bound, to refine solution spaces and heuristics dynamically.

Conclusion

This paper offers a vital contribution to the field of combinatorial optimization by illustrating the burgeoning role of reinforcement learning. It provides a structured perspective on how RL can transform heuristic development and presents a convincing case for further exploration and integration into mainstream optimization processes.