
Learning Improvement Heuristics for Solving Routing Problems (1912.05784v2)

Published 12 Dec 2019 in cs.AI and cs.LG

Abstract: Recent studies in using deep learning to solve routing problems focus on construction heuristics, the solutions of which are still far from optimality. Improvement heuristics have great potential to narrow this gap by iteratively refining a solution. However, classic improvement heuristics are all guided by hand-crafted rules which may limit their performance. In this paper, we propose a deep reinforcement learning framework to learn the improvement heuristics for routing problems. We design a self-attention based deep architecture as the policy network to guide the selection of next solution. We apply our method to two important routing problems, i.e. travelling salesman problem (TSP) and capacitated vehicle routing problem (CVRP). Experiments show that our method outperforms state-of-the-art deep learning based approaches. The learned policies are more effective than the traditional hand-crafted ones, and can be further enhanced by simple diversifying strategies. Moreover, the policies generalize well to different problem sizes, initial solutions and even real-world dataset.

Reinforcement Learning for Improved Heuristics in Routing Problems

The paper "Learning Improvement Heuristics for Solving Routing Problems" by Wu et al. addresses the challenge of optimizing combinatorial routing problems like the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). Classic approaches, such as exact and heuristic methods, often fall short due to computational limitations and reliance on domain expertise. This research leverages deep reinforcement learning (RL) to learn improvement heuristics, offering a promising alternative that requires minimal human intervention.

Summary of Contributions

The authors introduce a novel RL framework designed to learn improvement heuristics for routing problems. This is a departure from traditional hand-crafted heuristics, which are reliant on expert intuition and experience. The key contributions as outlined in the paper are:

  1. Self-Attention Policy Network: The method uses a self-attention based neural architecture as the policy network, which guides the selection of the next solution and thereby improves the quality of solutions found by the heuristic (a rough illustration of this idea follows the list).
  2. Generalized Framework: The proposed RL formulation generalizes well across instances of varying sizes, diverse initial solutions, and even real-world datasets. This is particularly significant when there is limited domain knowledge available for specific routing problems.
  3. Empirical Superiority: In experiments on TSP and CVRP, the authors show that their method surpasses existing deep learning based approaches on standard solution quality metrics, achieving notably better objective values on both problems.
  4. Scalability and Flexibility: The results indicate that policies trained with a small step limit remain effective when tested with much larger step limits, and that solution quality continues to improve as more computation steps are allowed.
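For intuition on the first contribution, here is a rough PyTorch-style sketch of how self-attention over node embeddings can yield a probability distribution over node pairs for the next move. The class name, layer sizes, number of heads, and sampling details are illustrative assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PairSelectionPolicy(nn.Module):
    """Illustrative policy: embed nodes, apply self-attention, score all node
    pairs, and sample one pair for the next pairwise move (e.g. 2-opt)."""

    def __init__(self, node_dim=2, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(node_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, coords):
        # coords: (batch, n, node_dim) node features, e.g. 2-D customer coordinates
        h = self.embed(coords)
        h, _ = self.attn(h, h, h)                         # self-attention over nodes
        q, k = self.q_proj(h), self.k_proj(h)
        scores = torch.bmm(q, k.transpose(1, 2))          # (batch, n, n) pair compatibilities
        n = coords.size(1)
        mask = torch.eye(n, dtype=torch.bool, device=coords.device)
        scores = scores.masked_fill(mask, float("-inf"))  # forbid selecting i == j
        probs = torch.softmax(scores.flatten(1), dim=-1)  # distribution over all pairs
        idx = torch.multinomial(probs, 1).squeeze(-1)     # sample one flattened index
        return idx // n, idx % n                          # node pair (i, j) per instance
```

In the authors' framework, such a sampled pair parameterizes a pairwise local operator applied to the current solution, and the network is trained end to end with reinforcement learning.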

Numerical Results and Empirical Claims

The paper provides evidence that the proposed RL framework not only narrows the gap to highly optimized traditional solvers but also significantly outperforms established deep learning models such as the pointer network. This claim is supported by empirical results on standard synthetic datasets as well as real-world instances from TSPLIB and CVRPLIB. For example, the learned policies notably reduce computation time and improve solution quality, in certain instances yielding solutions up to 20% better than those of deep learning baselines.

Implications and Future Work

The implications of this research extend beyond theoretical contributions to practical applications in routing and logistics where rapid, efficient, and high-quality solutions are critical. The approach champions a design paradigm shift from traditional heuristics to learned heuristics, minimizing manual intervention and domain knowledge requirements.

The paper opens avenues for several future explorations:

  • Extension to Other Combinatorial Problems: The framework could potentially apply to other combinatorial optimization problems like scheduling, expanding its applicability beyond routing.
  • Learning with Multiple Operators: Future work could involve integrating multiple operators within the RL framework, possibly utilizing hierarchical RL to determine both operator and consequent actions in improvement heuristics.
  • Advanced Search Schemes: While this work employs basic search schemes, it provides a foundation for adapting advanced schemes like simulated annealing or tabu search, potentially increasing solution quality further; a rough sketch of such a combination follows this list.
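As an illustration of the last point, a learned pair-selection policy could be combined with a simulated-annealing acceptance rule instead of always applying the proposed move. The sketch below reuses `tour_length`, `two_opt`, and the policy interface from the earlier loop; the cooling schedule and parameter values are assumptions and were not evaluated in the paper.

```python
import math
import random

def simulated_annealing(tour, dist, policy, steps=5000, t0=1.0, alpha=0.999):
    """Hypothetical wrapper: accept a policy-proposed move if it improves the tour,
    or otherwise with a Boltzmann probability that decays as the temperature cools."""
    cur_len = tour_length(tour, dist)
    best, best_len = tour, cur_len
    temp = t0
    for _ in range(steps):
        i, j = policy(tour, dist)            # learned pair-selection policy
        cand = two_opt(tour, i, j)
        cand_len = tour_length(cand, dist)
        if cand_len < cur_len or random.random() < math.exp((cur_len - cand_len) / temp):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour, cur_len
        temp *= alpha                        # geometric cooling
    return best, best_len
```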

In summary, this research demonstrates the effectiveness of reinforcement learning for learning improvement heuristics for NP-hard routing problems. It offers an innovative alternative to traditional hand-crafted heuristics and sets the stage for broader applications and extensions in combinatorial optimization.

Authors (5)
  1. Yaoxin Wu (26 papers)
  2. Wen Song (24 papers)
  3. Zhiguang Cao (48 papers)
  4. Jie Zhang (846 papers)
  5. Andrew Lim (26 papers)
Citations (248)