Overview of Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning
The paper introduces a method for solving the Traveling Salesman Problem (TSP) that uses deep reinforcement learning to learn a 2-opt improvement heuristic. The authors examine the limitations of traditional approaches that depend heavily on manually designed heuristics and propose a data-driven alternative that learns to improve candidate tours without hand-engineered rules.
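For readers unfamiliar with the operation being learned: a 2-opt move deletes two edges of a tour and reconnects the endpoints by reversing the segment between them. Below is a minimal Python sketch (not from the paper) of the move and of tour length under Euclidean distances:

```python
import math

def tour_length(tour, coords):
    # Total Euclidean length of the closed tour, where coords[k] = (x, y).
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def two_opt_move(tour, i, j):
    # Remove edges (tour[i], tour[i+1]) and (tour[j], tour[j+1 mod n]),
    # then reverse the intervening segment to reconnect the tour.
    assert 0 <= i < j < len(tour)
    return tour[: i + 1] + tour[i + 1 : j + 1][::-1] + tour[j + 1 :]
```

Classical 2-opt applies such moves greedily until no improving move remains; the paper instead learns which move to apply next.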
Background and Motivation
The TSP is NP-hard and has long served as a benchmark for combinatorial optimization methods. Exact solvers and hand-crafted heuristics (e.g., Lin-Kernighan and its LKH implementation) have been used extensively, yet designing such heuristics demands specialized knowledge, and their fixed, hand-tuned rules can generalize poorly to larger or more complex instances. Recent work applies machine learning to learn construction and improvement heuristics; however, learned construction methods often require additional search procedures, such as sampling or beam search, to reach competitive solution quality.
Proposed Method
The authors propose a framework that uses a policy-gradient deep reinforcement learning algorithm to learn a stochastic policy over 2-opt moves. The neural architecture selects moves through a pointing attention mechanism, addressing complexity limitations of earlier architectures. The improvement process is modeled as a Markov Decision Process in which the policy maps the current tour to the next 2-opt move along an optimization trajectory. The network combines graph convolution layers, which encode the graph topology, with recurrent units, which encode the sequential structure of the tour, yielding efficient embeddings of tour sequences.
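As a rough illustration of the training signal, the following REINFORCE-style sketch (PyTorch) treats the tour as the state and a sampled 2-opt move as the action. The policy_net interface and the decode_move helper are hypothetical, tour_length is the helper sketched above, and the paper's actual algorithm additionally uses refinements (such as a learned baseline) omitted here for brevity.

```python
import torch

def train_step(policy_net, optimizer, coords, tour, n_steps=8, gamma=0.99):
    # One policy-gradient update on a single instance. policy_net(coords, tour)
    # is assumed to return a torch.distributions.Categorical over 2-opt moves.
    log_probs, rewards = [], []
    best_cost = tour_length(tour, coords)
    for _ in range(n_steps):
        dist = policy_net(coords, tour)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        i, j = decode_move(action, len(tour))  # hypothetical: flat index -> (i, j)
        tour = tour[: i + 1] + tour[i + 1 : j + 1][::-1] + tour[j + 1 :]
        cost = tour_length(tour, coords)
        rewards.append(max(best_cost - cost, 0.0))  # reward: improvement over best so far
        best_cost = min(best_cost, cost)

    # Discounted returns and the REINFORCE objective (no baseline, for brevity).
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return best_cost
```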
Experimental Setup
The authors conduct experiments on TSP instances with 20, 50, and 100 nodes, showing that the trained model produces high-quality solutions starting from random initial tours and outperforms prior deep learning methods such as GAT and GAT-T in both solution quality and sample efficiency. The evaluation also includes classical and contemporary solvers, including OR-Tools and Concorde, to give a comprehensive picture of the method's efficacy.
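Inference in this setting amounts to sampling a trajectory of moves from the trained policy and keeping the best tour seen. A plausible loop, reusing tour_length and two_opt_move from above and assuming a hypothetical sample_move helper that wraps a forward pass of the policy:

```python
import random

def sample_search(policy_net, coords, budget=2000, seed=0):
    # Start from a random tour, apply policy-sampled 2-opt moves for a fixed
    # sampling budget, and return the best tour encountered.
    rng = random.Random(seed)
    tour = list(range(len(coords)))
    rng.shuffle(tour)
    best_tour, best_cost = tour, tour_length(tour, coords)
    for _ in range(budget):
        i, j = sample_move(policy_net, coords, tour)  # hypothetical helper
        tour = two_opt_move(tour, i, j)
        cost = tour_length(tour, coords)
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost
```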
Results and Discussion
The reported results show that the proposed method reaches optimality gaps as low as 0.00% for TSP20, 0.12% for TSP50, and 0.87% for TSP100 within 2,000 sampling steps. Existing learned methods require many more samples to achieve comparable or slightly worse solution quality. Because the policy is learned without domain-specific heuristic guidance, these results highlight the model's robustness and adaptability.
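For reference, the optimality gap quoted here is the relative excess of the obtained tour length L over the optimal length L* (as certified by an exact solver such as Concorde):

    gap = (L − L*) / L* × 100%

For instance, a tour of length 10.087 against an optimum of 10.000 gives a 0.87% gap (illustrative numbers only).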
Implications and Future Directions
The research advances combinatorial optimization by showing that deep reinforcement learning can automate the discovery and refinement of heuristic strategies, potentially enabling broader application beyond the TSP. Future directions include scaling the framework to larger instances, extending it to other NP-hard problems, and integrating it into hybrid systems that combine classical algorithms with learning-based approaches for more efficient and practical combinatorial problem solving.
Overall, this paper represents a significant step toward applying artificial intelligence where adaptability and scalability are crucial, in line with the broader trend of reducing reliance on human expertise in heuristic design.