Deep Policy Dynamic Programming for Vehicle Routing Problems
The paper "Deep Policy Dynamic Programming for Vehicle Routing Problems" by Wouter Kool et al. presents an investigative approach that integrates the principles of dynamic programming (DP) with deep learning to address the computational challenges associated with large-scale vehicle routing problems (VRPs). This innovative framework, termed Deep Policy Dynamic Programming (DPDP), leverages the predictive capabilities of neural networks to restrict and prioritize the DP state space, thereby enhancing performance and scalability.
Background and Motivation
Routing problems, a subset of combinatorial optimization problems, have extensive practical implications in logistics and supply chain management. Traditional dynamic programming approaches guarantee optimal solutions but scale poorly as problem size grows. Conversely, end-to-end deep learning methods, though promising, often fall short of the performance and reliability of strong solvers such as the Lin-Kernighan-Helsgaun (LKH) algorithm. DPDP capitalizes on the complementary strengths of the two paradigms: restricting the state space with a learned policy sacrifices DP's optimality guarantee, but the search retains DP's systematic, duplicate-free exploration while gaining the heuristic guidance of deep learning.
Key Methodology
The DPDP framework uses a deep neural network to produce a policy that guides the DP search. The policy is trained on example solutions to predict which edges of a routing problem's graph are promising. The approach is validated on the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the TSP with time windows (TSPTW).
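To make the training setup concrete, here is a minimal sketch, not the authors' implementation: a toy edge-scoring network trained with binary cross-entropy against the edges used in example solutions. The `EdgeGNN` class, its dimensions, and the `train_step` interface are hypothetical placeholders (the paper itself uses a graph neural network over problem instances).

```python
import torch
import torch.nn as nn

class EdgeGNN(nn.Module):
    """Toy stand-in for an edge-scoring graph network (hypothetical)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.node_embed = nn.Linear(2, hidden)       # embed 2-D node coordinates
        self.edge_score = nn.Linear(2 * hidden, 1)   # score each node pair

    def forward(self, coords):                       # coords: (n, 2)
        h = torch.relu(self.node_embed(coords))      # (n, hidden)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.edge_score(pairs).squeeze(-1)    # (n, n) edge logits

model = EdgeGNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(coords, solution_edges):
    """One supervised step. solution_edges is an (n, n) 0/1 float matrix
    marking which edges appear in the example solution."""
    loss = loss_fn(model(coords), solution_edges)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```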
The neural network outputs a heatmap assigning each edge a predicted likelihood of appearing in a good solution, learned from the edges present in example solutions. This heatmap focuses the DP algorithm on the most promising regions of the search space, reducing computational overhead and improving scalability.
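The core mechanism can be illustrated with a simplified sketch of heatmap-guided DP for the TSP. This is a toy version under stated assumptions, not the paper's optimized implementation: states are (visited set, current node) pairs, edges below a heat threshold are pruned, duplicate states keep only the cheapest path, and a beam retains the states with the highest accumulated heat (the paper's actual scoring also adds a "potential" term, sketched later).

```python
import heapq

def heatmap_guided_dp(dist, heat, beam_size=1000, heat_threshold=1e-5):
    """Approximate TSP tour cost via beam-restricted DP (illustrative only).
    dist, heat: n x n matrices. States map (visited, current node) to the
    best (cost, accumulated heat, path) found so far."""
    n = len(dist)
    states = {(frozenset([0]), 0): (0.0, 0.0, [0])}  # start the tour at node 0
    for _ in range(n - 1):
        candidates = {}
        for (visited, cur), (cost, score, path) in states.items():
            for nxt in range(n):
                if nxt in visited or heat[cur][nxt] < heat_threshold:
                    continue  # prune edges the policy considers unpromising
                key = (visited | {nxt}, nxt)
                new = (cost + dist[cur][nxt], score + heat[cur][nxt], path + [nxt])
                # dominance: keep only the cheapest state per DP key
                if key not in candidates or new[0] < candidates[key][0]:
                    candidates[key] = new
        # beam: retain the B states with the highest accumulated heat
        states = dict(heapq.nlargest(beam_size, candidates.items(),
                                     key=lambda kv: kv[1][1]))
    # close the tour back to the start node
    return min(cost + dist[cur][0] for (_, cur), (cost, _, _) in states.items())
```

Note how the beam size trades solution quality against computation, which matches the paper's observation that larger beams close the gap to strong solvers at higher cost.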
Experimental Results
The authors present evidence of DPDP's efficacy through several experiments:
- TSP with 100 Nodes: DPDP demonstrates competitive performance against established solvers like Concorde and LKH, achieving near-optimal solutions faster than many neural approaches.
- CVRP with Varied Instances: The method outperforms most neural-network-based methods and, when scaled to larger beam sizes, approaches the performance of LKH, albeit at higher computational cost.
- TSPTW: The results indicate that DPDP handles hard constraints effectively, delivering solutions that outperform leading approaches in both quality and computational efficiency; a sketch of how such constraints can be enforced inside the DP follows this list.
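As referenced above, hard time-window constraints fit naturally into DP because infeasible state expansions can simply be discarded. The following hedged sketch shows one way this check might look; the function and its interface are illustrative, not taken from the paper:

```python
def extend_with_time_window(current_time, travel_time, window):
    """Return the service start time at the next node, or None if the
    expansion is infeasible. window = (earliest, latest) service start.
    Waiting for the window to open is allowed; arriving after it closes
    violates the hard constraint, so the state is pruned."""
    earliest, latest = window
    arrival = current_time + travel_time
    if arrival > latest:
        return None                 # hard-constraint violation: prune
    return max(arrival, earliest)   # wait if we arrive before the window opens
```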
Implications and Future Directions
The introduction of DPDP has several significant implications. Practically, it provides a scalable and adaptive approach to handling large-scale VRPs with complex constraints that traditional algorithms struggle with. Theoretically, it showcases the potential of integrating neural-network-based policies within classical optimization frameworks to overcome the limitations of each approach individually.
Future research could refine the heat + potential scoring function used in DPDP, for example through end-to-end learning strategies that tune it more accurately. Extending the framework to other complex combinatorial optimization problems, such as job shop scheduling, also holds potential. Another avenue is addressing scalability by exploring alternative neural architectures or dimensionality-reduction techniques that maintain predictive accuracy while decreasing computational demands.
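As a rough illustration of the kind of scoring involved, the sketch below combines the heat collected on a partial solution with an optimistic per-node potential over unvisited nodes. The paper's exact definition differs; the weight `w` and the per-node estimate here are placeholder assumptions:

```python
def heat_plus_potential(used_edges, unvisited, heat, w=1.0):
    """Score a partial DP solution (illustrative, not the paper's formula):
    heat already collected on the chosen edges, plus for every unvisited
    node an optimistic estimate of the best heat it can still attract."""
    collected = sum(heat[i][j] for i, j in used_edges)
    potential = sum(
        w * max(heat[i][j] for i in range(len(heat)) if i != j)
        for j in unvisited
    )
    return collected + potential
```

The potential term keeps the beam from greedily favoring partial solutions that exhaust the high-heat edges early while leaving hard-to-reach nodes for the end.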
In summary, this paper contributes significantly to the discussion on hybridizing deep learning with classical optimization paradigms and highlights a clear path for advancing the state of the art in solving combinatorially complex routing problems.