Deep Reinforcement Learning for Traveling Purchaser Problems (2404.02476v6)

Published 3 Apr 2024 in math.OC, cs.AI, and cs.LG

Abstract: The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Because routing and purchasing are coupled, existing works on TPPs commonly address route construction and purchase planning simultaneously, which leads to exact methods with high computational cost and to heuristics that are sophisticated in design yet limited in performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL) that addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach are a bipartite graph representation of TPPs that captures the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to construct the route sequentially. A significant advantage of our framework is that the route can be constructed efficiently by the policy network, and once the route is determined, the associated purchasing plan can be derived through linear programming; meanwhile, by leveraging DRL, we can train the policy network to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and it generalizes well across instances of varying sizes and distributions, even to much larger instances never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach significantly outperforms well-established TPP heuristics, reducing the optimality gap by 40%-90%, while also offering a runtime advantage, especially on large-sized instances.
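
The decoupling described in the abstract, where a route is fixed first and the purchasing plan is then recovered by linear programming, can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout (per-market price and supply matrices, a product demand vector), the helper name `purchase_plan`, and the use of SciPy's `linprog` are all assumptions made for illustration.

```python
# A minimal sketch (not the authors' code) of the purchase-planning step
# described in the abstract: once a route is fixed, the cheapest purchasing
# plan over the visited markets is a small linear program.
import numpy as np
from scipy.optimize import linprog

def purchase_plan(route, prices, supplies, demands):
    """LP for a fixed route (all data shapes here are illustrative).

    route    : indices of the markets visited, e.g. [0, 2]
    prices   : (M, K) unit price of product k at market m
    supplies : (M, K) quantity of product k available at market m
    demands  : (K,)  required quantity of each product
    Returns a (len(route), K) matrix of purchase quantities, or None.
    """
    P, Q = prices[route], supplies[route]   # restrict data to the route
    R, K = P.shape
    c = P.ravel()                           # objective: total purchase cost
    # One equality row per product k: purchases across the route sum to d_k.
    A_eq = np.zeros((K, R * K))
    for k in range(K):
        A_eq[k, k::K] = 1.0                 # x is raveled as x[r, k] -> r*K + k
    bounds = list(zip(np.zeros(R * K), Q.ravel()))  # 0 <= x <= availability
    res = linprog(c, A_eq=A_eq, b_eq=demands, bounds=bounds, method="highs")
    return res.x.reshape(R, K) if res.success else None

# Toy usage: 4 markets, 2 products; a (hypothetical) policy network has
# already chosen the route [0, 2].
prices = np.array([[3., 8.], [5., 4.], [2., 6.], [7., 1.]])
supplies = np.array([[5., 2.], [3., 4.], [4., 5.], [6., 6.]])
print(purchase_plan([0, 2], prices, supplies, demands=np.array([6., 4.])))
```

In the full method, the LP's optimal cost plus the route's travel cost would form the global objective that the DRL training signal (e.g., a REINFORCE-style policy gradient, reference 12 below) pushes the routing policy to minimize.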

References (12)
  1. Burstall RM (1966) A heuristic method for a job-scheduling problem. Journal of the Operational Research Society 17:291–304.
  2. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8):1735–1780.
  3. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning.
  4. Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. International Conference on Learning Representations.
  5. Munkres J (1957) Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics 5(1):32–38.
  6. Ong HL (1982) Approximate algorithms for the travelling purchaser problem. Operations Research Letters 1(5):201–205.
  7. Palomo-Martínez PJ, Salazar-Aguilar MA (2019) The bi-objective traveling purchaser problem with deliveries. European Journal of Operational Research 273(2):608–622.
  8. Pearn WL, Chien RC (1998) Improved solutions for the traveling purchaser problem. Computers & Operations Research 25(11):879–885.
  9. Ramesh T (1981) Travelling purchaser problem. Opsearch 18(2):78–91.
  10. Riera-Ledesma J (2012) TPPLIB. URL https://jriera.webs.ull.es/TPP.htm/.
  11. Riera-Ledesma J, Salazar-González JJ (2006) Solving the asymmetric traveling purchaser problem. Annals of Operations Research 144(1):83–97.
  12. Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8:229–256.
