
Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers (2212.11498v3)

Published 22 Dec 2022 in cs.LG, cs.AI, cs.MA, and cs.RO

Abstract: We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency) and different order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms, and higher overall pick rates than multiple established industry heuristics, across a diverse set of warehouse configurations and order-picking paradigms.
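The abstract's core idea is a two-level structure: a manager policy assigns goals (pick locations) to workers, worker policies act toward those goals, and both levels are trained against the shared global pick rate. The sketch below is a minimal toy illustration of that control loop only. The one-dimensional warehouse, the class names, and the random (untrained) policies are all assumptions made for illustration; none of this corresponds to the authors' implementation or environment.

```python
"""Toy sketch of a hierarchical manager/worker control loop for order picking.
All names and the environment are hypothetical, not the paper's code."""
import random


class ToyWarehouse:
    """Toy 1-D warehouse: workers step toward pick locations; a pick
    succeeds when a worker reaches an open pick location."""

    def __init__(self, n_locations=10, n_workers=3):
        self.n_locations = n_locations
        self.workers = [f"w{i}" for i in range(n_workers)]

    def reset(self):
        self.positions = {w: 0 for w in self.workers}
        self.open_picks = random.sample(range(self.n_locations), 5)
        return self._obs()

    def _obs(self):
        return {"open_picks": list(self.open_picks),
                "positions": dict(self.positions)}

    def step(self, actions):
        picks = 0
        for w, goal in actions.items():
            pos = self.positions[w]
            # Move one cell toward the assigned goal (bools act as 0/1).
            self.positions[w] = pos + (goal > pos) - (goal < pos)
            if self.positions[w] == goal and goal in self.open_picks:
                self.open_picks.remove(goal)  # item picked
                # Replenish with a new pick location elsewhere.
                self.open_picks.append(random.randrange(self.n_locations))
                picks += 1
        return self._obs(), picks


class ManagerPolicy:
    """High-level policy: assigns a goal (pick location) to each worker.
    A learned manager would map the global state to assignments; this
    placeholder samples uniformly from the open picks."""

    def assign_goals(self, obs, workers):
        return {w: random.choice(obs["open_picks"]) for w in workers}


class WorkerPolicy:
    """Low-level policy: acts toward the manager-assigned goal. In this
    toy the 'action' is simply the target location."""

    def act(self, obs, goal):
        return goal


def rollout(env, manager, workers, horizon=200):
    """One episode. Both levels observe the same global pick count, the
    signal that would be used to co-train manager and worker policies."""
    obs = env.reset()
    total = 0
    for _ in range(horizon):
        goals = manager.assign_goals(obs, env.workers)
        actions = {w: workers[w].act(obs, goals[w]) for w in env.workers}
        obs, picks = env.step(actions)
        total += picks
    return total


env = ToyWarehouse()
print("picks per episode:",
      rollout(env, ManagerPolicy(), {w: WorkerPolicy() for w in env.workers}))
```

In an actual hierarchical MARL setup, `ManagerPolicy.assign_goals` and `WorkerPolicy.act` would be neural policies updated from the shared reward (e.g. with an actor-critic method), which is what lets assignment and execution improve jointly rather than being hand-engineered per warehouse layout.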
