Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods (2404.16721v1)
Abstract: This paper presents a novel learning approach for the Dubins Traveling Salesman Problem (DTSP) with Neighborhoods (DTSPN) that quickly produces a tour for a non-holonomic vehicle passing through neighborhoods of given task points. The method involves two learning phases: first, a model-free reinforcement learning phase leverages privileged information to distill knowledge from expert trajectories generated by the Lin–Kernighan heuristic (LKH) algorithm; second, a supervised learning phase trains an adaptation network to solve problems without access to the privileged information. Before the first phase, a parameter initialization technique based on the demonstration data is also devised to improve training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL-with-demonstration schemes, most of which fail to sense all the task points.
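The two-phase scheme described in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden outline, not the authors' implementation: `TeacherPolicy`, `AdaptationNet`, `distill_step`, the network sizes, and the distillation losses are all hypothetical. The teacher consumes the observation plus privileged information (e.g., features derived from the LKH expert tour), while the adaptation network is trained afterward, supervised, to reproduce the teacher's behavior from the observation alone.

```python
# Minimal sketch of the two-phase privileged-information scheme (hypothetical
# names, dimensions, and losses; not the paper's actual code).
import torch
import torch.nn as nn

class TeacherPolicy(nn.Module):
    """Phase 1: policy that sees the observation plus privileged information
    (e.g., features derived from the LKH expert tour)."""
    def __init__(self, obs_dim, priv_dim, act_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim + priv_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, priv):
        z = self.encoder(torch.cat([obs, priv], dim=-1))  # privileged latent
        return self.head(z), z

class AdaptationNet(nn.Module):
    """Phase 2: student that must act from the raw observation only."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        z = self.encoder(obs)
        return self.head(z), z

# Phase 1 (not shown): train TeacherPolicy with model-free RL, optionally
# warm-starting its parameters from the LKH demonstration data.
# Phase 2: supervised distillation of the frozen teacher into the student.
def distill_step(teacher, student, optimizer, obs, priv):
    with torch.no_grad():
        t_act, t_lat = teacher(obs, priv)
    s_act, s_lat = student(obs)
    loss = nn.functional.mse_loss(s_act, t_act) + nn.functional.mse_loss(s_lat, t_lat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Purely illustrative usage with a dummy batch:
teacher = TeacherPolicy(obs_dim=32, priv_dim=16, act_dim=2)
student = AdaptationNet(obs_dim=32, act_dim=2)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss = distill_step(teacher, student, opt, torch.randn(8, 32), torch.randn(8, 16))
```

Matching both the action and the intermediate latent follows the common "learning by cheating" recipe cited below; the paper's actual RL algorithm, architecture, and loss terms may differ.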
- Hindsight experience replay. Advances in Neural Information Processing Systems, 30, 2017.
- Multi-task learning for continuous control. arXiv preprint arXiv:1802.01034, 2018.
- Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
- X. Bresson and T. Laurent. The transformer network for the traveling salesman problem. arXiv preprint arXiv:2103.03012, 2021.
- Learning by cheating. In Conference on Robot Learning, pages 66–75. PMLR, 2020.
- Heterogeneous, multiple depot multi-UAV path planning for remote sensing tasks. In 2018 AIAA Information Systems-AIAA Infotech@Aerospace, page 0894. 2018.
- Mix & match agent curricula for reinforcement learning. In International Conference on Machine Learning, pages 1087–1095. PMLR, 2018.
- Distilling policy distillation. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1331–1340. PMLR, 2019.
- Compact trilinear interaction for visual question answering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 392–401, 2019.
- Deep drone acrobatics. Robotics: Science and Systems (RSS), 2020.
- Modality distillation with multiple stream networks for action recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pages 103–118, 2018.
- K. Helsgaun. An effective implementation of the Lin–Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106–130, 2000.
- Deep Q-learning from demonstrations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- J. Ho and S. Ermon. Generative adversarial imitation learning. Advances in Neural Information Processing Systems, 29, 2016.
- Towards multi-modal perception-based navigation: A deep reinforcement learning method. IEEE Robotics and Automation Letters, 6(3):4986–4993, 2021.
- Dubins traveling salesman problem with neighborhoods: A graph-based approach. Algorithms, 6(1):84–99, 2013.
- Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475, 2018.
- POMO: Policy optimization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems, 33:21188–21198, 2020.
- Deep learning under privileged information using heteroscedastic dropout. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8886–8895, 2018.
- Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47):eabc5986, 2020.
- Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- Structured knowledge distillation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2604–2613, 2019.
- Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643, 2015.
- Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- DiPCAN: Distilling privileged information for crowd-aware navigation. Robotics: Science and Systems (RSS) XVIII, 2022.
- Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 6292–6299. IEEE, 2018.
- Sampling-based path planning for a visual reconnaissance unmanned air vehicle. Journal of Guidance, Control, and Dynamics, 35(2):619–631, 2012.
- DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (TOG), 37(4):1–14, 2018.
- J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682–697, 2008.
- Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087, 2017.
- S. Ross and D. Bagnell. Efficient reductions for imitation learning. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 661–668. JMLR Workshop and Conference Proceedings, 2010.
- Policy distillation. arXiv preprint arXiv:1511.06295, 2015.
- Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
- S. Schaal. Learning from demonstration. Advances in Neural Information Processing Systems, 9, 1996.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Learning to navigate sidewalks in outdoor environments. IEEE Robotics and Automation Letters, 7(2):3906–3913, 2022.
- K. Sundar and S. Rathinam. Algorithms for heterogeneous, multiple depot, multiple unmanned vehicle path planning problems. Journal of Intelligent & Robotic Systems, 88:513–526, 2017.
- Distral: Robust multitask reinforcement learning. Advances in Neural Information Processing Systems, 30, 2017.
- Visual navigation among humans with optimal control as a supervisor. IEEE Robotics and Automation Letters, 6(2):2288–2295, 2021.
- Jump-start reinforcement learning. In International Conference on Machine Learning, pages 34556–34583. PMLR, 2023.
- V. Vapnik and A. Vashist. A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6):544–557, 2009.
- Learning using privileged information: similarity control and knowledge transfer. Journal of Machine Learning Research, 16(1):2023–2049, 2015.
- Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817, 2017.
- Pointer networks. Advances in Neural Information Processing Systems, 28, 2015.
- KDGAN: Knowledge distillation with generative adversarial networks. Advances in Neural Information Processing Systems, 31, 2018.
- Adversarial distillation for learning with privileged provisions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(3):786–797, 2019.