Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations (2305.13030v4)
Abstract: Modeling real-world biological multi-agent systems is a fundamental problem in various scientific and engineering fields. Reinforcement learning (RL) is a powerful framework for generating flexible and diverse behaviors in cyberspace; however, when modeling real-world biological multi-agents, there is a domain gap between behaviors in the source (i.e., real-world data) and the target (i.e., cyberspace for RL), and the source environment parameters are usually unknown. In this paper, we propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios. Our approach combines RL and supervised learning: to exploit information about the unknown source dynamics, it selects the demonstration actions used in RL according to the minimum dynamic time warping (DTW) distance. The approach can be easily applied to many existing neural network architectures and provides an RL model that balances reproducibility (as imitation) with the generalization ability needed to obtain rewards in cyberspace. In experiments on chase-and-escape and football tasks, in which the dynamics differ between the unknown source and the target environments, we show that our approach achieved a balance between reproducibility and generalization ability compared with the baselines. In particular, we used tracking data of professional football players as expert demonstrations in the football task and show successful performance despite a larger gap between behaviors in the source and target environments than in the chase-and-escape task.
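The selection step mentioned in the abstract, choosing demonstration actions by minimum dynamic time warping (DTW) distance, can be illustrated with a minimal sketch. The abstract does not specify what is compared, how trajectories are represented, or how the selected actions enter the loss, so everything below is an assumption: the function names `dtw_distance` and `select_demo_actions` are hypothetical, trajectories are taken to be lists of 2-D positions, and the local cost is Euclidean. It shows the textbook DTW recursion, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Demo = Tuple[List[Point], List[int]]  # (state trajectory, action sequence); assumed layout


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Textbook O(len(a) * len(b)) DTW with a Euclidean local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def select_demo_actions(agent_traj: Sequence[Point],
                        demos: Sequence[Demo]) -> Tuple[int, List[int], float]:
    """Return (index, actions, distance) of the demonstration whose state
    trajectory is closest to the agent's trajectory under DTW."""
    distances = [dtw_distance(agent_traj, states) for states, _ in demos]
    best = min(range(len(demos)), key=distances.__getitem__)
    return best, demos[best][1], distances[best]


# Toy data (made up for illustration): the second demonstration follows the
# same path as the agent but lingers at the start for one extra step.
agent_traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
demos = [
    ([(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)], [0, 0, 0]),
    ([(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [1, 1, 2, 2]),
]
index, actions, distance = select_demo_actions(agent_traj, demos)
print(index, actions, distance)
```

Because DTW aligns sequences that differ in timing, the second demonstration is selected here even though it has a different length from the agent's trajectory. In a combined RL and supervised setup, the returned actions could then serve as targets for a supervised loss term added to the RL objective; how that term is weighted is not stated in the abstract.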