Grasper: A Generalist Pursuer for Pursuit-Evasion Problems (2404.12626v1)
Abstract: Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent work has demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO for improving scalability when solving large-scale PEGs. However, these methods focus on specific PEGs with fixed initial conditions, which may vary substantially in real-world scenarios, significantly hindering their applicability. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold. First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising two key components: (i) a graph neural network (GNN) that encodes PEGs into hidden vectors, and (ii) a hypernetwork that generates pursuer policies from these hidden vectors. Second, we develop an efficient three-stage training method involving (i) a pre-pretraining stage that learns robust PEG representations via self-supervised graph learning techniques such as GraphMAE, (ii) a pre-training stage that uses heuristic-guided multi-task pre-training (HMP), where reference policies derived from heuristics (e.g., Dijkstra's algorithm) regularize pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies for designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showing that Grasper significantly outperforms baselines in both solution quality and generalizability. These results demonstrate that Grasper offers a versatile approach to pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world settings.
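The core architectural idea (a GNN embedding of the game conditioning a hypernetwork that emits the pursuer policy's weights) can be sketched as follows. This is a minimal illustrative toy in NumPy, not the paper's implementation: all layer sizes, the `HyperPolicy` class, and the single generated linear layer are assumptions for exposition; the paper's GNN encoder and policy network are far richer.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over action logits
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class HyperPolicy:
    """Toy hypernetwork: maps a PEG embedding (standing in for a GNN
    graph encoding) to the weights of a small policy head, which is
    then evaluated on the pursuer's observation."""

    def __init__(self, emb_dim, obs_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.obs_dim, self.n_actions = obs_dim, n_actions
        # hypernetwork parameters: one linear map per generated tensor
        self.Hw = rng.normal(0.0, 0.1, (emb_dim, obs_dim * n_actions))
        self.Hb = rng.normal(0.0, 0.1, (emb_dim, n_actions))

    def policy(self, embedding, obs):
        # generate the policy head's weights from the game embedding,
        # so a new PEG instance yields a new policy without retraining
        W = (embedding @ self.Hw).reshape(self.obs_dim, self.n_actions)
        b = embedding @ self.Hb
        return softmax(obs @ W + b)

hp = HyperPolicy(emb_dim=8, obs_dim=4, n_actions=5)
emb = np.full(8, 0.1)                    # stand-in for one PEG's GNN encoding
obs = np.array([1.0, 0.0, 0.5, -0.5])    # stand-in pursuer observation
probs = hp.policy(emb, obs)              # distribution over 5 actions
```

The design point the sketch captures is that the per-game policy parameters are an output of the hypernetwork rather than trainable variables, which is what lets one trained model produce policies for unseen PEG instances.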
- Human-timescale adaptation in an open-ended task space. arXiv preprint arXiv:2301.07608 (2023).
- Multi-robot adversarial patrolling: facing a full-knowledge opponent. Journal of Artificial Intelligence Research 42 (2011), 887–916.
- On discrete-time pursuit-evasion games with sensing limitations. IEEE Transactions on Robotics 24, 6 (2008), 1429–1439.
- Jan Buermann and Jie Zhang. 2022. Multi-robot adversarial patrolling strategies via lattice paths. Artificial Intelligence 311 (2022), 103769.
- Exploration by random network distillation. In ICLR.
- Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML. 160–167.
- Is Nash equilibrium approximator learnable? In AAMAS. 233–241.
- Are equivariant equilibrium approximators beneficial? arXiv preprint arXiv:2301.11481 (2023).
- Neural auto-curricula in two-player zero-sum games. In NeurIPS. 3504–3517.
- Counterfactual multi-agent policy gradients. In AAAI. 2974–2982.
- Ross Girshick. 2015. Fast R-CNN. In ICCV. 1440–1448.
- HyperNetworks. In ICLR.
- Masked autoencoders are scalable vision learners. In CVPR. 16000–16009.
- Karel Horák and Branislav Bošanský. 2017. Dynamic programming for one-sided partially observable pursuit-evasion games. In ICAART. 503–510.
- GraphMAE: self-supervised masked graph autoencoders. In KDD. 594–604.
- A survey of multi-robot regular and adversarial patrolling. IEEE/CAA Journal of Automatica Sinica 6, 4 (2019), 894–903.
- Linan Huang and Quanyan Zhu. 2021. A dynamic game framework for rational and persistent robot deception with an application to deceptive pursuit-evasion. IEEE Transactions on Automation Science and Engineering 19, 4 (2021), 2918–2932.
- A unified game-theoretic approach to multiagent reinforcement learning. In NeurIPS. 4190–4203.
- Population-size-aware policy optimization for mean-field games. In ICLR.
- Solving large-scale pursuit-evasion games using pre-trained strategies. In AAAI. 11586–11594.
- CFR-MIX: Solving imperfect information extensive-form games with combinatorial action space. In IJCAI. 3663–3669.
- A survey of decision making in adversarial games. arXiv preprint arXiv:2207.07971 (2022).
- Solutions for multiagent pursuit-evasion games on communication graphs: Finite-time capture and asymptotic behaviors. IEEE Transactions on Automatic Control 65, 5 (2019), 1911–1923.
- Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers. In NeurIPS. 5586–5600.
- GCC: Graph contrastive coding for graph neural network pre-training. In KDD. 1150–1160.
- Frederick P Rivara and Christopher D Mack. 2004. Motor vehicle crash deaths related to police pursuits in the United States. Injury Prevention 10, 2 (2004), 93–95.
- Sebastian Ruder. 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098 (2017).
- Student of Games: a unified learning algorithm for both perfect and imperfect information games. Science Advances 9, 46 (2023), eadg3256.
- Multi-robot adversarial patrolling: facing coordinated attacks. In AAMAS. 1093–1100.
- Milind Tambe. 2011. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.
- Large-scale representation learning on graphs via bootstrapping. In ICLR.
- Urban security: Game-theoretic resource allocation in networked domains. In AAAI. 881–886.
- Probabilistic pursuit-evasion games: Theory, implementation, and experimental evaluation. IEEE Transactions on Robotics and Automation 18, 5 (2002), 662–669.
- Sharing experience in multitask reinforcement learning. In IJCAI. 3642–3648.
- Cooperative control for multi-player pursuit-evasion games with reinforcement learning. Neurocomputing 412 (2020), 101–114.
- Multi-task reinforcement learning: A hierarchical Bayesian approach. In ICML. 1015–1022.
- NSGZero: efficiently learning non-exploitable policy in large-scale network security games with neural Monte Carlo tree search. In AAAI. 4646–4653.
- Solving large-scale extensive-form network security games via neural fictitious self-play. In IJCAI. 3713–3720.
- The surprising effectiveness of PPO in cooperative multi-agent games. In NeurIPS Datasets and Benchmarks Track. 24611–24624.
- A decentralized policy gradient approach to multi-task reinforcement learning. In UAI. 1002–1012.
- From canonical correlation analysis to self-supervised graph neural networks. In NeurIPS. 76–89.
- Optimal escape interdiction on transportation networks. In IJCAI. 3936–3944.
- Optimal interdiction of urban criminals with the aid of real-time information. In AAAI. 1262–1269.
- Yu Zhang and Qiang Yang. 2021. A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering 34, 12 (2021), 5586–5609.
- On the effectiveness of fine-tuning versus meta-reinforcement learning. In NeurIPS. 26519–26531.
- Graph contrastive learning with adaptive augmentation. In WWW. 2069–2080.
- Regret minimization in games with incomplete information. In NeurIPS. 1729–1736.