Real-Time Pursuit Strategies (R2PS)

Updated 28 November 2025
  • R2PS are algorithmic frameworks that integrate control laws, dynamic planning, and learning-based policies to enable pursuers to capture or intercept evaders in real time.
  • They employ methodologies such as MDPs, geometric control, and potential fields to tackle challenges like partial observability and operational constraints.
  • Real-time execution is achieved through explicit control laws, distributed planning, and hybrid RL methods, enhancing robustness and scalability in dynamic environments.

Real-time pursuit strategies (R2PS) are a family of algorithmic frameworks, control laws, and learning-based policies that enable one or more robotic or software agents (pursuers) to capture, encircle, or intercept evading agents (evaders) under operational constraints such as imperfect sensing, actuation limits, dynamic environments, and partial information, with all planning and execution steps computable in real time. R2PS research spans continuous and discrete spaces, perfect and partial information, and single- and multi-agent teams, and draws on dynamic programming, learning, game theory, geometric control, and potential fields. The unifying principle of R2PS is the explicit design of control and decision-making procedures that are provably or empirically feasible under computational and real-world constraints and robust to evader strategies, sensor limitations, and disturbances.

1. Problem Models and Foundations

R2PS research adopts diverse formal models depending on agent dynamics, sensor topology, and information structure. Canonical frameworks include:

  • Markov Decision Processes (MDPs) / Partially Observable Stochastic Games (POSGs): Used for continuous and discrete pursuit–evasion games, with explicit models of joint agent states, controlled transitions, and possibly adversarial rewards. Real-time feasibility is achieved via one-step lookahead over a local horizon or efficient function approximation (Bertram et al., 2019, Gonultas et al., 8 May 2024).
  • Zero-Sum Markov Games on Graphs: In environments like road networks or urban street graphs, the evader and pursuers move on discrete vertices, with capture defined as adjacency. Partial observability yields a pursuer POMDP; worst-case robust real-time strategies must compute over the belief space or its tractable proxies (Lu et al., 21 Nov 2025, Krishnamoorthy et al., 2014).
  • Geometric Control Laws: For nonholonomic or Dubins-like agents, geometric or sliding-mode control is used. The engagement is parameterized by ranges, line-of-sight angles, lead angles, and prescribed geometric error variables; the control laws are algebraic and explicit (Kumar et al., 9 Feb 2024).
  • Potential Games and Cooperative Optimization: For multi-pursuer scenarios, R2PS adopts potential game formulations, with the global capture utility as a potential function and decentralized learning protocols such as Spatial Adaptive Play (SAP) for real-time, distributed assignment of active pursuers (Lee et al., 2020).
  • Gradient-Based Area Minimization: For spatial containment and guaranteed capture, the evader's safe-reachable set is defined geometrically as the intersection of Apollonius circles; the pursuer controls are closed-form gradients of the area functional with respect to each pursuer's own position (Mammadov et al., 19 Nov 2025).
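
As a concrete illustration of the Apollonius-circle construction above, the following is a minimal sketch under the standard slower-evader assumption (speed ratio below one); it is not the cited paper's implementation, and the function name is illustrative.

```python
import numpy as np

def apollonius_circle(pursuer, evader, mu):
    """Safe-reachable disk of the evader w.r.t. one pursuer, for speed ratio mu = v_e / v_p < 1.

    Returns (center, radius): the set of points the evader reaches no later than the pursuer.
    """
    pursuer, evader = np.asarray(pursuer, float), np.asarray(evader, float)
    assert 0.0 < mu < 1.0, "containment requires a strictly slower evader"
    center = (evader - mu**2 * pursuer) / (1.0 - mu**2)
    radius = mu * np.linalg.norm(pursuer - evader) / (1.0 - mu**2)
    return center, radius

# The evader's overall safe-reachable set is the intersection of these disks over all pursuers;
# the area-minimization strategy steers each pursuer along the negative gradient of that area.
```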

2. Policy Synthesis and Algorithmic Structures

R2PS designs deploy a spectrum of synthesis techniques, ensuring real-time feasibility:

  • Dynamic Programming (DP) Recursion: In discrete or graph-based games, the DP recursion is

$$D(s_p, s_e) = 1 + \min_{n_p} \max_{n_e} D(n_p, n_e),$$

where $D$ is the worst-case capture time, $n_p$ ranges over pursuer moves, and $n_e$ over evader (possibly asynchronous, information-rich) responses. Under partial observability, the DP policy is computed over belief sets or distributions, with online updates at $O(|V|)$ per step (Lu et al., 21 Nov 2025, Krishnamoorthy et al., 2014).
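
A minimal sketch of this recursion, assuming each agent may stay or move to an adjacent vertex and capture occurs at co-location or adjacency (simplifications relative to the cited settings), is:

```python
# Illustrative only: worst-case capture-time DP on a graph pursuit-evasion game,
# solved by fixed-point sweeps from the capture states. Graphs are given as
# adjacency dicts, e.g. {0: [1], 1: [0, 2], 2: [1]}.
import math

def capture_times(adj):
    """Return D[(s_p, s_e)] = worst-case steps to capture (math.inf if capture cannot be forced)."""
    nodes = list(adj)
    moves = {v: [v] + list(adj[v]) for v in nodes}          # each agent may stay or move to a neighbor
    captured = lambda sp, se: sp == se or se in adj[sp]     # capture = co-location or adjacency

    D = {(sp, se): (0.0 if captured(sp, se) else math.inf)
         for sp in nodes for se in nodes}

    changed = True
    while changed:                                          # monotone sweeps converge to the game value
        changed = False
        for sp in nodes:
            for se in nodes:
                if captured(sp, se):
                    continue
                # pursuer minimizes over its moves; the evader then maximizes over its responses
                best = min(
                    0.0 if captured(np_, se)
                    else max(D[(np_, ne)] for ne in moves[se])
                    for np_ in moves[sp]
                )
                if 1.0 + best < D[(sp, se)]:
                    D[(sp, se)] = 1.0 + best
                    changed = True
    return D
```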

  • Belief and Set Propagation: Under partial or incomplete information, pursuers propagate a set $\mathrm{Pos}_t$ of possible evader locations or a belief $b_t(s_e)$, updated via Bayes' rule or set-theoretic operations at each time step and used to guide DP or RL policies (Lu et al., 21 Nov 2025, Gonultas et al., 8 May 2024); a minimal sketch of the set update follows this list.
  • Geometric/Switching Control Laws: In cooperative defense, sliding-mode surfaces and prescribed-time convergence laws are applied to engagement angle errors, with explicit algebraic allocation of net joint control to minimize effort (Kumar et al., 9 Feb 2024).
  • Potential and Artificial Field Design: Artificial potential fields (APF) encode obstacle and inter-agent repulsion and evader attraction, hybridized with deep RL to enhance data efficiency and generalization (Zhang et al., 2022).
  • Reinforcement Learning (RL): Deep RL policies for both single and team pursuit are trained via actor-critic, TD3, or MADDPG, incorporating curriculum learning, centralized training with decentralized execution, and neighbor sorting for scalability and permutation invariance (Jr et al., 2020, Zhang et al., 2022, Gonultas et al., 8 May 2024). Hybrid neuro-symbolic approaches introduce online policy selection based on opponent classification (Kalanther et al., 8 Nov 2025).
  • Gradient-Based Area Minimization: Pursuers compute $\nabla_{p_i} A_e(t)$ for the area $A_e(t)$ of the intersection of safe sets, and steer along $-\nabla_{p_i} A_e(t)$ for provably optimal containment (Mammadov et al., 19 Nov 2025).
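
As an illustration of the set-propagation idea referenced above (not the cited papers' exact update rule), the following sketch assumes the evader may stay or move to an adjacent vertex each step and that pursuers can verify some vertices as evader-free:

```python
def propagate_evader_set(pos_t, adj, observed_clear):
    """One-step update of the set Pos_t of vertices the evader could occupy.

    pos_t: current set of candidate evader vertices
    adj: adjacency dict of the graph, e.g. {0: [1], 1: [0, 2], 2: [1]}
    observed_clear: vertices the pursuers have just observed to be evader-free
    """
    reachable = set(pos_t)                 # the evader may stay in place...
    for v in pos_t:
        reachable.update(adj[v])           # ...or move to any neighboring vertex
    return reachable - observed_clear      # prune vertices ruled out by observations
```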

3. Real-Time Execution and Scalability

Ensuring real-time feasibility is central to R2PS:

  • Explicit, Closed-Form Control Laws: Many geometric or sliding-mode R2PS prescribe closed-form algebraic controls from current sensor measurements, avoiding any online optimization or iterative methods. Computational complexity is $O(1)$ per agent per step, suitable for 10–100 Hz embedded hardware (Kumar et al., 9 Feb 2024).
  • Localized or Distributed Planning: Decentralized MDP, RL, or potential-field policies operate on local observations and subsets of neighbor states. This yields linear or sub-linear scaling in the number of agents, with per-agent inference or update times $\ll$ the control loop period (Jr et al., 2020, Bertram et al., 2019).
  • Set/Belief Propagation: The set update for partial observability (e.g., $\mathrm{Pos}_t$ for graph-based pursuit) is $O(|V|)$, enabling deployment on graphs with $|V| \sim 1000$ (Lu et al., 21 Nov 2025).
  • Fast Forward-Projection: Discrete action lists, forward integration of ODEs, and vectorization/JIT compilation (Numba) enable continuous-state real-time value estimation, with per-agent planning times of 2–27 ms for up to 100 pursuers, supporting parallel execution (Bertram et al., 2019); a simple sketch follows this list.
  • Hybrid Online/Offline Designs: Offline RL training combined with online adversary-classification enables real-time policy switching in bounded-rationality settings (Kalanther et al., 8 Nov 2025).
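
The sketch below illustrates the forward-projection pattern only: the cited work uses a peak-based FastMDP value estimate, whereas here a negative-distance score stands in for the value function, and constant-velocity dynamics are assumed purely for illustration.

```python
import numpy as np

def select_heading(p_pos, e_pos, e_vel, speed=1.0, dt=0.1, horizon=10, n_actions=16):
    """Score a discrete list of candidate headings by forward projection and return the best one."""
    headings = np.linspace(-np.pi, np.pi, n_actions, endpoint=False)
    vels = speed * np.stack([np.cos(headings), np.sin(headings)], axis=1)  # (A, 2) candidate velocities
    pursuer_end = np.asarray(p_pos) + horizon * dt * vels                  # projected pursuer endpoints
    evader_end = np.asarray(e_pos) + horizon * dt * np.asarray(e_vel)      # assumed constant-velocity evader
    scores = -np.linalg.norm(pursuer_end - evader_end, axis=1)             # stand-in for a learned/DP value
    return float(headings[int(np.argmax(scores))])
```

Vectorizing the candidate-action evaluation in this way keeps the per-step cost to a handful of array operations, which is what makes millisecond-scale planning loops attainable.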

4. Robustness, Adaptation, and Performance Metrics

Modern R2PS implementations prioritize robustness to partial observability, adversarial strategies, dynamics, and disturbances:

  • Worst-Case Guarantees: DP-based policies maintain optimality even for asynchronous, adversarial evader strategies with knowledge of pursuer moves (Lu et al., 21 Nov 2025).
  • Partial Information: Belief-state RL, set propagation, and explicit incorporation of sensor and memory constraints yield policies that retain over 80% success under severe observation/model mismatch, outperforming classical RL in zero-shot generalization (Lu et al., 21 Nov 2025, Gonultas et al., 8 May 2024).
  • Empirical Gains: RL and APF-enhanced R2PS demonstrate up to 30% higher capture rates versus classical heuristics; real-world deployments achieve low-latency end-to-end control with robust sim-to-real transfer (Zhang et al., 2022, Gonultas et al., 8 May 2024, Jr et al., 2020).
  • Geometric and Potential-Based Metrics: Area contraction rates, time-to-capture, survivability, and formation quality (e.g., Kamimura–Ohira scores) are reported as standardized benchmarks (Mammadov et al., 19 Nov 2025, Bertram et al., 2019).
  • Heterogeneous and Multi-Agent Scaling: Scenarios with up to 100 pursuers/evaders display graceful performance degradation and statistically guaranteed capture rates, with real-time algorithmic scaling (Bertram et al., 2019, Mammadov et al., 19 Nov 2025).

5. Representative Algorithms and Comparative Analysis

The R2PS literature has produced a suite of algorithms, each tailored to problem structure and operational constraints:

| Framework | Core Technique | Scalability/Complexity |
|---|---|---|
| Graph-based PEG w/ partial info | DP + belief propagation + RL-GNN | $O(n^2 m)$ / $O(\lvert V\rvert)$ per step |
| Geometric cooperative defense | Sliding-mode, prescribed-time control | $O(1)$ per agent |
| Continuous multi-agent 6DOF | Peak-based FastMDP + vectorization | $O(N^2)$, real-time for $N \leq 100$ |
| Hybrid RL + APF | D3QN-parameterized APF | $O(M)$ (neighbors), real-time |
| Decentralized RL (TD3/MADDPG) | Shared network, neighbor embedding | $O(K)$ per agent |
| Area-gradient minimization (encirclement) | Explicit $\nabla A_e$ steering | $O(N^2)$ (arc computation) |

Each approach emphasizes modularity and generalizability: parameter sharing, local observability, and explicit decomposition enable team-scale operation, including in unstructured, obstacle-filled, or adversarial settings (Zhang et al., 2022, Jr et al., 2020).

6. Future Directions and Open Challenges

The R2PS field continues to evolve, and several critical directions remain open.

A plausible implication is that future R2PS systems will fuse task-specific geometric control laws, distributed reinforcement learning, and partial-information reasoning layers, with rigorous performance guarantees and hardware-constrained real-time execution.
