Interactive Swarm Leader Identification (iSLI)
- iSLI provides a principled methodology that integrates consensus-based estimation, graph spectral analysis, and RL-trained probing to rapidly and accurately identify leaders in swarms.
- Key methods include decentralized consensus dynamics, conditional transfer entropy for information flow analysis, and adaptive probing policies that ensure exponential tracking error decay.
- Applications span autonomous formation flight and human–swarm interactive leader switching, demonstrating scalable, robust control in dynamic robotic and biological systems.
Interactive Swarm Leader Identification (iSLI) encompasses principled methodologies for detecting, inferring, or adaptively designating leaders in robotic or biological swarms, focusing on collective tracking, formation control, and behavioral analysis. This concept integrates decentralized estimation, graph-theoretic metrics, statistical information flow, and interactive probing to identify leaders—whether overt, covert, or dynamically switched—using online and physically interactive mechanisms. iSLI spans foundational consensus dynamics, information-theoretic diagnostics, control-system engineering, and deep reinforcement learning approaches, supporting applications from autonomous formation flight to adversarial swarm interrogation.
1. Mathematical Formulation and Consensus-Based Leader Selection
The foundational model treats a swarm of $N$ agents with states evolving under graph-based consensus dynamics on a connected undirected graph $\mathcal{G}$. Communication topology is encoded via the adjacency matrix $A$ and Laplacian $L = D - A$, with algebraic connectivity $\lambda_2(L) > 0$ ensuring network coherence. An external reference velocity $v^*(t)$ is provided by a “master” to a single leader every $T$ seconds, with the leader-switch period potentially far shorter than $T$ for higher adaptivity.
Each non-leader agent $i$ deploys a distributed consensus estimator
$$\dot{\hat v}_i = k_e \sum_{j \in \mathcal{N}_i} a_{ij}\,(\hat v_j - \hat v_i),$$
while the leader holds $\hat v_\ell = v^*$. Aggregated in block form with a directed Laplacian $\bar L$ (zeroed leader row), $\dot{\hat v} = -k_e (\bar L \otimes I)\,\hat v$, the estimator supports decentralized propagation of $v^*$. Agents then apply a formation-tracking controller
$$u_i = \hat v_i + k_c \sum_{j \in \mathcal{N}_i} a_{ij}\big[(x_j - x_i) - (b_j - b_i)\big],$$
where $b_i$ are formation setpoints. Stacked error dynamics yield a system with global exponential stability, with convergence governed by $\lambda_2(L)$ and the control gains $k_e, k_c$ (Franchi et al., 2013).
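A minimal numerical sketch of these dynamics, assuming first-order agents and hypothetical gains `k_e`, `k_c` (the estimator and controller laws here are generic stand-ins for the paper's exact forms):

```python
import numpy as np

def simulate_step(x, v_hat, A, leader, v_ref, b, k_e=2.0, k_c=1.5, dt=0.01):
    """One Euler step of the consensus estimator and formation-tracking
    controller on an undirected graph with adjacency matrix A.
    x: (N, d) positions; v_hat: (N, d) estimates of the master velocity;
    b: (N, d) formation setpoints; leader: index receiving v_ref directly."""
    deg = A.sum(axis=1, keepdims=True)
    # Consensus estimator: non-leaders average neighbors' estimates; the
    # leader row is zeroed and pinned to the master reference.
    dv = k_e * (A @ v_hat - deg * v_hat)
    dv[leader] = 0.0
    v_hat = v_hat + dt * dv
    v_hat[leader] = v_ref
    # Formation-tracking controller: feedforward velocity estimate plus
    # consensus on the formation errors e_i = x_i - b_i.
    e = x - b
    u = v_hat + k_c * (A @ e - deg * e)
    return x + dt * u, v_hat

# Example: 4 agents on a line graph tracking a constant master velocity.
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
x = np.random.default_rng(1).normal(size=(4, 2))
v_hat = np.zeros((4, 2))
b = np.array([[0,0],[1,0],[2,0],[3,0]], float)
for _ in range(2000):
    x, v_hat = simulate_step(x, v_hat, A, leader=0,
                             v_ref=np.array([0.3, 0.0]), b=b)
```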
Leader selection exploits the spectral properties of the graph. At each switching instant, a performance metric $J(\ell) = \gamma(\ell)$, where $\gamma(\ell)$ is an analytically derived decay rate, determines the optimal leader as $\ell^* = \arg\max_\ell \gamma(\ell)$, maximizing the rate of tracking error reduction over the next switch period. Distributed estimation methods (e.g., power iteration for $\lambda_2$, PI-ace filters for global error sums) enable fully decentralized selection.
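As a centralized stand-in for that distributed machinery, one standard proxy for the decay rate $\gamma(\ell)$ is the smallest eigenvalue of the grounded Laplacian obtained by deleting candidate leader $\ell$'s row and column; the sketch below uses a direct eigensolve where the paper uses power iteration and PI-ace filters:

```python
import numpy as np

def grounded_laplacian_rate(L, leader):
    """Decay-rate proxy for a candidate leader: smallest eigenvalue of
    the grounded Laplacian (Laplacian with the leader's row/column removed)."""
    keep = [i for i in range(L.shape[0]) if i != leader]
    return np.linalg.eigvalsh(L[np.ix_(keep, keep)]).min()

def select_leader(A):
    """Pick the leader maximizing the tracking-error decay-rate proxy."""
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian
    rates = [grounded_laplacian_rate(L, i) for i in range(A.shape[0])]
    return int(np.argmax(rates)), rates
```

Intuitively, well-connected, central agents yield larger $\gamma(\ell)$ and hence faster tracking-error decay.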
2. Information-Theoretic Leader Detection via Transfer Entropy
iSLI also encompasses leader identification by measuring directed information flows within swarms, applying conditional transfer entropy (TE) as a diagnostic criterion (Sun et al., 2014). TE quantifies the reduction in uncertainty about agent $i$'s future state given observation of agent $j$, beyond $i$'s own historical data:
$$T_{j \to i} = \sum p\big(x_i^{t+1}, \mathbf{x}_i^{(k)}, \mathbf{x}_j^{(l)}\big)\,\log \frac{p\big(x_i^{t+1} \mid \mathbf{x}_i^{(k)}, \mathbf{x}_j^{(l)}\big)}{p\big(x_i^{t+1} \mid \mathbf{x}_i^{(k)}\big)}.$$
Leaders—especially covert leaders following external cues—are statistically distinguishable by receiving less TE from neighbors than followers, an effect robust to swarming parameters and forms of exogenous input. The computation pipeline consists of sliding-window time-series extraction, state-space discretization, empirical PDF estimation, local TE calculation, and population-level thresholding: agents whose received TE falls below a population threshold are classified as leaders.
This approach is scalable (the per-window cost scales with the number of agents, the neighborhood size, and the number of discretization bins), implementable in real time, and empirically validated in disk-forming swarms (Sun et al., 2014).
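A binned, sliding-window TE estimator for a single source-destination pair, with one-step histories ($k = l = 1$) and a hypothetical bin count (the conditional variant used in the paper additionally conditions on the remaining neighbors):

```python
import numpy as np

def transfer_entropy(x_src, x_dst, bins=8):
    """Pairwise TE (source -> destination) with one-step histories,
    estimated from binned empirical distributions over one window."""
    # Discretize each time series into `bins` states.
    s = np.digitize(x_src, np.histogram_bin_edges(x_src, bins))
    d = np.digitize(x_dst, np.histogram_bin_edges(x_dst, bins))
    fut, past_d, past_s = d[1:], d[:-1], s[:-1]
    n, te = len(fut), 0.0
    # Empirical joint counts over (future, destination past, source past).
    triples, counts = np.unique(
        np.stack([fut, past_d, past_s]), axis=1, return_counts=True)
    for (f, pd_, ps), c in zip(triples.T, counts):
        p_fps = c / n                                   # p(f, pd, ps)
        p_ps = np.mean((past_d == pd_) & (past_s == ps))  # p(pd, ps)
        p_fp = np.mean((fut == f) & (past_d == pd_))      # p(f, pd)
        p_p = np.mean(past_d == pd_)                      # p(pd)
        # p(f | pd, ps) / p(f | pd)
        te += p_fps * np.log2((p_fps / p_ps) / (p_fp / p_p))
    return te
```

Summing $T_{j \to i}$ over neighbors $j$ for each agent $i$ and thresholding the totals then separates leaders (low received TE) from followers.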
3. Probing Policies and Deep Reinforcement Learning for Adversarial iSLI
Recent work formulates iSLI as a Partially Observable Markov Decision Process (POMDP), with a probing agent actively interacting with the swarm to expose leader identity (Bachoumas et al., 20 Dec 2025). The state encodes full agent poses, probe position, and the (hidden) leader index $\ell$. Observations are constructed as dynamic graphs reflecting agent and prober features, enhanced with temporal and relational information.
The probing policy is trained via Proximal Policy Optimization (PPO), leveraging a neural architecture consisting of:
- Timed Graph Relationformer (TGR): merges node-wise embeddings (Graph Attention Transformer), set-based summaries (DeepSets), relational context (RelationNet), and temporal encoding (Time2Vec; see the sketch after this list) through gating.
- Structured State Space Sequence (S5): stacks recurrent layers modeling latent state evolution, supporting permutation invariance and long-range dependencies.
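Of these components, the temporal encoding has a compact closed form. A minimal Time2Vec sketch, with the learnable frequencies and phases here drawn at random for illustration:

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec encoding of a scalar time tau: one linear term plus
    periodic (sine) terms, with learnable omega and phi."""
    linear = omega[0] * tau + phi[0]
    periodic = np.sin(omega[1:] * tau + phi[1:])
    return np.concatenate([[linear], periodic])

# Example: an 8-dimensional encoding with randomly initialized parameters.
rng = np.random.default_rng(0)
omega, phi = rng.normal(size=8), rng.normal(size=8)
print(time2vec(3.5, omega, phi))
```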
The prober maximizes a reward function favoring frequent probe-leader interaction, with additional terms for minimal leader distance and action smoothness. This policy achieves high leader-identification accuracy (e.g., 75.8% in the reported benchmark configuration), generalizes zero-shot across swarm sizes/speeds, and exhibits robust sim-to-real transfer and resilience to online network disruptions (Bachoumas et al., 20 Dec 2025).
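A hedged sketch of such a reward; the weights, contact radius, and term definitions below are illustrative stand-ins, not the paper's values:

```python
import numpy as np

def probe_reward(probe_pos, leader_pos, action, prev_action,
                 contact_radius=0.5, w_contact=1.0, w_dist=0.1, w_smooth=0.05):
    """Reward combining probe-leader interaction, leader proximity,
    and action smoothness (illustrative weights)."""
    dist = np.linalg.norm(probe_pos - leader_pos)
    contact = 1.0 if dist < contact_radius else 0.0   # probe-leader interaction
    return (w_contact * contact
            - w_dist * dist                                     # stay close
            - w_smooth * np.linalg.norm(action - prev_action))  # smoothness
```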
4. Dynamic Leader-Switching in Human–Swarm Systems
iSLI also refers to dynamic, operator-assisted leader switching in formation-controlled swarms (Wu et al., 14 May 2025). Here, a human operator governs only the current leader via velocity/yaw commands, while followers track leader-relative setpoints using monocular camera and laser altimeter inputs.
A finite-state machine formalizes switching:
- On waypoint arrival, the turn angle $\theta$ between the incoming and outgoing path segments is computed.
- If $\theta$ exceeds a threshold $\theta_{\mathrm{th}}$, leader-switching is triggered, selecting the next agent based on turn geometry (see the sketch after this list).
- Mutual-visibility guards—constraints keeping the incoming and outgoing leaders within followers' fields of view during the handoff—guarantee tracking continuity.
- UI feedback (visual highlighting, switch buttons) closes the human–swarm loop, enabling interactive role reassignment.
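A simplified sketch of the switching step under these rules, with an illustrative threshold and an alignment-based next-leader choice (the paper's turn-geometry rule and visibility guards are richer):

```python
import numpy as np

def maybe_switch_leader(waypoints, wp_idx, agents_pos, leader,
                        theta_th=np.pi / 4):
    """On arrival at waypoints[wp_idx] (assumed interior, so wp_idx-1 and
    wp_idx+1 exist), compute the turn angle between incoming and outgoing
    segments; if it exceeds theta_th, hand leadership to the agent best
    aligned with the outgoing direction (illustrative geometry)."""
    prev_seg = waypoints[wp_idx] - waypoints[wp_idx - 1]
    next_seg = waypoints[wp_idx + 1] - waypoints[wp_idx]
    cos_t = np.dot(prev_seg, next_seg) / (
        np.linalg.norm(prev_seg) * np.linalg.norm(next_seg))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    if theta <= theta_th:
        return leader, theta  # shallow turn: keep the current leader
    # Score each agent by the alignment of its bearing from the waypoint
    # with the outgoing segment, and promote the best-aligned one.
    headings = agents_pos - waypoints[wp_idx]
    align = headings @ next_seg / (
        np.linalg.norm(headings, axis=1) * np.linalg.norm(next_seg))
    return int(np.argmax(align)), theta
```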
Experimental results demonstrate that the leader-switching mechanism significantly improves formation stability and sharp-turn maneuvering success compared to static leadership (RMSE of triangle area $0.14$–$0.28$ m$^2$ with switching vs. $2.06$–$2.56$ m$^2$ without) (Wu et al., 14 May 2025).
5. Performance Analysis and Stability Guarantees
Across the spectrum of iSLI, rigorous Lyapunov-based arguments establish global boundedness and asymptotic stability. For consensus-based strategies, common Lyapunov functions validate exponential tracking error decay under arbitrary leader switching, conditioned on graph spectral gaps and estimator/controller gains (Franchi et al., 2013). For human–in–the–loop systems, bounded error jumps and dwell-time constraints, coupled with mutual-visibility guards, maintain stability during role transitions (Wu et al., 14 May 2025).
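The standard switched-systems argument is, in sketch form (with $A_\ell$ the closed-loop error matrix under leader $\ell$ and $\kappa(P)$ the condition number of a common Lyapunov matrix $P$):

```latex
\begin{aligned}
&V(e) = e^{\top} P e, \qquad P = P^{\top} \succ 0, \qquad
  A_{\ell}^{\top} P + P A_{\ell} \preceq -2\gamma P
  \ \ \text{for all admissible leaders } \ell,\\
&\dot V \le -2\gamma V \ \Longrightarrow\ V(t) \le V(0)\, e^{-2\gamma t}
  \ \Longrightarrow\ \|e(t)\| \le \sqrt{\kappa(P)}\, e^{-\gamma t}\, \|e(0)\|.
\end{aligned}
```

Because the same $P$ certifies every leader configuration, the exponential bound survives arbitrary switching; dwell-time conditions then absorb the bounded error jumps at switching instants in the human-in-the-loop case.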
Reinforcement learning-based probing policies maintain high identification accuracy in simulation and deploy robustly to physical robots, recovering leader belief following abrupt observation set changes and unmodeled real-world phenomena (Bachoumas et al., 20 Dec 2025).
6. Implementation Guidance and Open Challenges
Typical implementation protocols involve:
- Distributed consensus filters (PI-ace) and power iteration for spectral estimation in fully decentralized leader selectors.
- Sliding-window, binned PDF estimation for TE-based identification.
- Modular sensing (monocular camera, laser altimeter) and nested PID loops for formation control.
- Deep neural architectures (TGR+S5) for RL-based probing with policy training via PPO.
Open problems include scaling iSLI to swarms with second-order agent dynamics, handling multiple exogenous master controllers, adapting to time-varying communication graphs or formation shapes, and guaranteeing robustness to time-delays or packet loss. For RL-based probing, further exploration is warranted on transfer learning to highly heterogeneous platforms and scaling temporal reasoning to very large swarms.
7. Comparative Summary of iSLI Methodologies
| Approach Domain | Identification Mechanism | Scalability / Key Results |
|---|---|---|
| Consensus-based (Franchi et al., 2013) | Online leader selection via spectral graph metrics and distributed estimation | Decentralized; exponential error decay; rapid response (0.7s settling time) |
| Information-theoretic (Sun et al., 2014) | Local transfer entropy analysis (TE received) | Scalable to large swarms; robust to parameter changes; distinguishes covert leaders |
| Probing + RL (Bachoumas et al., 20 Dec 2025) | Physical probing, graph-based observation ingestion, PPO-trained neural policy | High accuracy (75.8%); zero-shot generalization; sim-to-real success |
| Human–in–the–loop (Wu et al., 14 May 2025) | Manual or algorithmic leader switching at turn events; explicit UI feedback | Maintains formation stability; 100% sharp-turn success rate with leader switching |
Each methodology offers distinct trade-offs in operational context, implementation complexity, and deployment scale, reflecting the multi-modal nature of iSLI in contemporary swarm robotics and behavioral analysis.