Shepherd: Swarm Control and Computational Systems

Updated 4 July 2026

In swarm control, a shepherd is an external agent that guides a swarm toward a global objective by alternating between collect and drive behaviors.
The name is also attached to several unrelated computational systems, including a language-model critic, a meta-agent runtime, a web-security measurement framework, and a molecular diffusion model.
Research applications span swarm robotics, learning-based control, adaptive strategy selection, and trusted coordination in adversarial or distributed settings.

In the arXiv literature represented here, “shepherd” most commonly denotes an external control agent that guides a population of interacting agents toward a global objective by exploiting collective dynamics, typically through alternating collecting and driving behaviors (Gee et al., 2019, Long et al., 2019). The same term also names several distinct computational systems: a 7B language-model critic, a runtime substrate for meta-agents, a post-login web-measurement framework, and an SE(3)-equivariant molecular diffusion model (Wang et al., 2023, Yu et al., 11 May 2026, Jonker et al., 2018, Adams et al., 2024). In addition, “Shepherd” appears eponymically in atmospheric energetics, graph theory, and quantum verification (Marquet, 2014, Heuer et al., 2020, Yung et al., 2020).

1. Swarm shepherding as a control paradigm

In swarm robotics and multi-agent control, shepherding is the external control of a group of interacting agents (“sheep”) by one or more external agents (“shepherds”) to achieve a global objective such as reaching a goal region while maintaining cohesion and safety (Gee et al., 2019). A standard formalization places the swarm in a 2D continuous plane, with sheep positions $x_i(t)\in\mathbb{R}^2$, velocities $v_i(t)\in\mathbb{R}^2$, shepherd position $s(t)\in\mathbb{R}^2$, and goal $P_T$ or $g$ (Gee et al., 2019, Zhi et al., 2020). Canonical summary statistics are the centroid

$c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$

the maximum dispersion

$R(t)=\max_i \|x_i(t)-c(t)\|,$

and mean-radius variants such as $D(t)=\frac{1}{N}\sum_i \|x_i(t)-c(t)\|$ (Gee et al., 2019).
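As a minimal sketch of these summary statistics, the following Python snippet computes the centroid, the maximum dispersion $R$, and the mean radius $D$ for a list of 2D positions. The function name `dispersion_stats` and the sample flock are illustrative assumptions, not part of the cited formulations.

```python
import math

def dispersion_stats(positions):
    """Return (centroid, max dispersion R, mean radius D) for 2D points."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    # Distance of every agent from the centroid.
    dists = [math.dist((x, y), (cx, cy)) for x, y in positions]
    return (cx, cy), max(dists), sum(dists) / n

# Hypothetical flock: the four corners of a 4 x 3 rectangle.
flock = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
c, R, D = dispersion_stats(flock)
print(c, R, D)
```

For a symmetric configuration like this one, $R$ and $D$ coincide; they diverge as soon as a single straggler leaves the group, which is why the two are reported separately.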

The dominant heuristic lineage is the Strömbom-style collect/drive switch. If the furthest sheep lies within a collection threshold $f(N)$ of the centroid, the shepherd drives the cohesive group from a point behind the centroid relative to the goal; otherwise it collects by moving behind the furthest agent relative to the centroid (Gee et al., 2019, Nguyen et al., 2020, Long et al., 2019). In one widely used form,

$\|P_F-P_G\|<f(N)\Rightarrow \Psi=\Psi_D,\quad \text{else } \Psi=\Psi_C,$

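with drive and collect target points

$P_D=P_G+d_D\,\frac{P_G-P_T}{\|P_G-P_T\|},\qquad P_C=P_F+d_C\,\frac{P_F-P_G}{\|P_F-P_G\|},$

where $P_F$ is the furthest sheep, $P_G$ the centroid, and $d_D$, $d_C$ are standoff distances; the shepherd then moves along the normalized direction toward the active target point (Gee et al., 2019). The review literature generalizes this into a broader behavioral taxonomy including collect, drive, protect/guard/patrol, and obstacle steering, while also emphasizing that shepherd speed relative to agent speed, sensing locality, and obstacle structure materially change feasible strategies (Long et al., 2019).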
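The collect/drive switch described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the cited controllers: the function name `shepherd_target`, the single `standoff` distance used for both modes, and the sample coordinates are all hypothetical.

```python
import math

def _unit(dx, dy):
    """Normalize a 2D vector; the zero vector maps to (0, 0)."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0.0 else (0.0, 0.0)

def shepherd_target(positions, goal, threshold, standoff):
    """Return (mode, target point) for a Strömbom-style switch."""
    n = len(positions)
    c = (sum(x for x, _ in positions) / n, sum(y for _, y in positions) / n)
    furthest = max(positions, key=lambda p: math.dist(p, c))
    if math.dist(furthest, c) < threshold:
        # Drive: stand behind the centroid, on the side away from the goal.
        ux, uy = _unit(c[0] - goal[0], c[1] - goal[1])
        return "drive", (c[0] + standoff * ux, c[1] + standoff * uy)
    # Collect: stand behind the furthest sheep, on the side away from the centroid.
    ux, uy = _unit(furthest[0] - c[0], furthest[1] - c[1])
    return "collect", (furthest[0] + standoff * ux, furthest[1] + standoff * uy)

compact = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
scattered = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (20.0, 10.0)]
print(shepherd_target(compact, goal=(0.0, 0.0), threshold=2.0, standoff=1.0))
print(shepherd_target(scattered, goal=(0.0, 0.0), threshold=2.0, standoff=1.0))
```

The compact flock triggers the drive mode and the scattered one the collect mode; in practice the threshold would depend on the flock size $N$ rather than being a fixed constant.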

The sheep side of the model is usually force-based. Common ingredients are short-range repulsion, cohesion, alignment or inertia, and repulsion from the shepherd; in some formulations, angular or Gaussian noise is added (Gee et al., 2019, Nguyen et al., 2020, Fujioka et al., 2022). Evaluation typically uses time-to-goal, success rate, dispersion metrics, path length, and safety events such as collision counts or boundary violations (Long et al., 2019, Zhi et al., 2020). This structure makes shepherding a hybrid control problem: continuous motion control is subordinated to a discrete decision over behavioral mode.
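A force-based sheep update of this kind might look as follows. The sketch combines inertia, cohesion toward the centroid, short-range repulsion, and repulsion from the shepherd. All weights, radii, and the step size are hypothetical defaults chosen for illustration, and the noise term is omitted to keep the example deterministic.

```python
import math

def _unit(dx, dy):
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0.0 else (0.0, 0.0)

def sheep_step(positions, velocities, shepherd, *,
               w_inertia=0.5, w_cohesion=1.0, w_repel=2.0, w_shepherd=1.5,
               r_repel=1.0, r_shepherd=10.0, speed=0.5):
    """Advance every sheep by one step; returns (new positions, new headings)."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    new_positions, new_velocities = [], []
    for i, ((px, py), (vx, vy)) in enumerate(zip(positions, velocities)):
        # Inertia: keep part of the previous heading.
        fx, fy = w_inertia * vx, w_inertia * vy
        # Cohesion: attraction toward the flock centroid.
        ux, uy = _unit(cx - px, cy - py)
        fx, fy = fx + w_cohesion * ux, fy + w_cohesion * uy
        # Short-range repulsion from close neighbours.
        for j, (qx, qy) in enumerate(positions):
            if j != i and math.dist((px, py), (qx, qy)) < r_repel:
                ux, uy = _unit(px - qx, py - qy)
                fx, fy = fx + w_repel * ux, fy + w_repel * uy
        # Repulsion from the shepherd when it is within sensing range.
        if math.dist((px, py), shepherd) < r_shepherd:
            ux, uy = _unit(px - shepherd[0], py - shepherd[1])
            fx, fy = fx + w_shepherd * ux, fy + w_shepherd * uy
        hx, hy = _unit(fx, fy)
        new_velocities.append((hx, hy))
        new_positions.append((px + speed * hx, py + speed * hy))
    return new_positions, new_velocities

pos, vel = sheep_step([(0.0, 0.0)], [(0.0, 0.0)], shepherd=(-1.0, 0.0))
print(pos, vel)
```

A lone sheep with the shepherd directly to its left moves right by one step; with the shepherd out of range and no neighbours, the net force is zero and the sheep stays put.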

2. Learning-based shepherds

A major research direction replaces hand-coded steering laws with learned policies while retaining interpretable collect/drive structure. “Transparent Machine Education of Neural Networks for Swarm Shepherding Using Curriculum Design” decomposes the policy into two supervised modules, one for collecting and one for driving, with a pre-scripted switch based on the collection threshold $f(N)$ (Gee et al., 2019). Training uses approximately 200,000 labeled samples across 480 simulations, produced by human-in-the-loop demonstrations, and maps a 9-feature state vector to a 2D direction output (Gee et al., 2019). The curriculum learner uses two single-hidden-layer networks (10 hidden nodes each), whereas non-curriculum baselines use single networks with 10 or 20 hidden nodes (Gee et al., 2019). In open-environment assessment, the curriculum learner achieved 32% success versus 7% for the strongest non-curriculum baseline, with learning-rate ratios of approximately 11.02 for collecting, 1.70 for driving, and 4.30 for success rate (Gee et al., 2019). The driving and collecting metric improvements did not reach significance in the reported trials, but the success-rate difference did (Gee et al., 2019).

Obstacle-rich environments motivated reinforcement-learning formulations. “Learning to Herd Agents Amongst Obstacles” casts shepherding as an MDP with local $v_i(t)\in\mathbb{R}^2$ 2 frame stacks, a semi-discrete eight-direction action space with continuous perturbation, and reward shaping based on projection progress and PRM geodesic progress (Zhi et al., 2020). The learning algorithm is Double Deep Q-Learning with prioritized experience replay, and the PRM supplies obstacle-aware waypoints and path-length estimates (Zhi et al., 2020). Across layered U-turn and gap environments, the learned model showed higher success rate, shorter completion time and path length than the rule-based behavioral methods, with advantages of at least 20–40% in several difficult settings (Zhi et al., 2020).
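One way to picture a semi-discrete action space is to decode a discrete compass index plus a continuous offset into a displacement. The sketch below assumes that the continuous perturbation is an angular offset clamped to half a sector (22.5 degrees); that reading, the function name `decode_action`, and the `step` parameter are assumptions made here, not details taken from the cited paper.

```python
import math

def decode_action(index, perturbation, step=1.0):
    """Map one of eight compass directions plus an angular offset to (dx, dy)."""
    if not 0 <= index < 8:
        raise ValueError("index must be in 0..7")
    half_sector = math.pi / 8
    # Clamp the continuous part so neighbouring sectors do not overlap.
    delta = max(-half_sector, min(half_sector, perturbation))
    angle = index * (math.pi / 4) + delta
    return (step * math.cos(angle), step * math.sin(angle))

print(decode_action(0, 0.0))   # east
print(decode_action(2, 0.1))   # roughly north, rotated slightly counter-clockwise
```

Keeping the discrete index lets a Q-learning agent enumerate its actions, while the bounded offset recovers headings between the eight compass directions.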

A continuous-control variant appears in “Continuous Deep Hierarchical Reinforcement Learning for Ground-Air Swarm Shepherding,” where a UAV shepherd learns separate DDPG policies for collection and driving and fuses them through a Strömbom-style gate (Nguyen et al., 2020). For $v_i(t)\in\mathbb{R}^2$ 3 UGVs, the method reached 100% success in both $v_i(t)\in\mathbb{R}^2$ 4 and $v_i(t)\in\mathbb{R}^2$ 5 environments and transferred from simulation to indoor physical experiments with 100% success in the reported scenarios (Nguyen et al., 2020). The article attributes part of this performance to hierarchical decomposition and continuous action outputs, which reduced the zig-zag behavior associated with discrete-action baselines (Nguyen et al., 2020).

3. Heterogeneity, noise, and contextual adaptation

Once the assumption of a homogeneous flock is relaxed, the shepherd must infer which agents are responsive and how they respond. “Shepherding Heterogeneous Flock with Model-Based Discrimination” introduces virtual sheep simulated under an estimated nominal model and classifies actual agents by their residuals against the corresponding virtual sheep, computed at observation times (Fujioka et al., 2022). Static thresholding compares each residual with a fixed threshold, while dynamic thresholding derives the threshold from the quartiles of the residuals (Fujioka et al., 2022). Across 14 variant types, the proposed methods reported success rates > 63% regardless of variant type, with average execution time < 530 steps, while a conventional FAT baseline dropped below 13% for variant types lacking both attraction and shepherd repulsion (Fujioka et al., 2022).
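The residual-based classification can be illustrated with a short Python sketch. The exact residual and threshold expressions of the cited paper are not reproduced here: the Euclidean position residual and the Tukey-style upper fence $Q_3 + 1.5\,(Q_3 - Q_1)$ are stand-ins assumed for illustration, as are all function names and sample values.

```python
import math
import statistics

def residuals(actual, virtual):
    """Distance between each observed agent and its simulated virtual counterpart."""
    return [math.dist(a, v) for a, v in zip(actual, virtual)]

def flag_static(res, threshold):
    """Indices of agents whose residual exceeds a fixed threshold."""
    return [i for i, r in enumerate(res) if r > threshold]

def flag_dynamic(res, k=1.5):
    """Indices above a quartile-based fence (assumed form: Q3 + k * IQR)."""
    q1, _, q3 = statistics.quantiles(res, n=4)
    fence = q3 + k * (q3 - q1)
    return [i for i, r in enumerate(res) if r > fence]

# Hypothetical residuals: agent 7 does not follow the nominal model.
res = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.10, 2.00]
print(flag_static(res, threshold=0.5))
print(flag_dynamic(res))
```

The dynamic variant needs no hand-tuned constant on the residual scale, which is the practical appeal of deriving the threshold from the residual distribution itself.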

Context awareness can also be embedded at the level of tactic selection rather than residual classification. “Contextually Aware Intelligent Control Agents for Heterogeneous Swarms” organizes control into an S2AI-to-AI2A pipeline that classifies agent types, infers scenario probabilities, and selects among 25 tactic pairs formed from five drive variants and five collect variants (Hepworth et al., 2022). Across 11 scenarios, the context-aware version achieved an overall Mission Success Rate of 74% ± 28%, versus 64% ± 33% without context, and reduced mission length from 3320.1 ± 1646.6 to 2157.5 ± 1959.9 over all trials (Hepworth et al., 2022). The same study reports a significant increase in the number of swarm agents directly influenced by the shepherd in 100% of scenarios (Hepworth et al., 2022).
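The tactic space and a context-weighted selection step can be sketched as follows. Only the 5 × 5 structure comes from the text; the variant names, the scenario labels, the utility table, and the expected-utility selection rule are hypothetical placeholders rather than the cited pipeline.

```python
from itertools import product

# Placeholder names for the five drive and five collect variants.
DRIVE_VARIANTS = [f"drive_{i}" for i in range(1, 6)]
COLLECT_VARIANTS = [f"collect_{i}" for i in range(1, 6)]
TACTIC_PAIRS = list(product(DRIVE_VARIANTS, COLLECT_VARIANTS))

def select_tactic(scenario_probs, utility):
    """Pick the pair with the highest probability-weighted utility."""
    def expected(pair):
        return sum(p * utility.get((scenario, pair), 0.0)
                   for scenario, p in scenario_probs.items())
    return max(TACTIC_PAIRS, key=expected)

# Hypothetical scenario beliefs and utilities.
probs = {"open": 0.7, "cluttered": 0.3}
utility = {
    ("open", ("drive_2", "collect_4")): 1.0,
    ("cluttered", ("drive_1", "collect_1")): 2.0,
}
print(len(TACTIC_PAIRS), select_tactic(probs, utility))
```

Weighting by scenario probability means a tactic that is best in the most likely scenario can win over one that scores higher in an unlikely scenario.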

Noise changes not only performance but also the preferred parameterization. “Disturbances in Influence of a Shepherding Agent is More Impactful than Sensorial Noise During Swarm Guidance” studies perception noise and actuation disturbance under Strömbom’s rule (Nguyen et al., 2020). The central empirical finding is that actuation disturbance is more detrimental than perception noise, even though the tested actuation-disturbance magnitudes are an order of magnitude smaller than the tested perception-noise magnitudes (Nguyen et al., 2020). The same work shows that the collect/drive threshold should move in opposite directions depending on noise type: increase the threshold under high perception noise to avoid spurious collection, and decrease it under high actuation noise to re-compact the group more aggressively (Nguyen et al., 2020).
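The direction of the threshold adjustment can be made concrete with a toy rule. The linear form, the gain parameters, and the function name `adapt_threshold` are assumptions introduced here purely to show the two opposite directions; the cited work does not prescribe this formula.

```python
def adapt_threshold(base, perception_noise, actuation_noise,
                    gain_perception=1.0, gain_actuation=1.0, floor=0.0):
    """Raise the collect/drive threshold with perception noise, lower it with actuation noise."""
    adjusted = base + gain_perception * perception_noise - gain_actuation * actuation_noise
    # Never let the threshold drop below a minimum value.
    return max(floor, adjusted)

print(adapt_threshold(10.0, perception_noise=2.0, actuation_noise=0.0))
print(adapt_threshold(10.0, perception_noise=0.0, actuation_noise=2.0))
```

A larger threshold makes the shepherd tolerate apparent stragglers and keep driving; a smaller one makes it switch to collecting sooner.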

A related adaptive idea appears in “Re-Solving the Shepherding Problem: Lead When Possible, Herd When Necessary,” which switches between a leading mode and a herding mode based on whether the closest agent is approaching or receding (Strömbom et al., 18 Feb 2026). In the reported simulations, the mixed controller transported groups with any follower/evader composition $s(t)\in\mathbb{R}^2$ 4, whereas herd-only succeeded only at $s(t)\in\mathbb{R}^2$ 5 and lead-only only at $s(t)\in\mathbb{R}^2$ 6 (Strömbom et al., 18 Feb 2026). Under time-varying strategy switching, the mixed controller remained effective up to $s(t)\in\mathbb{R}^2$ 7 when the time limit was removed, and succeeded within a 6000-step horizon for $s(t)\in\mathbb{R}^2$ 8 up to approximately 0.01 (Strömbom et al., 18 Feb 2026).
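A lead/herd switch of this kind can be sketched by comparing the closest agent's distance to the shepherd across two consecutive observations. Mapping "approaching" to the leading mode and "receding" to the herding mode is this sketch's reading of the paper's title, and the function name and sample coordinates are hypothetical.

```python
import math

def choose_mode(shepherd, agents_prev, agents_now):
    """Return 'lead' if the closest agent is approaching the shepherd, else 'herd'."""
    # Index of the agent currently closest to the shepherd.
    idx = min(range(len(agents_now)),
              key=lambda i: math.dist(agents_now[i], shepherd))
    d_prev = math.dist(agents_prev[idx], shepherd)
    d_now = math.dist(agents_now[idx], shepherd)
    return "lead" if d_now < d_prev else "herd"

shepherd = (0.0, 0.0)
print(choose_mode(shepherd, [(5.0, 0.0), (9.0, 0.0)], [(4.0, 0.0), (9.0, 0.0)]))
print(choose_mode(shepherd, [(5.0, 0.0)], [(6.0, 0.0)]))
```

The appeal of such a rule is that it needs no prior label for each agent: the mode follows from observed behavior at the current step.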

4. Selective guidance, safety, and trusted-shepherd formulations

Some work narrows the objective from transporting an entire swarm to manipulating a designated subset. “Shepherding Control for Separating a Single Agent from a Swarm” formulates target separation under a disc-graph connectivity constraint on the remaining agents (Deng et al., 2022). The method defines a pinning sheep, constructs an ideal velocity from distance- and velocity-based surrogates, and projects the shepherd’s target point onto feasibility sets derived analytically from a two-sheep analysis (Deng et al., 2022). In numerical experiments, it produced larger connected components among the non-target agents than a bipartite baseline, especially for boundary targets (Deng et al., 2022).

“Collision-Free Shepherding Control of a Single Target within a Swarm” instead gives a continuous-time control law for driving one target sheep to the origin while avoiding sheep-sheep collisions (Deng et al., 2023). The key sufficient safety condition is

$P_T$ 2

which bounds the shepherd’s maximal repulsive influence by the saturated sheep-sheep repulsion (Deng et al., 2023). Under this condition, the paper proves noncollision and gives a Lyapunov-based feedback controller with local asymptotic convergence of the target to the origin (Deng et al., 2023). Simulations up to N = 200 showed the proposed method consistently regulated the target below a small threshold before $P_T$ 3, while the baseline heuristic did not achieve consistent convergence (Deng et al., 2023).

A more abstract selective-control role appears in distributed algorithms on graphs. “Dispersion, Capacitated Nodes, and the Power of a Trusted Shepherd” introduces a trusted shepherd robot that is never Byzantine and can orchestrate exploration, mapping, and allocation on capacitated anonymous graphs (Jr. et al., 2023). With knowledge of $P_T$ 4 and $P_T$ 5, the shepherd achieves Byzantine dispersion in $P_T$ 6 rounds when

$P_T$ 7

and with knowledge of $P_T$ 8 and $P_T$ 9 it achieves the same asymptotic time under the weaker knowledge assumption but stricter tolerance

$g$ 0

(Jr. et al., 2023). In the benign setting, the paper shows that any uncapacitated dispersion algorithm with time $g$ 1 can be wrapped into a capacitated version with time $g$ 2, concentrating the additional memory cost on the shepherd (Jr. et al., 2023). This suggests a broader interpretation of “shepherd” as a trusted, more capable coordinator within an otherwise decentralized or adversarial system.

5. “Shepherd” as a named computational system

Outside swarm control, several papers use Shepherd as the name of a concrete model or runtime.

System	Domain	Defining property
Shepherd (Wang et al., 2023)	LLM criticism	7B critic model tuned to provide natural-language feedback
Shepherd (Yu et al., 11 May 2026)	Meta-agent runtime	Typed effect trace with fork/merge/discard semantics
Shepherd (Jonker et al., 2018)	Web security measurement	Automated login and post-login scanning framework
ShEPhERD (Adams et al., 2024)	Drug design	SE(3)-equivariant diffusion over molecules and interaction fields

“Shepherd: A Critic for LLM Generation” defines a 7B-parameter critic, based on LLaMA-7B, trained by supervised fine-tuning on approximately 8K critique instances, including 1,317 human-annotated examples (Wang et al., 2023). Its role is not to answer questions directly but to inspect a candidate answer, identify specific errors, and suggest refinements (Wang et al., 2023). In GPT-4 pairwise evaluation, it achieved average win-rates of 87.0% versus Alpaca, 53.0% versus SelFee, and 56.0% versus ChatGPT; in human evaluation it reached 72.4%, 59.7%, and 49.6% respectively, with especially strong performance on the distribution-shifted CritiqueEval set (Wang et al., 2023).

“Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace” treats an agent and its execution as first-class runtime objects with typed events, branchable scopes, and a Git-like persistent trace (Yu et al., 11 May 2026). The system reports 5× faster process-and-filesystem forking than Docker commit and >95% prompt-cache reuse on replay (Yu et al., 11 May 2026). In applications, a live supervisor increased pair-coding pass rates on CooperBench from 28.8% to 54.7%; counterfactual meta-optimization improved benchmark scores by up to 11 points while reducing wall-clock time by up to 58%; and Tree-RL training improved TerminalBench-2 from 34.2% to 39.4% (Yu et al., 11 May 2026).
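The fork/merge/discard idea can be illustrated with a toy trace structure. This is not the Shepherd runtime's API: the `Event` and `Scope` classes, their method names, and the event kinds are hypothetical, and the sketch covers only an in-memory event log, not process or filesystem state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "tool_call", "file_write"
    payload: str

@dataclass
class Scope:
    prefix: Tuple[Event, ...] = ()           # immutable history inherited at fork time
    events: List[Event] = field(default_factory=list)
    parent: Optional["Scope"] = None

    def record(self, kind: str, payload: str) -> None:
        self.events.append(Event(kind, payload))

    def history(self) -> List[Event]:
        return list(self.prefix) + self.events

    def fork(self) -> "Scope":
        # The child sees a snapshot of everything recorded so far.
        return Scope(prefix=tuple(self.history()), parent=self)

    def merge(self) -> "Scope":
        # Append the child's own events to the parent and return the parent.
        if self.parent is None:
            raise ValueError("root scope has no parent to merge into")
        self.parent.events.extend(self.events)
        return self.parent

    def discard(self) -> "Scope":
        # Drop the child's events; the parent is left untouched.
        if self.parent is None:
            raise ValueError("root scope cannot be discarded")
        return self.parent

root = Scope()
root.record("tool_call", "run tests")
attempt = root.fork()
attempt.record("file_write", "patch A")
root = attempt.discard()          # abandon the first attempt
retry = root.fork()
retry.record("file_write", "patch B")
root = retry.merge()              # keep the second attempt
print([e.payload for e in root.history()])
```

Because each fork carries an immutable prefix, a supervisor can explore several continuations from the same point and keep only the one it prefers.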

“Shepherd: Enabling Automatic and Large-Scale Login Security Studies” is a web-measurement framework for discovering login forms, submitting credentials, verifying authenticated state, and running post-login scans (Jonker et al., 2018). Using BugMeNot credentials, it automatically verified logins on 6,273 unknown sites, or 12.4% of the test set, and found 2,579 of those sites—41.4%—vulnerable to simple session hijacking under the paper’s criterion (Jonker et al., 2018).

“ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design” uses the same lexical label in a different capitalization pattern, but again denotes a concrete system rather than a control agent (Adams et al., 2024). The model jointly diffuses 3D molecular graphs, shape surfaces, electrostatic potential surfaces, and directional pharmacophores (Adams et al., 2024). Reported conditional validities are 96.0% for $g$ 3, 91.9% for $g$ 4, and 80.7% for $g$ 5, with applications to natural-product ligand hopping, protein-blind hit diversification, and bioisosteric fragment merging (Adams et al., 2024).

6. Eponymic and formal uses

In atmospheric energetics, T. G. Shepherd’s pseudo-energy is a Hamiltonian-Casimir construction for compressible, hydrostatic flow (Marquet, 2014). The central quantity is

$g$ 6

which becomes the sum of kinetic energy and generalized available potential energy (Marquet, 2014). For an isothermal/isobaric reference state, the construction yields the available enthalpy

$a_h=(h-h_r)-T_r\,(s-s_r),$

establishing a direct link between Shepherd’s pseudo-energy and exergy formulations in thermodynamics (Marquet, 2014).
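To make the exergy connection concrete, the sketch below evaluates available enthalpy in the usual flow-exergy form $a_h=(h-h_r)-T_r\,(s-s_r)$ for a dry ideal gas with constant specific heat. The ideal-gas closure, the numerical constants, and the function name are assumptions of this example rather than details taken from the cited paper.

```python
import math

CP = 1004.0   # specific heat at constant pressure, J kg^-1 K^-1 (assumed dry-air value)
RD = 287.0    # gas constant for dry air, J kg^-1 K^-1 (assumed)

def available_enthalpy(T, p, T_r, p_r):
    """a_h = (h - h_r) - T_r * (s - s_r) for an ideal gas with constant CP."""
    dh = CP * (T - T_r)
    ds = CP * math.log(T / T_r) - RD * math.log(p / p_r)
    return dh - T_r * ds

# Reference state and a warmer parcel at the same pressure (hypothetical values).
print(available_enthalpy(250.0, 1.0e5, 250.0, 1.0e5))
print(available_enthalpy(300.0, 1.0e5, 250.0, 1.0e5))
```

At the reference state the quantity vanishes, and at the reference pressure it is positive for any temperature departure, since $x - 1 - \ln x \ge 0$ for $x = T/T_r$.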

In graph theory, “Shepherd” refers to the finite Hamiltonicity results of F. B. Shepherd on claw-free and net-free graphs (Heuer et al., 2020). The cited 1991 theorem states that if a finite graph is claw-free and net-free, then connectedness implies a Hamilton path and 2-connectedness implies Hamiltonicity; for $k\ge 2$, $(k+1)$-connectedness is equivalent to $k$-leaf-connectedness (Heuer et al., 2020). The 2020 extension to locally finite graphs replaces finite Hamilton cycles with Hamilton circles in the Freudenthal compactification and proves that every locally finite, 2-connected, claw-free, net-free graph is Hamiltonian in that topological sense (Heuer et al., 2020).

In quantum verification, the Shepherd–Bremner protocol is an IQP-based cryptographic test of quantum computational power (Yung et al., 2020). Its basic statistic is the secret-string correlator

$c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$ 1

or equivalently the parity bias $c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$ 2 (Yung et al., 2020). The 2020 “Anti-Forging Quantum Data” paper explains that the original construction can be broken by the Kahanamoku-Meyer attack, then generalizes the protocol by introducing multiple hidden secrets, a decomposition $c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$ 3 with $c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$ 4 anticommuting and $c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$ 5 commuting with the relevant $c(t)=\frac{1}{N}\sum_{i=1}^{N}x_i(t),$ 6, and verifier-side estimators for the correlators (Yung et al., 2020). A notable design feature is that multiple secret strings can be encoded simultaneously, which the paper presents as a significant strengthening of anti-forging hardness (Yung et al., 2020).
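A verifier-side estimate of these statistics from measured bitstrings can be sketched as follows. The sketch takes the parity bias to be the fraction of samples $x$ with $x\cdot s = 0 \pmod 2$ and the correlator to be the sample mean of $(-1)^{x\cdot s}$, which equals twice the bias minus one. That reading, the function names, and the sample data are assumptions of this example rather than the paper's notation.

```python
def parity(x, s):
    """Inner product of two bit tuples modulo 2."""
    return sum(xi & si for xi, si in zip(x, s)) % 2

def estimate_bias_and_correlator(samples, secret):
    """Return (fraction of samples orthogonal to the secret, mean of (-1)^(x.s))."""
    zeros = sum(1 for x in samples if parity(x, secret) == 0)
    bias = zeros / len(samples)
    return bias, 2.0 * bias - 1.0

# Hypothetical measurement record and secret string.
samples = [(0, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 0)]
secret = (1, 1, 0)
print(estimate_bias_and_correlator(samples, secret))
```

A device sampling uniformly at random would give a bias near one half and a correlator near zero, so the verifier's test amounts to checking that the estimate is significantly above that baseline for the secret it holds.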

Across these usages, “shepherd” consistently marks one of two roles. It either denotes an external guide or supervisor acting on a more complex collective process, or it designates a framework that inspects, branches, verifies, or redirects another system’s behavior. The recurrence of that role vocabulary across robotics, distributed algorithms, machine learning, security, physics, and combinatorics suggests a stable conceptual pattern, even though the underlying mathematical objects differ substantially.