Decentralized Control of Quadrotor Swarms

Updated 21 March 2026
  • Decentralized quadrotor swarm control is defined by autonomous, local decision-making without a centralized coordinator, utilizing onboard sensing and computation.
  • It leverages methods such as weight-sharing neural network policies, optimization-based trajectory planning, and local interaction rules to ensure collision avoidance and cohesive flight.
  • The approach emphasizes scalability and sim-to-real transfer, maintaining performance under computational constraints, limited communication, and dynamic environmental conditions.

A decentralized control system for quadrotor swarms is one in which each agent computes its actions autonomously using locally available information—typically only its own sensor data and (possibly) observations or messages from neighboring agents—without reliance on a centralized coordinator or global state. This organizational paradigm is fundamental for achieving scalability, robustness against failures, and adaptability in the collective operation of large teams of aerial robots, particularly in dynamic, cluttered, and communications-limited environments.

1. Architectural Principles of Decentralized Quadrotor Swarm Control

Decentralized quadrotor swarm control architectures are characterized by local sensing, onboard computation, and inter-agent coupling mediated only by immediate observations or restricted neighborhood communication. Representative frameworks include:

  • Weight-sharing of control policies: All quadrotors execute identical neural network policies, taking as input their own state and local neighbor information (Batra et al., 2021, Shi et al., 2020).
  • No explicit message passing or global state requirements: Coupling arises from observing the relative positions and velocities of a fixed-size set of neighbors, rather than through broadcast communications (Batra et al., 2021).
  • Fully reactive or event-triggered planning structures: Agents replan trajectories asynchronously in response to new sensor data, environmental changes, or neighbor trajectory updates (Zhou et al., 2020, Zhou et al., 2021).

Common system features include real-time operating constraints, critically limited onboard compute/memory (e.g., ≈1,000 parameter networks for 168 MHz class MCUs (Batra et al., 2021)), and domain-randomized simulation-to-real transfer to ensure robustness against sensor noise, actuation lag, and imperfect modeling.
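The weight-sharing pattern above can be sketched concretely. In this minimal illustration (all names, sizes, and weights are hypothetical, not taken from any cited system), every drone evaluates the same tiny network on its own state plus the relative states of its two nearest neighbors:

```python
import numpy as np

rng = np.random.default_rng(0)

# One tiny weight-shared policy: every drone evaluates the SAME parameters
# on its own local observation (sizes are illustrative, small enough for an MCU).
OBS_DIM, HID, ACT_DIM = 6 + 2 * 6, 16, 4   # self pos/vel + 2 neighbors' rel pos/vel -> 4 motor commands
W1, b1 = rng.normal(0, 0.1, (HID, OBS_DIM)), np.zeros(HID)
W2, b2 = rng.normal(0, 0.1, (ACT_DIM, HID)), np.zeros(ACT_DIM)

def local_observation(pos, vel, i, k=2):
    """Agent i observes only its own state plus relative states of its k closest neighbors."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]           # skip index 0, which is the agent itself
    rel = [np.concatenate([pos[j] - pos[i], vel[j] - vel[i]]) for j in nbrs]
    return np.concatenate([pos[i], vel[i], *rel])

def policy(obs):
    """Shared policy: identical weights run onboard every drone."""
    return np.tanh(W2 @ np.tanh(W1 @ obs + b1) + b2)

pos = rng.normal(0, 1, (5, 3))              # 5 drones in 3-D space
vel = np.zeros((5, 3))
actions = np.stack([policy(local_observation(pos, vel, i)) for i in range(5)])
```

Note that no global state or messaging appears anywhere: each agent's action depends only on quantities it could measure or estimate onboard.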

2. Algorithmic Foundations and Mathematical Models

A wide range of modeling and control paradigms have been employed for decentralized quadrotor swarms:

  • Dynamical Models: Quadrotors are typically modeled with full 6-DOF Newton-Euler equations, including motor lag, aerodynamic coupling, and (where relevant) downwash interactions (Batra et al., 2021, Shi et al., 2020).
  • Observation Models: Sensing is limited to per-agent proprioceptive state (position, velocity, attitude, angular rates) and either a fixed number or a neighborhood radius of the closest neighbors' relative states. Some frameworks incorporate onboard vision/CNNs for perception (Hu et al., 2020).
  • Control Inputs: The action space ranges from high-frequency low-level motor thrusts (for full end-to-end RL) (Batra et al., 2021), to aggregate force/torque vectors, or waypoints/acceleration setpoints for hierarchical designs (Virágh et al., 2013, Shi et al., 2020).
  • Policy Representation:
    • Deep learning-based: Neural network controllers from end-to-end RL or imitation learning encapsulate local perception and encode neighbor interactions via permutation-invariant "Deep Sets" or attention-based modules (Batra et al., 2021, Shi et al., 2020).
    • Model-based: Classical decentralized controllers use rule-based or optimization-based feedback (e.g., viscous-friction flocking, chance constrained MPC, control barrier functions) (Virágh et al., 2013, Goarin et al., 2024, Arul et al., 2020).
    • Hybrid learning-model integration: Data-driven learning of unmodeled aerodynamic coupling is embedded within nominal nonlinear controllers for close-proximity flight (Shi et al., 2020).
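The permutation-invariant "Deep Sets" idea mentioned above can be shown in a few lines. In this sketch (weights and dimensions are hypothetical), an encoder phi is applied to each neighbor's relative state, the codes are summed so the result cannot depend on neighbor ordering, and a second map rho produces the feature consumed by the policy:

```python
import numpy as np

rng = np.random.default_rng(1)
Wphi = rng.normal(0, 0.1, (8, 6))   # phi: per-neighbor relative state (pos, vel) -> 8-d code
Wrho = rng.normal(0, 0.1, (4, 8))   # rho: pooled code -> 4-d feature

def deep_sets(neighbor_rel_states):
    """Permutation-invariant neighbor encoding: phi per neighbor, sum, then rho."""
    codes = np.tanh(neighbor_rel_states @ Wphi.T)   # phi applied to each row
    pooled = codes.sum(axis=0)                      # summation is order-independent
    return np.tanh(Wrho @ pooled)

nbrs = rng.normal(0, 1, (3, 6))
out1 = deep_sets(nbrs)
out2 = deep_sets(nbrs[[2, 0, 1]])   # same neighbors in a different order -> same output
```

Invariance to neighbor ordering is what lets one policy handle any assignment of which physical drone fills each "neighbor slot".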

3. Communication, Interaction, and Observability Structures

Decentralization imposes strict constraints on information flow. Inter-agent coupling is typically mediated either by direct observation of neighbors' relative positions and velocities, with no explicit messaging at all, or by communication restricted to a local neighborhood graph; in neither case does any agent require access to global swarm state.

4. Learning and Optimization Approaches

Diverse algorithmic strategies have been developed for control synthesis:

  • End-to-end deep reinforcement learning: Swarm policies are optimized using PPO (Proximal Policy Optimization) operating on batches of multi-agent trajectories. Training uses a curriculum of diverse formation, flocking, pursuit-evasion, and obstacle-avoidance tasks with domain randomization and noise injection to ensure robust zero-shot transfer (Batra et al., 2021).
  • Vision-based decentralized imitation learning: End-to-end frameworks combining CNNs for feature extraction and GNNs for information propagation are trained to imitate centralized "expert" swarm controllers (Hu et al., 2020).
  • Optimization-based decentralized trajectory planning: Agents solve local model predictive control (MPC) problems that embed collision-avoidance constraints (e.g., ORCA, CBFs, chance constraints), and treat neighbor trajectories as moving obstacles (Arul et al., 2019, Goarin et al., 2024, Arul et al., 2020). Some exploit explicit deadlock avoidance via grid-based multi-agent path planning and subgoal optimization (Park et al., 2022).
  • Bio-inspired local interaction laws: Flocking, pursuit, repulsion, and alignment rules directly encode cohesive and collision-averse group behavior mathematically, often augmented with local obstacle or goal attraction (Virágh et al., 2013, Koifman et al., 2024, Petracek et al., 2023).
  • Distributed learning over communication graphs: Graph attention-based decentralized actor-critic frameworks aggregate information from spatial neighbors, supporting multi-objective optimization such as coverage versus battery lifetime without centralized state (Peng et al., 10 Jun 2025).
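The bio-inspired local interaction laws are the simplest of these strategies to write down. The following sketch combines short-range repulsion, velocity alignment with neighbors (in the spirit of the "viscous friction" term), and weak goal attraction; gains, radii, and the function name are illustrative choices, not constants from any cited work:

```python
import numpy as np

def flocking_accel(pos, vel, i, r_rep=1.0, k_rep=2.0, k_align=0.5, k_goal=0.2, goal=None):
    """Local rule sketch for agent i: repel from too-close neighbors, damp
    relative velocity ('viscous friction' alignment), and drift toward a goal."""
    a = np.zeros_like(pos[i])
    for j in range(len(pos)):
        if j == i:
            continue
        d = pos[i] - pos[j]
        dist = np.linalg.norm(d)
        if dist < r_rep:                                  # short-range repulsion
            a += k_rep * (r_rep - dist) * d / max(dist, 1e-6)
        a += k_align * (vel[j] - vel[i]) / len(pos)       # align with neighbor velocities
    if goal is not None:
        a += k_goal * (goal - pos[i])                     # weak attraction to target
    return a
```

Because each term uses only relative quantities, the rule runs identically on every agent with no shared frame or coordinator, which is exactly the property that makes such laws communication-free.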

5. Safety, Scalability, and Sim-to-Real Transfer

Empirical and analytical validation, as well as formal guarantees, are addressed via the following:

  • Formal safety guarantees: Control barrier function (CBF) approaches guarantee forward invariance of safety sets, including inter-agent and obstacle avoidance, even under limited local sensing and communication (Goarin et al., 2024, Palani et al., 2024).
  • Scalability and computational constraints: All frameworks are implemented to scale with agent count, supporting local-only computation (millisecond per-agent update rates), with communication and computation growing linearly or sublinearly with the number of neighbors (K ≪ N) (Batra et al., 2021, Zhou et al., 2020, Peng et al., 10 Jun 2025).
  • Deadlock avoidance and performance under asynchrony: Solutions to deadlock in highly cluttered environments use decentralized, grid-based path planning embedded in the continuous control loop, guaranteeing progress and eliminating mutual-blocking (Park et al., 2022). Asynchronous planners with no global synchronization maintain high throughput and robustness to packet loss (Zhou et al., 2020, Zhou et al., 2021).
  • Sim-to-real transfer: Real-world trials with up to 8 drones confirm that controllers trained or validated in high-fidelity physics simulators transfer zero-shot to onboard execution, including aggressive maneuvers and collision recovery (Batra et al., 2021, Zhou et al., 2020, Petracek et al., 2023).
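The CBF mechanism behind the safety guarantees above can be illustrated for a single pairwise constraint, where the corrective velocity has a closed form. With barrier h = |p_i − p_j|² − d_min², enforcing ḣ + αh ≥ 0 keeps the safe set forward-invariant; this sketch (function name and parameters are illustrative) shifts the commanded velocity minimally along the separation direction, whereas real multi-agent systems solve a QP over all active constraints:

```python
import numpy as np

def cbf_filter(p_i, p_j, v_des, v_j, d_min=0.5, alpha=1.0):
    """Single-constraint CBF sketch: keep h = |p_i - p_j|^2 - d_min^2 >= 0
    by minimally adjusting the desired velocity along the separation vector."""
    dp = p_i - p_j
    h = dp @ dp - d_min**2
    hdot = 2 * dp @ (v_des - v_j)        # time derivative of h under v_des
    slack = hdot + alpha * h             # CBF condition: slack >= 0 is safe
    if slack >= 0:
        return v_des                     # command already safe, pass through
    lam = -slack / (2 * dp @ dp)         # smallest correction along dp
    return v_des + lam * dp              # restores hdot + alpha*h = 0 exactly
```

The filter is inactive when the nominal command already satisfies the barrier condition, so it only intervenes near the safety boundary, which is why CBF filters compose cleanly with learned or planned nominal controllers.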

6. Theoretical Analysis and Empirical Observations

The table below synthesizes quantitative results reported in major works, illustrating the strengths and trade-offs among methods.

| Framework / Paper | Real-world agents | Metric (N=8) | Safety / Scalability | Unique features |
|---|---|---|---|---|
| (Batra et al., 2021) | up to 8 | <0.02 collisions/min/drone; 0.42 m mean target distance | Yes / with retraining | End-to-end RL, permutation-invariant attention, zero-shot real transfer |
| (Shi et al., 2020) | up to 4–5 | 2–4× reduction in max height error | Stable, generalizes to larger swarms | Learned interaction model (downwash), Lyapunov I2SS |
| (Zhou et al., 2020) | up to 3 | 0 collisions, <0.5 ms replan | Yes, sublinear with swarm size | Local mapping, asynchronous replanning |
| (Goarin et al., 2024; Palani et al., 2024) | 3–5 | 0 barrier violations in practical range | Provable, up to 10 agents | NMPC+CBF for limited range, LOS connectivity |
| (Peng et al., 10 Jun 2025) | Sim only | Approaches exhaustive search for N=40 | Linear scaling, dual-objective | GAT-based decentralized actor-critic, flexible tradeoff |
| (Virágh et al., 2013; Petracek et al., 2023) | 8–9, 3–4 | Sustained flocking, self-organization | No explicit centralization | Purely local rules, communication-free |

Qualitative behaviors in various works include cohesive flocking with collision weaving (Batra et al., 2021), recovery from minor physical impacts (Batra et al., 2021), flexible split-and-reform maneuvers through obstacles (Petracek et al., 2023), and deadlock-free collective passage through dense mazes (Park et al., 2022).

7. Limitations, Open Challenges, and Future Extensions

Despite significant progress, open challenges persist:

  • Scaling to massive swarms: All approaches exhibit some performance degradation (e.g., increased collisions, state distribution shift) without targeted retraining or algorithmic adaptation as the swarm size N grows (Batra et al., 2021, Peng et al., 10 Jun 2025).
  • Partial observability and communication delays: Advanced aggregation (e.g., multi-hop GNNs, recurrent units (Peng et al., 10 Jun 2025)) mitigates limited local information. However, full robustness to asynchrony, delayed/lossy communication, and measurement noise remains a topic for ongoing development (Zhou et al., 2020, Goarin et al., 2024).
  • Formal deadlock and safety proofs: While CBF and NMPC schemes provide strong guarantees under idealized conditions, practical efficacy in dense, cluttered, or dynamic environments is often achieved heuristically, or requires conservative detection ranges (Goarin et al., 2024, Palani et al., 2024).
  • Sim-to-real gap and onboard constraints: Sustained transfer of advanced behaviors requires domain randomization, compact policy architectures, and real-world hardware validation (Batra et al., 2021, Petracek et al., 2023).
  • Integrating perception and higher-level autonomy: Inclusion of vision-based controllers (Hu et al., 2020), scalable real-time mapping (Zhou et al., 2020), or self-organizing hierarchies for ad-hoc leadership and dynamic structure (Zhu et al., 2024) is expanding the behavioral repertoire and complexity of decentralized swarms.

Potential extensions include graph neural networks for richer relational modeling (Batra et al., 2021), learned explicit communication, integration of planning and learning, adaptive safety margins, hybrid map-based/reactive control, dynamic scaling of neighbor sets, and merging of functional layers such as task allocation and formation maintenance (Koifman et al., 2024, Peng et al., 10 Jun 2025).
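Of the extensions listed above, attention-weighted neighbor aggregation is easy to sketch. In this hypothetical fragment (all weights and names are illustrative), each neighbor's contribution is scaled by a learned compatibility score with the agent's own state, so salient neighbors dominate the pooled feature rather than being averaged away:

```python
import numpy as np

rng = np.random.default_rng(2)
Wq, Wk, Wv = (rng.normal(0, 0.1, (4, 6)) for _ in range(3))

def attend(self_state, neighbor_states):
    """Attention sketch: score each neighbor against the agent's own state,
    softmax the scores, and return the weighted sum of neighbor values."""
    q = Wq @ self_state
    K = neighbor_states @ Wk.T           # one key per neighbor
    V = neighbor_states @ Wv.T           # one value per neighbor
    scores = K @ q / np.sqrt(len(q))     # scaled compatibility scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                         # softmax weights over neighbors
    return w @ V
```

Like the sum-pooling variant, this remains permutation-invariant and handles a variable neighbor count, which is what makes it attractive as a drop-in richer relational model.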


Decentralized control of quadrotor swarms accommodates the physical, informational, and operational constraints of large-scale aerial robot collectives, achieving robust, scalable, and sometimes even provably safe behaviors across a spectrum of environments and tasks. Ongoing research continues to push toward higher autonomy, improved safety, stronger real-world performance, and new capabilities at scale (Batra et al., 2021, Shi et al., 2020, Virágh et al., 2013, Goarin et al., 2024, Zhou et al., 2020, Peng et al., 10 Jun 2025).
