Autonomous MAV Navigation Framework

Updated 26 November 2025

The paper presents a modular pipeline integrating perception, mapping, state estimation, and control to enable autonomous navigation for MAVs.
The framework employs probabilistic occupancy maps and TSDF fusion for rapid obstacle detection, ensuring efficient trajectory planning and collision avoidance.
The system leverages adaptive NMPC and reactive control algorithms to meet resource constraints and handle dynamic obstacles in real time.

Autonomous navigation frameworks for Micro Aerial Vehicles (MAVs) encompass the suite of algorithms, architectural pipelines, and system integrations that enable these vehicles to perceive, map, plan, and safely traverse complex, unknown, and often cluttered environments without human intervention. Central to these frameworks is the ability to create and update a world model from onboard sensory data, localize the MAV, generate dynamically feasible trajectories that guarantee collision avoidance, and execute control policies in real time—often under severe payload and computation constraints.

1. Framework Architectures and System Components

Comprehensive MAV navigation frameworks are designed as modular pipelines integrating perception, mapping, state estimation, planning, and control. Key architectural patterns include:

Perception and State Estimation: Visual-inertial odometry (VIO), LiDAR-inertial odometry (LIO), or monocular SLAM provide pose estimation. Examples include OKVIS2-based VIO for stereo+IMU fusion, FAST-LIO2 for LiDAR-inertial odometry, or LSD-SLAM for monocular systems (Papatheodorou et al., 25 Sep 2024, Stumberg et al., 2016, Pfreundschuh et al., 2022).
Mapping/World Modeling: Frameworks employ probabilistic occupancy grids (octree-based in OctoMap or supereight2), truncated signed distance fields (TSDF, as in voxblox), or scene reconstructions via monocular depth estimation and TSDF fusion (Oleynikova et al., 2018, Simon et al., 2023, Papatheodorou et al., 25 Sep 2024).
Frontier and Utility Computation: Efficient frontier extraction—local and global—is implemented directly on submaps or octomaps to identify “next-best-view” (NBV) exploration goals (Papatheodorou et al., 25 Sep 2024, Patel et al., 2022).
Planning: Global coverage or target-reaching utilizes sampling-based path planning (RRT*, PRM), skeleton graph search (topological planners), or information-gain sampling over frontier voxels (Oleynikova et al., 2018, Papatheodorou et al., 25 Sep 2024). Local planners often use B-spline or polynomial trajectory generation optimized for obstacle avoidance, or motion-primitive libraries for constrained vehicles (Oleynikova et al., 2018, Simon et al., 2023).
Control Execution: Nonlinear Model Predictive Control (NMPC) and iterative LQR approaches are standard for trajectory tracking and robustness, augmented by PID cascades on embedded autopilots (e.g., Pixhawk, PX4) (Oleynikova et al., 2018, Papatheodorou et al., 25 Sep 2024).
Resource Management: Several systems adapt computational parameters (map resolution, sensing frequency) to maintain real-time constraints on resource-limited platforms (Patel et al., 2022, Papatheodorou et al., 25 Sep 2024).

End-to-end block diagrams often follow:

Sensors → State Estimation → Mapping → Frontier/Exploration Planning → Local/Global Planning → Trajectory Tracking (MPC/NMPC) → MAV Actuation

2. Mapping, Representation, and State Estimation

Mapping approaches employ probabilistic occupancy representations to capture environmental structure efficiently and enable fast collision queries:

Submap-based Occupancy Mapping: Frameworks such as (Papatheodorou et al., 25 Sep 2024) partition the global octree into local submaps, with explicit free/occupied voxel representation. Submaps are generated either by geometric overlap (LiDAR) or keyframe count (depth cameras), enhancing computational efficiency and drift correction via loop closure.
TSDF Fusion: Systems such as voxblox (Oleynikova et al., 2018) and MonoNav (Simon et al., 2023) accumulate projective distance fields from depth or predicted depth, supporting ESDF construction for fast obstacle distance lookups.
Semi-Dense Monocular Mapping: LSD-SLAM-based navigation (Stumberg et al., 2016) fuses inverse-depth at high-gradient pixels only, necessitating specialized exploration strategies to handle angularly-limited, texture-dependent reconstructions.
State Estimation: Factor-graph-based fusion (GTSAM/iSAM2), error-state EKFs, and tight-coupling with GNSS or UWB networks are prominent for robustness to degenerate conditions (Pfreundschuh et al., 2022, Kanellakis et al., 2019).
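The probabilistic occupancy representations listed above are commonly maintained with a per-voxel log-odds update, as popularized by OctoMap. The following is a minimal sketch of that general technique, not the implementation of any cited framework. The sensor-model probabilities (0.7 for a hit, 0.4 for a miss), the clamping bounds, and the names `OccupancyGrid` and `update_voxel` are illustrative assumptions.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def prob(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Assumed inverse sensor model and clamping bounds (illustrative values).
L_HIT = logit(0.7)
L_MISS = logit(0.4)
L_MIN, L_MAX = logit(0.12), logit(0.97)


def update_voxel(log_odds, hit):
    """Add the measurement's log-odds and clamp so the voxel stays updatable."""
    log_odds += L_HIT if hit else L_MISS
    return max(L_MIN, min(L_MAX, log_odds))


class OccupancyGrid:
    """Sparse voxel map: keys are integer (i, j, k) indices, values are log-odds."""

    def __init__(self):
        self.cells = {}  # voxels absent from the dict are treated as unknown

    def integrate(self, key, hit):
        # Unknown voxels start at log-odds 0, i.e. probability 0.5.
        self.cells[key] = update_voxel(self.cells.get(key, 0.0), hit)

    def probability(self, key):
        """Occupancy probability, or None if the voxel was never observed."""
        return prob(self.cells[key]) if key in self.cells else None
```

Clamping bounds the confidence of any voxel, which lets the map respond to changes in the scene after a bounded number of contradicting measurements.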

Pose updates are integrated with mapping pipelines to correct prior submaps or occupancy grids upon loop closure/event triggers, maintaining global consistency (Papatheodorou et al., 25 Sep 2024).
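The TSDF fusion step mentioned in this section can be illustrated with the standard weighted running average over truncated projective distances. This is a generic sketch under simplifying assumptions (constant per-observation weight, a single voxel, a scalar depth along the camera ray); voxblox and MonoNav use their own weighting and integration strategies. The function names and default parameters are hypothetical.

```python
def projective_sdf(surface_depth, voxel_depth):
    """Signed distance along the camera ray: positive in front of the surface."""
    return surface_depth - voxel_depth


def truncate(sdf, trunc):
    """Clamp the signed distance to the truncation band [-trunc, trunc]."""
    return max(-trunc, min(trunc, sdf))


def fuse(voxel, measured_sdf, trunc=0.3, obs_weight=1.0, max_weight=100.0):
    """Fuse one observation into a voxel stored as a (distance, weight) tuple.

    The stored distance is the weighted mean of all truncated observations;
    the weight is capped so that old data cannot dominate indefinitely.
    """
    d_old, w_old = voxel
    d_obs = truncate(measured_sdf, trunc)
    w_new = w_old + obs_weight
    d_new = (w_old * d_old + obs_weight * d_obs) / w_new
    return d_new, min(w_new, max_weight)
```

For example, starting from an empty voxel `(0.0, 0.0)` and fusing observations of 0.2 m and 0.1 m yields a stored distance of about 0.15 m with weight 2.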

3. Exploration, Frontier Extraction, and Planning

Autonomous exploration is driven by efficient detection of environmental “frontiers” (interfaces between free and unknown space) and maximization of information gain:

Frontier Extraction:
- Local frontiers: Extracted as free voxels adjacent to unknown/freshly modified cells in an active submap (Papatheodorou et al., 25 Sep 2024). Updates are incremental: stale frontiers are removed and newly created ones are merged in using set operations.
- Global frontiers: Constructed across frozen submaps with exclusion criteria to prevent redundant exploration; computation is parallelized for scalability (Papatheodorou et al., 25 Sep 2024).
- Safe frontier sets: Further pruned by risk margin and proximity to occupied cells, as in the REF scheme (Patel et al., 2022).
NBV Sampling and Utility:
- Candidate viewpoints are sampled near global frontiers. A utility function, typically $u_j = g_j / t_j$, balances expected information gain $g_j$ (from entropy raycasts) against flight time $t_j$ (Papatheodorou et al., 25 Sep 2024).
- Motion primitives or continuous trajectory optimization select a feasible action that maximizes utility or exploration reward (Simon et al., 2023, Oleynikova et al., 2018).
Planning Algorithms:
- Global planners utilize RRT-Connect/RRT*/PRM skeleton graph searches, graph-based coverage methods (e.g., C-CPP for infrastructure inspection (Kanellakis et al., 2019)), or topological planning on ESDF-derived GVD skeletons (Oleynikova et al., 2018).
- Local planners leverage B-spline, snap-minimizing polynomial formulations, or reactive end-point selection (e.g., “shotgun” for unknown or occupied goals) (Oleynikova et al., 2018).
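The incremental frontier update described under Frontier Extraction can be sketched on a small 2D grid. This is a simplified illustration, not the submap-based 3D implementation of the cited work: it assumes a dense grid, 4-connectivity, and the hypothetical helper names shown. Only cells that changed, plus their neighbours, are re-examined, and the frontier set is updated with set difference and union.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def in_bounds(grid, cell):
    r, c = cell
    return 0 <= r < len(grid) and 0 <= c < len(grid[0])


def neighbors(grid, cell):
    """4-connected neighbours that lie inside the grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [n for n in candidates if in_bounds(grid, n)]


def is_frontier(grid, cell):
    """A frontier is a free cell with at least one unknown neighbour."""
    r, c = cell
    if grid[r][c] != FREE:
        return False
    return any(grid[nr][nc] == UNKNOWN for nr, nc in neighbors(grid, cell))


def update_frontiers(grid, frontiers, changed_cells):
    """Incrementally update the frontier set after a map update."""
    touched = set()
    for cell in changed_cells:
        touched.add(cell)
        touched.update(neighbors(grid, cell))
    stale = {c for c in touched if c in frontiers and not is_frontier(grid, c)}
    fresh = {c for c in touched if is_frontier(grid, c)}
    return (frontiers - stale) | fresh


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
all_cells = [(r, c) for r in range(3) for c in range(3)]
# Initial extraction treats every cell as changed; frontiers are (0, 1) and (2, 2).
frontiers = update_frontiers(grid, set(), all_cells)
```

When cell (0, 2) is later observed as free, calling `update_frontiers(grid, frontiers, [(0, 2)])` retires (0, 1) and adds (0, 2) without rescanning the rest of the map.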

Exploration strategies may be decoupled into local (short-horizon, FOV-constrained) and global (repositioning) phases, with switching logic triggered by frontier availability or actuation cost (Patel et al., 2022, Stumberg et al., 2016).
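The gain-per-time utility $u_j = g_j / t_j$ used for next-best-view selection can be sketched as follows. The `Candidate` fields, the constant-speed flight-time model, and the `min_time` guard are assumptions made for illustration; in practice the gain would come from entropy raycasts and the time from a planned path.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    gain: float         # expected information gain g_j (assumed precomputed)
    path_length: float  # length of the path to the viewpoint, in metres


def utility(candidate, speed, min_time=1e-3):
    """u_j = g_j / t_j with t_j approximated as path length over cruise speed."""
    flight_time = max(candidate.path_length / speed, min_time)
    return candidate.gain / flight_time


def select_next_best_view(candidates, speed=1.0):
    """Return the candidate with the highest utility."""
    if not candidates:
        raise ValueError("no candidate viewpoints to choose from")
    return max(candidates, key=lambda c: utility(c, speed))


candidates = [
    Candidate("A", gain=10.0, path_length=5.0),
    Candidate("B", gain=30.0, path_length=20.0),
    Candidate("C", gain=4.0, path_length=1.0),
]
best = select_next_best_view(candidates)
```

Here candidate C wins despite having the lowest raw gain, because it is reached quickly; this is the trade-off the ratio is meant to encode.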

4. Robust Trajectory Generation and Control

Trajectory generation in autonomous MAV frameworks integrates dynamic feasibility, obstacle avoidance, and localization uncertainty:

Trajectory Optimization: Polynomial or B-spline trajectories are smoothed by minimizing snap, jerk, or “soft” obstacle penalty terms derived from ESDFs (Oleynikova et al., 2018). In MonoNav (Simon et al., 2023), collision-free motion primitives are chosen from a discrete library and scored for proximity to the goal and obstacle clearance.
Online Control: NMPC is widely used, with model predictive formulations directly incorporating nonlinear quadrotor dynamics and non-convex, potentially time-varying obstacle constraints (Small et al., 2018, Papatheodorou et al., 25 Sep 2024). Efficient solvers such as PANOC achieve real-time embedded performance at 20 Hz on resource-constrained hardware (Small et al., 2018).
Uncertainty Handling: Some frameworks, e.g., (Mansouri et al., 2020), adapt NMPC tracking weights using Shannon entropy computed over localization covariance, de-emphasizing uncertain state components to maintain safety under drift.
Reactive Control: For highly dynamic or mapless domains, potential field approaches operate directly on raw pointclouds, summing repulsive and attractive forces, with saturation and normalization enforcing smoothness (Lindqvist et al., 2021).
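The idea of de-emphasizing uncertain state components can be illustrated with a small weight-scaling rule driven by Gaussian entropy. The specific mapping below (exponential decay in the entropy excess over a reference, with a floor) is a hypothetical choice for illustration and is not the formulation of the cited work; it also assumes independent per-axis variances rather than a full covariance matrix.

```python
import math


def gaussian_entropy(variance):
    """Differential entropy of a 1-D Gaussian: 0.5 * ln(2 * pi * e * variance)."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def adapt_weights(nominal, variances, ref_variance, floor=0.1):
    """Scale tracking weights down as per-axis entropy exceeds a reference.

    Axes at or below the reference variance keep their nominal weight;
    the scale never drops below `floor`, so no axis is ignored entirely.
    """
    h_ref = gaussian_entropy(ref_variance)
    adapted = []
    for weight, variance in zip(nominal, variances):
        excess = max(0.0, gaussian_entropy(variance) - h_ref)
        scale = max(floor, math.exp(-excess))
        adapted.append(weight * scale)
    return adapted
```

With nominal weights of 10 on each axis, a reference variance of 0.01, and variances of 0.01, 0.04, and 100, the adapted weights are approximately 10, 5, and 1.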

Control commands are typically sent at high frequencies (100–200 Hz) to embedded flight controllers (e.g., Pixhawk boards or the PX4 autopilot stack), which implement low-level cascaded PID stabilization.
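The reactive potential-field scheme described in this section can be sketched as a sum of repulsive forces from nearby points and a normalized attractive force toward the goal, followed by saturation. This is a generic artificial potential field under assumed gains and an assumed quadratic falloff, not the COMPRA formulation. Points and goal are expressed relative to the MAV, which sits at the origin.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def saturate(v, max_norm):
    """Scale the vector down if its norm exceeds max_norm."""
    n = norm(v)
    if n <= max_norm or n == 0.0:
        return tuple(v)
    return tuple(x * max_norm / n for x in v)


def repulsive_force(points, radius, gain):
    """Sum of pushes away from every point closer than the influence radius."""
    force = [0.0, 0.0, 0.0]
    for p in points:
        d = norm(p)
        if 0.0 < d < radius:
            magnitude = gain * (1.0 - d / radius) ** 2  # assumed falloff
            for i in range(3):
                force[i] -= magnitude * p[i] / d
    return tuple(force)


def attractive_force(goal, gain):
    """Constant-magnitude pull along the unit vector toward the goal."""
    d = norm(goal)
    if d == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(gain * x / d for x in goal)


def velocity_command(points, goal, radius=2.0, k_rep=1.0, k_att=1.0, v_max=1.0):
    """Combine both forces and saturate to a bounded velocity reference."""
    rep = repulsive_force(points, radius, k_rep)
    att = attractive_force(goal, k_att)
    total = tuple(r + a for r, a in zip(rep, att))
    return saturate(total, v_max)
```

A point 1 m directly ahead with a 2 m influence radius reduces the forward command from 1.0 to 0.75 under these defaults, while a point 1 m to the side deflects the command away from it.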

5. Specializations: Low-Payload, Resource-Limited, and Domain-Specific Frameworks

Several recent frameworks exploit tightly specialized methods to meet unique application or hardware constraints:

Ultra-low Payload (Monocular/CNN/End-to-End):
- Purely monocular systems reconstruct metric maps from vision transformer depth networks, with offboard or onboard fusion and primitive-based planning (Simon et al., 2023).
- End-to-end deep learning (e.g., D-PPO (Singh et al., 8 Apr 2025)) maps depth input directly to discrete action spaces, trained in simulation and transferred to resource-limited MAVs (DJI Tello), realizing up to 91% latency reduction and substantial gains over prior RL baselines.
- Imitation learning with perception modules (e.g., ResNet-50) trained on optimal trajectories from simulated environments (Lin et al., 2021).
Insect-Inspired and Reactive Schemes:
- Navigation-by-detection (ForaNav (Kuang et al., 4 Mar 2025)) uses HOG+color SVM tree detectors in agricultural tasks, with memory-driven visual servoing, recovery upon target occlusion, and orders-of-magnitude lower computational load than CNN baselines.
- Compact potential-field frameworks, e.g., COMPRA (Lindqvist et al., 2021), avoid global maps and enable rapid, robust exploration in GPS-denied/search-and-rescue contexts.
High-Value Infrastructure Inspection:
- Multi-agent, geometry-based coverage utilizing collaborative planning on surface slices, UWB-inertial localization, and high-fidelity mapping for defect localization, fully deployed in field trials (Kanellakis et al., 2019).
Collision Recovery and Map Enrichment:
- Air Bumper (Wang et al., 2023) demonstrates IMU-only collision detection, rapid recovery setpoints, and collision-aware map augmentation, improving safety and navigation in environments with transparent or previously unobserved obstacles.

6. Performance Metrics and Empirical Comparison

State-of-the-art frameworks are empirically benchmarked by metrics including:

Coverage/time to 100% exploration: 100% in <60 s for Leica BLK2Fly (LiDAR), 95% in ~120 s for depth-camera MAV (Papatheodorou et al., 25 Sep 2024).
Reconstruction error: RMSE as low as 0.11 m vs. 0.22 m for baselines (Papatheodorou et al., 25 Sep 2024).
Resource utilization: CPU time per mission of 162 s vs. 748 s, memory footprint of 6.7 GB vs. 23.5 GB (Papatheodorou et al., 25 Sep 2024).
Navigation robustness: Collision rates of 0.13 (MonoNav) vs. 0.53 (NoMaD) on <100 g airframes (Simon et al., 2023). In the deep RL setting, the collision rate dropped by more than 80% relative to existing RL navigation methods (Singh et al., 8 Apr 2025).
Localization accuracy: UWB-inertial fusion achieves 0.55 m RMSE for complex tower inspection, crucial for mapping tasks without GNSS (Kanellakis et al., 2019).
Compute efficiency: Real-time operation (30–50 Hz) on Intel Atom-class CPUs without discrete GPUs is demonstrated by depth/disparity-based mapping frameworks (Campos-Macías et al., 2019).
Resilience: No collisions over extended missions in narrow subterranean passages (REF (Patel et al., 2022), COMPRA (Lindqvist et al., 2021)); robust heading and path tracking in feature-poor environments and under intermittent sensor dropout.

7. Limitations, Open Challenges, and Future Directions

Current limitations include:

Semantic Mapping: Most frameworks are purely geometric; few incorporate semantic (object-level, affordance-aware) mapping. Future work targets integration of vision-language models for high-level task planning (Papatheodorou et al., 25 Sep 2024).
Dynamic Obstacles and Risk: Limited explicit modeling of dynamic obstacles or trajectory risk. Ongoing research explores dynamic feasibility-aware utilities and predictive collision avoidance (Campos-Macías et al., 2019, Papatheodorou et al., 25 Sep 2024).
Sim-to-Real Transfer: Deep RL and learning-based approaches require further advances in domain randomization for effective generalization from simulation to real hardware (Singh et al., 8 Apr 2025).
Memory and Long-horizon Planning: Reactive and mapless frameworks may be trapped in dead-ends without persistent memory. Hybrid architectures combining global mapping with local reactivity are a subject of active investigation (Lin et al., 2021).
Hardware Generality: Some algorithms are matched to specific sensors (LiDAR vs. depth vs. monocular), or constrained by hardware actuation (full actuation vs. gimbaled sensor heads) (Pfreundschuh et al., 2022). Generalizable frameworks are increasingly supporting sensor/configuration modularity (Papatheodorou et al., 25 Sep 2024, Oleynikova et al., 2018).

Autonomous navigation frameworks for MAVs now range from highly resource-optimized, reactive methods that enable fully onboard exploration even on sub-50 g platforms to semantic-rich, coverage-optimized architectures for industrial inspection and complex search-and-rescue. Ongoing research focuses on bridging geometric and semantic reasoning, making mapping and planning robust to uncertainty, and maintaining rigorous resource adaptation for deployment across the spectrum of contemporary MAV designs.