AerialSense: Multi-Modal UAV Sensing
- AerialSense is a modular multi-modal UAV sensing system employing advanced sensors and layered autonomy for diverse applications like mapping and surveillance.
- It deploys hybrid VTOL and multirotor platforms equipped with LiDAR, EO/IR, SAR, and RF receivers to capture high-fidelity, real-time environmental data.
- The system achieves sub-meter mapping, persistent change detection, and distributed object localization through robust sensor fusion and adaptive mission planning.
AerialSense denotes a modular, multi-modal UAV-based sensing paradigm emphasizing robust autonomy, high-fidelity sensor integration, dynamic data processing, and application-agnostic utility for environmental, communications, object localization, surveillance, and intelligent perception tasks. The architecture is generally centered on hybrid VTOL or multirotor aerial vehicles equipped with advanced sensors (LiDAR, EO/IR, SAR, atmospheric probes, and radio-frequency receivers), real-time onboard compute, and layered autonomy stacks for active perception, planning, and control. AerialSense has been formalized and experimentally validated for sub-meter DEM generation, persistent change detection, cellular network characterization, real-time air quality mapping, distributed object localization, third-person robotics teleoperation, panoramic scene parsing, coordinated SAR interferometry, multi-user RF localization, and vision-language benchmarking (Hsieh et al., 2016, Abdullah et al., 4 Sep 2025, Szenher et al., 2023, Bastani et al., 2021, Gawel et al., 2018, Sun et al., 2021, Chakareski, 2017, Hu et al., 2018, Guo et al., 8 Dec 2025, Lahmeri et al., 15 Jul 2025, Nguyen et al., 2022, Manduhu et al., 2023, Paglierani et al., 17 Nov 2025).
1. System Architecture: Layered Autonomy and Sensor Fusion
AerialSense systems feature a hierarchical software stack supporting robust autonomy across perception, planning, and control (Hsieh et al., 2016). The perception layer ingests multi-rate sensor data—LiDAR (≤32 beams, 300 k points/s), EO/IR cameras (global-shutter, pixel pitch ~1.5 µm, 70 dB dynamic range), SAR (X-band, 3 cm range resolution), GNSS/IMU (2–5 cm horizontal accuracy), and atmospheric probes—through low-level drivers (≥50 Hz). Spatio-temporal scene geometry is continuously reconstructed via TSDF voxel grids. Sensor fusion, including pose-graph SLAM (LiDAR+EO/IR+GNSS/IMU/SAR) with an EKF over the state-space model

$$x_{k+1} = f(x_k, u_k) + w_k, \qquad z_k = h(x_k) + v_k,$$

where $w_k$, $v_k$ denote process and measurement noise, enables real-time mapping and data alignment.
Mission planners (RRT*-based) and local D* Lite planners operate on coarse DEMs and TSDFs, autonomously replanning around obstacles and science cues. Trajectory optimization is performed via Bézier curve minimization under collision constraints. Control stacks utilize cascaded PID/LPV architectures at 200 Hz for six-DOF tracking and fail-safes (GNSS loss, battery margins) invoke emergency modes (parachute deployment, hover-in-place). Sensor suite calibration is achieved via checkerboard/bundle adjustment and LiDAR-camera boresight refinement to achieve sub-centimeter alignment, with IEEE 1588-style PTP for time sync.
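As an illustration of the cascaded control loop described above, the following is a minimal sketch of a position→velocity PID cascade running at 200 Hz; the gains, limits, and interfaces are illustrative assumptions rather than the AerialSense flight code.

```python
# Minimal sketch of a cascaded PID position/velocity controller at 200 Hz.
# Gains, limits, and the vehicle interface are illustrative assumptions,
# not the AerialSense flight stack.
import numpy as np

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = np.zeros(3)
        self.prev_err = np.zeros(3)

    def step(self, err):
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

DT = 1.0 / 200.0                                 # 200 Hz control loop
pos_loop = PID(kp=1.2, ki=0.0, kd=0.0, dt=DT)    # outer loop: position -> velocity setpoint
vel_loop = PID(kp=2.5, ki=0.4, kd=0.05, dt=DT)   # inner loop: velocity -> acceleration command

def control_step(pos, vel, pos_ref):
    """One cascade iteration: returns a commanded acceleration vector."""
    vel_ref = np.clip(pos_loop.step(pos_ref - pos), -5.0, 5.0)   # velocity limit 5 m/s
    acc_cmd = np.clip(vel_loop.step(vel_ref - vel), -4.0, 4.0)   # acceleration limit 4 m/s^2
    return acc_cmd
```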
2. Sensing Modalities and Integration Techniques
AerialSense UAVs are equipped with primary sensors including (Hsieh et al., 2016):
- LiDAR: spinning 32-beam, range up to 120 m, angular resolution ≈0.2°, ranging accuracy ≈2 cm.
- EO cameras: 12 MP, fixed focal-length lens, centimeter-scale ground sampling distance (≈1.8 cm in the mapping campaigns of Section 4).
- IR cameras: uncooled microbolometer, 640×512 px, 0.05 K temperature resolution.
- SAR: X-band, 3 cm range resolution, 0.5 m azimuth resolution, 50 m swath at 200 m AGL.
- Atmospheric sensors: thermistor (100 Hz), hygrometer (1 Hz), MEMS barometer (50 Hz).
- RF modems (for communications sensing): LTE-FDD/TDD, up to 150 Mbps DL/50 Mbps UL (Abdullah et al., 4 Sep 2025).
- SDR front-ends: dual-channel, wideband capture for user localization (Paglierani et al., 17 Nov 2025).
Integrated calibration involves intrinsic/extrinsic parameter estimation (minimizing reprojection error) and GNSS+RTK georeferencing at the few-centimeter level (2–5 cm horizontal). Joint bundle adjustment and LiDAR–camera edge alignment further reduce georeferencing uncertainty.
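For the intrinsic-calibration stage, a minimal checkerboard sketch using OpenCV is shown below; the board geometry, image paths, and termination criteria are placeholder assumptions, and the LiDAR–camera boresight refinement is not shown.

```python
# Minimal checkerboard intrinsic calibration sketch with OpenCV.
# Board size, square size, and image glob are placeholder assumptions;
# assumes at least one valid board view is found.
import glob
import cv2
import numpy as np

BOARD = (9, 6)          # inner corners per row/column (hypothetical board)
SQUARE = 0.025          # square edge length in meters (hypothetical)

# Object points of one board view, replicated for every accepted image.
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):            # placeholder image directory
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    ok, corners = cv2.findChessboardCorners(gray, BOARD)
    if ok:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Reprojection RMS error, camera matrix K, and distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```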
3. Data Acquisition: Algorithms, Processing, and Uncertainty Quantification
Spatial resolution is controlled via ground sampling distance and LiDAR point density:

$$\mathrm{GSD} = \frac{p\,H}{f}, \qquad \rho_{\mathrm{pts}} = \frac{R}{v\,W},$$

where $p$ is the pixel pitch, $H$ the flight altitude above ground, $f$ the focal length, $R$ the LiDAR point rate, $v$ the ground speed, and $W$ the swath width.
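A small worked example of these two relations, using the pixel pitch and point rate quoted above together with hypothetical flight parameters (focal length, altitude, speed, and swath are illustrative only):

```python
# Worked example of the resolution relations above; all flight parameters
# here are illustrative assumptions, not AerialSense specifications.
pixel_pitch = 1.5e-6      # m   (from the EO camera spec above)
focal_length = 10e-3      # m   (hypothetical lens)
altitude = 120.0          # m AGL (hypothetical)
gsd = pixel_pitch * altitude / focal_length
print(f"GSD ~ {gsd * 100:.1f} cm/px")            # ~1.8 cm/px

lidar_rate = 300_000.0    # points/s (from the LiDAR spec above)
ground_speed = 8.0        # m/s (hypothetical)
swath_width = 100.0       # m (hypothetical)
density = lidar_rate / (ground_speed * swath_width)
print(f"point density ~ {density:.0f} pts/m^2")  # ~375 pts/m^2
```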
Sensor fusion employs an EKF for pose estimation, incorporating IMU, GNSS, and LiDAR odometry, via the standard predict/update recursion:

$$\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k), \qquad P_{k|k-1} = F_k P_{k-1|k-1} F_k^{\top} + Q_k,$$
$$K_k = P_{k|k-1} H_k^{\top}\!\left(H_k P_{k|k-1} H_k^{\top} + R_k\right)^{-1}, \qquad \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\!\left(z_k - h(\hat{x}_{k|k-1})\right).$$
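A simplified, linear instance of this recursion (constant-velocity model fusing position fixes only; not the onboard estimator) can be written in a few lines:

```python
# Simplified (linear, constant-velocity) instance of the predict/update
# recursion above, fusing position fixes; the real estimator fuses IMU,
# GNSS, and LiDAR odometry with nonlinear models.
import numpy as np

dt = 0.02                                      # 50 Hz measurement rate (assumed)
F = np.block([[np.eye(3), dt * np.eye(3)],     # state: [position, velocity]
              [np.zeros((3, 3)), np.eye(3)]])
H = np.hstack([np.eye(3), np.zeros((3, 3))])   # we observe position only
Q = 1e-3 * np.eye(6)                           # process noise (assumed)
R = (0.03 ** 2) * np.eye(3)                    # 3 cm position noise (assumed)

x = np.zeros(6)                                # state estimate
P = np.eye(6)                                  # covariance

def ekf_step(x, P, z):
    """One predict/update cycle for a position measurement z."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(6) - K @ H) @ P_pred
    return x_new, P_new

x, P = ekf_step(x, P, z=np.array([1.0, 2.0, 10.0]))
```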
Joint bundle adjustment minimizes reprojection and depth-alignment errors via

$$\min_{\{T_i\},\{X_j\}} \; \sum_{i,j} \left\| \pi\!\left(T_i X_j\right) - u_{ij} \right\|^2 + \lambda \sum_{i,j} \left( d_{ij} - \left[T_i X_j\right]_z \right)^2,$$

where $T_i$ are camera poses, $X_j$ landmarks, $u_{ij}$ observed pixel coordinates, $d_{ij}$ LiDAR depth measurements, and $\lambda$ a weighting factor.
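The sketch below illustrates the same joint-residual idea in miniature: a structure-only refinement with scipy that stacks reprojection and depth residuals for a few synthetic landmarks under two assumed camera poses. The intrinsics and poses are hypothetical, and unlike full bundle adjustment the poses are held fixed here.

```python
# Structure-only joint reprojection + depth residual refinement (sketch).
# Intrinsics, poses, and landmarks are synthetic, illustrative values.
import numpy as np
from scipy.optimize import least_squares

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # hypothetical intrinsics

def project(X, R, t):
    """Project world point X into a camera with rotation R and translation t."""
    Xc = R @ X + t
    uv = K @ Xc
    return uv[:2] / uv[2], Xc[2]          # pixel coords and camera-frame depth

# Two hypothetical camera poses (identity, plus a 0.5 m baseline along x).
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([-0.5, 0.0, 0.0]))]

# Ground-truth landmarks, then simulated noisy pixel and depth observations.
X_true = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 6.0], [-1.0, 0.5, 7.0]])
obs = []
for X in X_true:
    for R, t in poses:
        uv, d = project(X, R, t)
        obs.append((uv + np.random.randn(2) * 0.5, d + np.random.randn() * 0.02))

def residuals(x, lam=1.0):
    """Stack reprojection and depth residuals for all landmark/camera pairs."""
    Xs = x.reshape(-1, 3)
    res, k = [], 0
    for X in Xs:
        for R, t in poses:
            uv_pred, d_pred = project(X, R, t)
            uv_obs, d_obs = obs[k]
            res.extend(uv_pred - uv_obs)
            res.append(lam * (d_pred - d_obs))
            k += 1
    return np.asarray(res)

x0 = (X_true + 0.2).ravel()               # perturbed initial guess
sol = least_squares(residuals, x0, method="lm")
print(sol.x.reshape(-1, 3))               # refined landmark estimates
```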
Error propagation through EKF yields per-vertex map uncertainty, and local bundle-block adjustment reduces horizontal error below 0.15 m and vertical error below 0.20 m. Statistical mapping and real-time network sensing (RSRP, RSSI, RSRQ, SINR, throughput, RTT) are logged at 1 Hz, enabling spatiotemporal geospatial interpolation and probabilistic coverage analysis (Abdullah et al., 4 Sep 2025).
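One simple way to turn such 1 Hz point logs into a coverage surface is inverse-distance-weighted interpolation, sketched below; this is a generic illustration, not the interpolation scheme of the cited study.

```python
# Inverse-distance-weighted interpolation of sparse RSRP samples onto a grid;
# a generic illustration, not the interpolation used in the cited study.
import numpy as np

def idw_grid(xy, values, grid_x, grid_y, power=2.0, eps=1e-9):
    """Interpolate scattered samples (xy, values) onto a regular grid."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d = np.linalg.norm(grid[:, None, :] - xy[None, :, :], axis=2)  # pairwise distances
    w = 1.0 / (d ** power + eps)
    est = (w @ values) / w.sum(axis=1)
    return est.reshape(gy.shape)

# Hypothetical RSRP samples (meters east/north, dBm).
xy = np.array([[0.0, 0.0], [50.0, 10.0], [20.0, 80.0], [90.0, 60.0]])
rsrp = np.array([-85.0, -92.0, -78.0, -99.0])
surface = idw_grid(xy, rsrp, np.linspace(0, 100, 50), np.linspace(0, 100, 50))
coverage = (surface > -95.0).mean()       # fraction of cells above a -95 dBm threshold
print(f"coverage fraction: {coverage:.2f}")
```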
4. Application Domains: Mapping, Communications, Surveillance, Object Localization
AerialSense systems have been empirically validated in several domains:
- Topographic mapping: Sub-meter DEMs generated from multi-modal data (LiDAR, EO, SAR) with RMSE (horizontal 0.12 m, vertical 0.18 m) over 0.5 km² at GSD ≈1.8 cm (Hsieh et al., 2016).
- Surface change detection: Erosion and landslide quantification via volumetric integration of DEM differences ($\Delta V = \iint_A \Delta z \,\mathrm{d}A$) with a ±0.3 m³ confidence interval against in situ ground truth.
- Cellular network monitoring: UAV-based RAN and end-to-end performance logging demonstrates altitude-dependent trade-offs (RSRP increases with altitude; SINR and RSRQ degrade due to interference). >90% coverage at ≥5 Mbps DL/UL; RTT < 150 ms at >80% locations. Notably, high RSRP does not guarantee uniform spatial coverage (Abdullah et al., 4 Sep 2025).
- IoT sensing: Air quality mapping with spatiotemporal DNN interpolation and short-term prediction (3D RMSE ≈11%). Consistent updates (ground nodes: every 60 min; aerial sorties: every 6–12 h) ensure dense AQI coverage (Hu et al., 2018).
- Distributed object localization: Weatherized multi-camera arrays (VIS/NIR/LWIR) with multi-view triangulation, accurate extrinsic calibration (ADS-B, UAV, celestial), and robust SVD-based algorithms achieve RMS error <1 m over 15 km baselines (Szenher et al., 2023).
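A minimal sketch of SVD-based multi-view triangulation of the kind used for such distributed localization (a generic linear DLT; the camera projection matrices below are hypothetical, and the cited system adds robust weighting):

```python
# Minimal SVD-based (DLT) triangulation from multiple calibrated views;
# camera intrinsics, poses, and the target point are hypothetical.
import numpy as np

def triangulate(projections, pixels):
    """Linear triangulation: solve A X = 0 by SVD, one row pair per view."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                      # dehomogenize

# Two hypothetical cameras: identity pose and a 1 km baseline along x.
K = np.array([[2000.0, 0, 640], [0, 2000.0, 512], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1000.0], [0.0], [0.0]])])

X_true = np.array([300.0, 200.0, 5000.0, 1.0])   # target in homogeneous coords
pix = []
for P in (P1, P2):
    x = P @ X_true
    pix.append((x[0] / x[2], x[1] / x[2]))
print(triangulate([P1, P2], pix))                # recovers ~[300, 200, 5000]
```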
5. Advanced Methods: InSAR UAV Swarms, Vision-Language Benchmarks, Panoramic Segmentation
AerialSense supports highly specialized modalities:
- Multi-UAV SAR Interferometry: Swarm-based InSAR fuses multiple baselines for DEM generation. Co-evolutionary PSO algorithms optimize the UAV formation, velocities, and power schedules to minimize fused DEM error under mission constraints. Sub-decimeter height accuracy is achievable with an appropriate height of ambiguity (HoA) and coverage scheduling (Lahmeri et al., 15 Jul 2025).
- Vision-Language Evaluation: The AerialSense benchmark (Guo et al., 8 Dec 2025) enables rigorous zero-shot and few-shot evaluation of VLMs on UAV imagery (VG, VR, VQA tasks) over 7,119 images and 53,374 samples. VLMs evaluated with AerialVP task prompt enhancements show accuracy/mIoU improvements of 25–45 percentage points.
- Panoramic Scene Segmentation: PAL-equipped UAVs deliver 360°×70° FoV scene parsing via real-time deep U-Net segmentation, with mean IoU ≈86% at 40 FPS on aerial data (Sun et al., 2021).
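For reference, the mean IoU reported for the panoramic parser can be computed from a confusion matrix as in this generic metric sketch (not the evaluation code of the cited work):

```python
# Mean IoU from predicted and ground-truth label maps via a confusion matrix;
# a generic metric sketch with random hypothetical label maps.
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class IoU averaged over classes present in the confusion matrix."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt.ravel(), pred.ravel()), 1)
    tp = np.diag(conf).astype(np.float64)
    denom = conf.sum(axis=0) + conf.sum(axis=1) - tp   # TP + FP + FN
    valid = denom > 0
    return (tp[valid] / denom[valid]).mean()

pred = np.random.randint(0, 5, size=(256, 256))   # hypothetical label maps
gt = np.random.randint(0, 5, size=(256, 256))
print(f"mIoU = {mean_iou(pred, gt, num_classes=5):.3f}")
```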
6. Surveillance, Distributed Sensing, and Collaborative Robotics
AerialSense platforms perform persistent human-centric surveillance, multi-robot coordination, and teleoperation (Nguyen et al., 2022, Gawel et al., 2018):
- Surveillance tasks:
- Detection: anchor-based/anchor-free detectors (Faster R-CNN, YOLOv4, CenterNet, FCOS) achieve high average precision (>60% AP on VisDrone-DET).
- Identification/Re-ID: Deep metric learning, cross-modal alignment for faces/gait across multi-modal aerial corpora.
- Tracking: DCF, Siamese, and regression trackers; MOTA/MOTP are used for benchmarking (see the sketch after this list).
- Behavior analysis: Two-stream and 3D CNN architectures, LSTM/transformer sequence models, and pose-based graph methods.
- Collaborative robotics: MAV-UGV teams use real-time VIO and LiDAR SLAM for global localization, with visual servoing (AprilTag fiducials, BRISK features with PnP) enabling stable third-person teleoperation. Nonlinear MPC ensures robust trajectory tracking, achieving <0.2 m positional error (Gawel et al., 2018).
- Airborne sense-and-detect of drones: Deep learning on LiDAR pillars accelerates multi-drone detection and tracking (<100 ms latency, recall ≈85%, precision ≈96%) (Manduhu et al., 2023).
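As a concrete reference for the tracking metrics mentioned in the surveillance list above, here is a one-function sketch of the CLEAR-MOT MOTA score; the per-frame counts are illustrative only.

```python
# CLEAR-MOT MOTA from per-frame counts of misses, false positives, and
# identity switches; a generic metric sketch, not a specific benchmark's code.
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects."""
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / float(sum(num_gt))

# Hypothetical per-frame counts over a short clip.
print(mota(false_negatives=[2, 1, 0], false_positives=[1, 0, 1],
           id_switches=[0, 1, 0], num_gt=[10, 10, 10]))   # -> 0.8
```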
7. Future Directions, Limitations, and Open Research Questions
Emerging challenges and areas for future work include:
- Multi-modal dataset expansion: Need for million-scale public aerial benchmarks (RGB+IR+radar+LiDAR) at variable altitudes/weather (Nguyen et al., 2022).
- Domain adaptation and uncertainty quantification: Robustness to OOD, adversarial, and degraded sensing scenarios; integration with XAI for explainability.
- Autonomy and efficiency: Optimization of onboard perception-control loops for real-time inference (transformer architectures, knowledge distillation, neural architecture search).
- Deployability and resilience: Edge ML on UAVs, real-time anomaly detection, adaptive path planning, and fault-tolerant sensor fusion in harsh environments.
- Fine-grained perception: Expansion to complex semantic classes, temporal aggregation for video consistency, and panoptic segmentation.
- Coordinated multi-robot operations: Integration of collision avoidance and multi-object tracking for dense robotic swarms.
AerialSense thus represents a scalable, extensible blueprint integrating layered autonomy, advanced sensing, and algorithmic optimization for reliable, high-precision UAV-based environmental monitoring, communications assessment, surveillance, and intelligent perception across scientific and operational applications (Hsieh et al., 2016, Abdullah et al., 4 Sep 2025, Szenher et al., 2023, Bastani et al., 2021, Gawel et al., 2018, Sun et al., 2021, Chakareski, 2017, Hu et al., 2018, Guo et al., 8 Dec 2025, Lahmeri et al., 15 Jul 2025, Nguyen et al., 2022, Manduhu et al., 2023, Paglierani et al., 17 Nov 2025).