DroneSplat: Advancements in 3D Reconstruction from Drone Imagery
This paper presents "DroneSplat," a novel framework designed for robust 3D reconstruction from in-the-wild drone-captured imagery. By adapting 3D Gaussian Splatting to aerial image processing, it takes a significant step toward resolving two persistent challenges in aerial imaging: dynamic distractors and limited viewpoints.
Addressing Dynamic Distractors with Adaptive Techniques
Dynamic elements in in-the-wild drone imagery, such as moving vehicles or fluctuating shadows, pose significant challenges for traditional radiance field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), which rely on multi-view consistency. DroneSplat introduces an adaptive local-global masking approach that distinguishes static from dynamic elements to preserve scene integrity during reconstruction. The strategy combines local segmentation cues with global statistical analysis to predict and mask dynamic distractors, reducing the inaccuracies and artifacts that moving objects would otherwise introduce; a sketch of this local-global idea appears below.
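The paper's exact masking procedure is not reproduced here, so the following is a minimal Python sketch of the general local-global idea: per-pixel photometric residuals supply the global statistics, and object-level segmentation masks (e.g., from an off-the-shelf segmenter such as SAM) supply the local structure. The function name, z-score test, and thresholds are illustrative assumptions, not DroneSplat's actual rule.

```python
import numpy as np

def adaptive_distractor_mask(residual, seg_masks, z_thresh=2.0, overlap_frac=0.5):
    """Hypothetical local-global fusion: flag whole segments as dynamic
    when enough of their pixels are photometric outliers.

    residual  : (H, W) per-pixel error between the current render and
                the training image (global signal).
    seg_masks : list of (H, W) boolean object masks from a segmenter
                (local signal).
    """
    # Global step: mark pixels whose residual is a statistical outlier.
    mu, sigma = residual.mean(), residual.std() + 1e-8
    candidates = (residual - mu) / sigma > z_thresh

    # Local step: promote an entire segment to "dynamic" when a large
    # fraction of its pixels are candidates, so the final mask follows
    # object boundaries instead of noisy per-pixel decisions.
    dynamic = np.zeros_like(candidates)
    for mask in seg_masks:
        if mask.any() and candidates[mask].mean() > overlap_frac:
            dynamic |= mask
    return dynamic  # True where pixels should be excluded from the loss
```

Under this scheme, masked pixels would simply be dropped from the photometric loss during training, so moving objects stop pulling Gaussians toward inconsistent colors.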
Enhancing Viewpoint Robustness
Drone imagery often suffers from limited viewpoints, particularly in sequences captured during a single flight. To mitigate the impact of sparse views, DroneSplat leverages multi-view stereo predictions from DUSt3R as rich geometric priors. It further incorporates a voxel-guided optimization scheme into 3DGS so that Gaussian splatting capitalizes effectively on these priors, yielding high-quality rendering even when the available perspectives are restricted or repetitive; a sketch of voxel-guided seeding from such priors follows.
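As a rough illustration of how a dense multi-view stereo prior can guide Gaussian optimization, the sketch below voxelizes a DUSt3R-style point cloud into one seed per occupied voxel; such seeds could initialize Gaussian means, and occupied voxels could indicate where densification is geometrically plausible. The function name, voxel size, and averaging rule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def voxelize_prior_points(points, voxel_size=0.05):
    """Illustrative sketch: collapse an MVS point cloud of shape (N, 3)
    onto a voxel grid, returning one averaged center per occupied voxel."""
    # Assign each point to a voxel by flooring its scaled coordinates.
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()

    # Average the points that fall into each voxel to get a stable seed.
    counts = np.bincount(inverse).astype(np.float64)
    centers = np.empty((counts.size, 3))
    for dim in range(3):
        centers[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return centers
```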
Empirical Validation and Contributions
The contributions of DroneSplat are validated through extensive experiments on newly collected datasets comprising 24 drone-captured sequences that span both static and dynamic scenes. Comparative analyses show the framework outperforming established baselines such as GS-W and WildGaussians, particularly in handling dynamic elements and sparse viewpoints, with more accurate renderings and scene geometry.
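The paper's full evaluation protocol is not repeated here, but rendering quality in novel-view synthesis is conventionally quantified with PSNR on held-out views; the snippet below shows that standard formula, not code from DroneSplat.

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered view and its
    held-out ground-truth image; higher is better."""
    mse = np.mean((rendered - reference) ** 2)
    return float(20.0 * np.log10(max_val) - 10.0 * np.log10(mse + 1e-12))
```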
Implications and Future Work
The advancements embedded in DroneSplat have considerable implications for fields that rely on precise 3D reconstruction from aerial footage, including urban surveying and cultural heritage preservation. The approach offers an adaptable way to recover accurate static scene reconstructions under dynamic real-world conditions without sacrificing detail. Future work may refine the segmentation stage for finer-grained object handling and apply diffusion models to inpaint regions that distractors persistently occlude.
In summary, DroneSplat draws on contemporary advances in 3D representation and machine learning to turn drone-captured imagery into detailed three-dimensional models that are resilient to both dynamic disruptions and viewpoint limitations. The paper is a valuable reference for continuing innovation in aerial scene reconstruction.