- The paper presents a novel fused approach that integrates CNN-based global perception with ToF sensor depth mapping, achieving a 100% success rate in navigation tests.
- The methodology combines semantic analysis with real-time obstacle detection to overcome the constraints of individual sensing pipelines.
- Experimental results in corridors and sharp turns confirm that the fusion strategy significantly outperforms standalone solutions in autonomous navigation.
Autonomous Navigation on Nano-UAVs: A Fusion of Depth and Vision Sensory Inputs
Introduction
Autonomous nano-sized Unmanned Aerial Vehicles (UAVs) promise a variety of applications, ranging from inspection tasks in hazardous environments to inventory management in warehouses. These vehicles offer unique advantages, notably the ability to operate safely in proximity to humans and to access confined spaces. A significant challenge in their deployment is autonomous navigation, particularly in unknown environments. State-of-the-Art (SoA) systems typically rely on resource-intensive global and local planning techniques that are unsuitable for the constrained computing capabilities of nano-UAVs, so the focus has been on simpler, computationally feasible solutions. This paper introduces an approach that combines semantic information from a convolutional neural network (CNN) with depth maps, handling straight pathways, obstacle avoidance, and sharp turns with a 100% success rate across all tested scenarios.
System Design
The proposed system integrates two perception pipelines on a nano-UAV platform to enable autonomous navigation. The global perception pipeline leverages PULP-Dronet, a CNN designed for vision-based navigation, to extract semantic cues from the environment; it outputs a steering angle and a collision probability, which are translated into control commands for the UAV. In parallel, the local perception pipeline uses an 8x8 pixel Time-of-Flight (ToF) sensor to generate depth maps of immediate obstacles; its outputs guide close-proximity maneuvers by identifying obstacle-free areas and steering the UAV toward them.
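The paper does not detail how the 8x8 depth frame is converted into a steering hint; the sketch below is a minimal illustration under assumed heuristics (a fixed proximity threshold, a per-column clearance score, and a linear column-to-yaw mapping), not the authors' implementation.

```python
import numpy as np

# Illustrative constants; real values would depend on the ToF sensor's
# range, its field of view, and the drone's braking distance.
DEPTH_OBSTACLE_MM = 1000   # assumed threshold for "obstacle ahead"
FOV_DEG = 45.0             # assumed horizontal field of view of the 8x8 array

def local_perception(depth_mm: np.ndarray):
    """Map an 8x8 ToF depth frame (millimetres) to a steering hint.

    Returns (obstacle_ahead, yaw_deg): whether something is closer than
    the threshold in the central columns, and a yaw offset (degrees)
    toward the most open column of the frame.
    """
    assert depth_mm.shape == (8, 8)

    # Closest return per column: a conservative per-direction clearance.
    clearance = depth_mm.min(axis=0)              # shape (8,)

    # Obstacle flag taken from the central columns (the flight direction).
    obstacle_ahead = clearance[3:5].min() < DEPTH_OBSTACLE_MM

    # Steer toward the column with the largest clearance.
    best_col = int(clearance.argmax())
    yaw_deg = (best_col - 3.5) / 8.0 * FOV_DEG    # column index -> angle
    return obstacle_ahead, yaw_deg
```

In practice, the threshold and the column-to-angle mapping would be tuned to the sensor's actual field of view and the platform's dynamics.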
A fusion mechanism combines the two pipelines, exploiting the complementarity of depth-based and vision-based sensing. The decision-making process incorporates the outputs of both pipelines, adjusting the UAV's steering and speed to navigate complex environments.
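The exact fusion rules are not spelled out here; as a rough sketch under assumed priorities (the local pipeline overrides steering when an obstacle is imminent, and the CNN's collision probability throttles forward speed), the decision step might look like the following. All names and constants are illustrative.

```python
def fuse(cnn_steering_deg: float,
         cnn_collision_prob: float,
         obstacle_ahead: bool,
         tof_yaw_deg: float,
         max_speed: float = 1.0):
    """Hypothetical fusion policy, not the paper's exact rule set."""
    if obstacle_ahead:
        # Close-range evasive maneuver driven by the depth map.
        steering_deg = tof_yaw_deg
        speed = 0.3 * max_speed                    # assumed slowdown factor
    else:
        # Semantic guidance from the CNN, throttled by its own
        # estimate of collision risk.
        steering_deg = cnn_steering_deg
        speed = max_speed * (1.0 - cnn_collision_prob)
    return steering_deg, speed
```

A control loop would then feed the outputs of `local_perception()` together with the CNN's predictions into `fuse()` at every cycle to obtain the steering and speed setpoints.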
Experimental Setup and Results
The experimental evaluation, conducted in an office corridor, covered three scenarios: straight pathways, static obstacles, and 90-degree turns. A comparison of the global perception pipeline, the local perception pipeline, and the fused pipeline demonstrated the superiority of the fused approach. The global perception pipeline, while effective in obstacle-free environments, failed whenever obstacles were present; the local perception pipeline handled obstacle avoidance but struggled with decisions requiring semantic understanding, such as turning at the end of a corridor. The fused pipeline achieved a 100% success rate across all scenarios, navigating straight corridors, avoiding obstacles, and executing turns, confirming the benefit of combining depth and visual cues.
Conclusion and Future Implications
The research marks a significant step forward in autonomy for nano-UAVs, underlining the feasibility of integrating global and local perception to navigate complex environments. The success of the fused pipeline not only enhances nano-UAVs' functional capabilities but also opens avenues for further exploration, such as dynamic obstacle avoidance and adaptation to diverse operational contexts. Future work may focus on refining the fusion mechanism, exploring more efficient deep learning models for real-time semantic processing, and expanding the sensory inputs to enrich environmental understanding. This work demonstrates that even with constrained computational resources, advanced autonomous navigation is achievable by intelligently combining different sensory modalities.
Acknowledgments
Thanks to D. Palossi and D. Christodoulou for their contributions and to H. Müller for hardware support.