- The paper introduces WVN, a novel system leveraging DINO-ViT features and SLIC superpixels for fast traversability estimation in challenging wild terrains.
- The approach employs a dual-graph framework using velocity tracking errors and reconstruction losses to generate robust traversability scores.
- Experimental results in forests and grasslands show superior performance to traditional methods, enabling autonomous navigation within minutes.
Fast Traversability Estimation for Wild Visual Navigation
The paper "Fast Traversability Estimation for Wild Visual Navigation" presents a novel approach for robotic navigation in challenging natural environments such as forests and grasslands, where conventional geometric sensing often fails to predict traversable terrain accurately. The focus is a system, termed Wild Visual Navigation (WVN), that estimates traversability in real time from vision alone and adapts rapidly after a brief human demonstration, enabling autonomous robot navigation.
Technical Overview
The WVN system uses self-supervised learning to continually adapt its traversability predictions to online visual inputs. Its backbone is high-dimensional features extracted from a self-supervised Vision Transformer (DINO-ViT), chosen for how well they capture the semantic information needed to distinguish navigable terrain. The paper substantiates this choice through comparisons with established image-based feature extractors such as ResNet-50 and EfficientNet-B4, in which DINO-ViT performed best.
A key innovation is the reduction in computational cost achieved by segmenting images into SLIC superpixels and averaging the dense features within each segment. This makes real-time operation feasible on the constrained hardware typical of mobile robotic platforms.
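The per-segment pooling step can be sketched as follows. The feature-map shape, segment-label format, and function name are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def segment_pooled_features(features, segments):
    """Average dense per-pixel features within each superpixel segment.

    features: (H, W, D) dense feature map (e.g. upsampled DINO-ViT patch features)
    segments: (H, W) integer segment labels (e.g. produced by SLIC)
    returns:  (n_segments, D) array of per-segment mean features
    """
    H, W, D = features.shape
    flat_feat = features.reshape(-1, D)
    flat_seg = segments.reshape(-1)
    n_segments = int(flat_seg.max()) + 1
    # Accumulate feature sums per segment with unbuffered indexed addition,
    # then normalize by the number of pixels in each segment.
    sums = np.zeros((n_segments, D))
    np.add.at(sums, flat_seg, flat_feat)
    counts = np.bincount(flat_seg, minlength=n_segments).astype(float)
    return sums / counts[:, None]
```

Pooling hundreds of thousands of pixel features down to a few hundred segment vectors is what makes the subsequent online training and inference tractable on onboard hardware.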
Self-supervised Learning Framework
The methodology integrates a dual-graph system for managing supervision and mission data, and generates traversability scores from a discrepancy function based on the robot's velocity tracking error. This score serves as a proxy traversability label for a neural network trained online to estimate segment-wise traversability. A confidence metric derived from the reconstruction loss of segment features manages uncertainty and improves prediction robustness.
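A minimal sketch of the velocity-discrepancy idea, with an assumed exponential mapping and sensitivity constant `k` (the paper's exact discrepancy function may differ):

```python
import numpy as np

def traversability_score(v_cmd, v_est, k=0.5):
    """Map velocity tracking error to a (0, 1] traversability label.

    v_cmd, v_est: commanded and estimated base velocities (arrays)
    k: sensitivity constant (an assumed value, not taken from the paper)

    A small tracking error (robot moves as commanded) yields a score near 1;
    a large error (robot is impeded by the terrain) drives the score toward 0.
    """
    err = np.linalg.norm(np.asarray(v_cmd) - np.asarray(v_est))
    return float(np.exp(-k * err**2))
```

Because the robot only generates these labels for the terrain it actually drives over, the supervision is sparse and positive-biased, which motivates the confidence mechanism described next.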
The WVN strategy also incorporates anomaly detection: reconstruction losses on traversed terrain are used to distinguish features known to be traversable from potential outliers, yielding a confidence-based estimation framework. This extends beyond plain regression, capturing the uncertainty inherent in the sparse supervision available during robot navigation.
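One way to turn reconstruction losses into a confidence value is a Gaussian-style weighting against the loss statistics of already-traversed segments. This is an illustrative sketch under that assumption, not the paper's exact scheme:

```python
import numpy as np

def confidence_from_loss(loss, traversed_losses):
    """Confidence that a segment resembles previously traversed terrain.

    loss: reconstruction loss of the query segment's features
    traversed_losses: losses observed on segments the robot has driven over

    Losses far above those seen on traversed terrain get low confidence;
    losses at or below the traversed mean get full confidence.
    """
    mu = np.mean(traversed_losses)
    sigma = np.std(traversed_losses) + 1e-6  # avoid division by zero
    z = max(0.0, (loss - mu) / sigma)  # only penalize losses above the mean
    return float(np.exp(-0.5 * z**2))
```

Down-weighting high-loss (out-of-distribution) segments keeps the online learner from confidently labeling never-traversed terrain, which is exactly where the sparse self-supervision is least trustworthy.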
Experimental Validation
The approach was validated in a variety of complex environments, including forests, hillsides, and grasslands, where the system adapted within minutes (typically under five) and produced predictions robust enough for autonomous navigation. Quantitative evaluations used both manually labeled ground truth and self-supervised labels, and showed WVN consistently outperforming traditional classifiers such as Random Forests and SVMs on traversability prediction.
The paper reports successful closed-loop integration with robot navigation systems, using visual traversability maps for planning to enable kilometer-scale path following and obstacle negotiation even in densely vegetated environments. This demonstrates the practical effectiveness of such a visual navigation system for legged robots like ANYmal and its readiness for real-world deployment.
Implications and Future Directions
The proposed WVN system has significant implications for advancing robotic navigation in unstructured environments. A fast, vision-driven traversability prediction mechanism increases the functional autonomy of robotic platforms while reducing reliance on extensive pre-training datasets and complex geometric maps.
Future developments could explore multi-camera integration to overcome the current field-of-view limitation, more sophisticated segment extraction to address segmentation accuracy issues, and deeper embedding of WVN within broader navigation frameworks for comprehensive path planning. Additionally, continual learning paradigms could be exploited to sustain adaptability across diverse and evolving terrains.
In conclusion, the implementation of WVN offers robust solutions for mobile robots tasked with navigating the complexities of natural and dynamic environments, heralding advancements in autonomous exploration capabilities. The strategic use of self-supervised techniques reaffirms the system's potential in bridging the gap between visual perception and autonomous navigability, paving the way for broad application in next-generation robotic systems.