
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar (2307.07102v1)

Published 14 Jul 2023 in cs.CV and cs.RO

Abstract: Current perception models for different tasks usually exist in modular forms on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel on edge devices, causing asynchrony between perception results and USV position, and leading to erroneous decisions in autonomous navigation. Compared with Unmanned Ground Vehicles (UGVs), the robust perception of USVs has developed relatively slowly. Moreover, most current multi-task perception models are huge in parameters, slow in inference and not scalable. Motivated by this, we propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar. Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation and radar point cloud segmentation. Besides, models in the Achelous family, with fewer than around 5 million parameters, achieve about 18 FPS on an NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny and SegFormer-B0 on our collected dataset by about 5 mAP$_{\text{50-95}}$ and 0.7 mIoU, especially under situations of adverse weather, dark environments and camera failure. To our knowledge, Achelous is the first comprehensive panoptic perception framework combining vision-level and point-cloud-level tasks for water-surface perception. To promote the development of the intelligent transportation community, we release our codes in \url{https://github.com/GuanRunwei/Achelous}.

Citations (11)

Summary

  • The paper introduces a unified sensor fusion framework that combines a monocular camera and 4D mmWave radar to boost water-surface perception speed and robustness.
  • It leverages a Vision Transformer-based image encoder and a radar feature extractor, RCNet, built on a novel Radar Convolution (RadarConv) operator, to efficiently extract and integrate visual and radar features.
  • The framework demonstrates improved operational speed (about 18 FPS on an NVIDIA Jetson AGX Xavier) and accuracy in challenging environments, paving the way for advanced USV navigation.

Summary of "Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar"

The paper presents Achelous, a compact and efficient framework for water-surface perception tasks built on the fusion of a monocular camera and a 4D mmWave radar. This unified framework addresses the relatively slow development of perception technologies for Unmanned Surface Vehicles (USVs) compared with their Unmanned Ground Vehicle (UGV) counterparts. Achelous focuses on improving inference speed and robustness across operational environments, particularly in unfavorable weather and low-visibility conditions.

Key Features and Methodologies

  1. Combined Sensor Perception: Achelous employs both a monocular camera for capturing RGB images and a 4D mmWave radar for acquiring 3D point clouds. The two sensors are synchronized temporally and spatially to enrich perception capabilities, especially under adverse environmental conditions where vision alone may fail (a minimal radar-to-image projection sketch follows this list).
  2. Efficient Architecture: Achelous leverages a Vision Transformer (ViT)-based image encoder and a Radar Convolution (RadarConv) operator that extracts features from irregular radar point clouds more effectively than conventional convolutions. These components are integrated within a unified architecture comprising the ViT-based image encoder, a radar feature encoder, a Dual-FPN, and prediction heads for the detection and segmentation tasks.
  3. Radar Convolution and RCNet: A novel convolution operator, RadarConv, is introduced to account for the spatial irregularity of radar point clouds. RCNet, the radar feature encoder built on RadarConv, refines feature extraction from radar data and enhances detection robustness against environmental interference (a neighborhood-convolution sketch also follows the list).
  4. Performance Evaluation: The framework demonstrates significant improvements in latency and operational speed, achieving about 18 FPS on an NVIDIA Jetson AGX Xavier, faster than existing multi-task solutions such as YOLOP and HybridNets. Achelous also outperforms single-task baselines in object detection (mAP$_{\text{50-95}}$) and segmentation (mIoU) in rigorous test environments, including low-lighting and foggy conditions.
  5. Open Source and Scalability: The Achelous family is open-sourced and modular, promoting scalability for diverse application requirements within the intelligent transportation domain. This openness allows the community to extend and adapt the framework as requirements and hardware evolve.
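
To make item 1 concrete, the sketch below shows the standard way radar returns can be spatially aligned with a monocular image: transform the points into the camera frame and project them with a pinhole model. The calibration matrices `K` and `T` and the helper `project_radar_to_image` are illustrative placeholders, not values or code from the paper; Achelous's actual calibration and synchronization pipeline is documented in its repository.

```python
import numpy as np

# Placeholder calibration: the summary gives no numbers, so these
# intrinsics (K) and radar-to-camera extrinsics (T) are purely illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
T = np.eye(4)  # assume radar and camera frames coincide in this toy setup

def project_radar_to_image(points_xyz: np.ndarray) -> np.ndarray:
    """Project Nx3 radar points (meters, radar frame) to pixel coordinates."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # Nx4
    cam = (T @ homo.T).T[:, :3]          # radar frame -> camera frame
    cam = cam[cam[:, 2] > 0]             # drop points behind the camera
    uv = (K @ cam.T).T                   # pinhole projection
    return uv[:, :2] / uv[:, 2:3]        # normalize by depth

# Three toy radar returns in front of the sensor
pts = np.array([[1.0, 0.5, 10.0], [-2.0, 0.0, 15.0], [0.0, -0.5, 8.0]])
print(project_radar_to_image(pts))       # pixel (u, v) per visible point
```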
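
Item 3's RadarConv adapts convolution to irregular, sparse radar point clouds. The paper's exact operator is not reproduced in this summary, so the PyTorch sketch below substitutes a generic k-nearest-neighbor point convolution purely to illustrate the idea of aggregating local features over an unordered point set; `NeighborhoodConv` and its parameters are hypothetical, not the Achelous implementation.

```python
import torch
import torch.nn as nn

class NeighborhoodConv(nn.Module):
    """Toy stand-in for RadarConv: aggregates features over each point's
    k nearest neighbors. Illustrative only; the real RadarConv differs."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_ch, out_ch), nn.ReLU())

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point positions; feats: (N, C) per-point features
        dist = torch.cdist(xyz, xyz)                     # (N, N) distances
        idx = dist.topk(self.k, largest=False).indices   # (N, k) neighbors
        neigh = feats[idx]                               # (N, k, C)
        center = feats.unsqueeze(1).expand_as(neigh)     # (N, k, C)
        # Encode (center, relative) pairs, then max-pool over the neighborhood
        fused = self.mlp(torch.cat([center, neigh - center], dim=-1))
        return fused.max(dim=1).values                   # (N, out_ch)

# Toy radar frame: 32 points with 4 features (e.g. x, y, z, Doppler velocity)
xyz, feats = torch.randn(32, 3), torch.randn(32, 4)
layer = NeighborhoodConv(in_ch=4, out_ch=16, k=8)
print(layer(xyz, feats).shape)  # torch.Size([32, 16])
```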

Implications and Future Directions

The development of Achelous offers notable implications for autonomous navigation on water. It paves the way toward resilient perception systems that run entirely on edge devices, without relying on network infrastructure. The fusion of radar and visual data presents a compelling approach to enhancing the versatility and reliability of the perception systems USVs require.

The research supports the rise of intelligent and autonomous operations on water surfaces by proposing a cost-efficient methodology with practical relevance in critical applications, such as maritime exploration, search and rescue missions, and environmental monitoring. Future efforts could concentrate on expanding sensor fusion techniques, incorporating additional sensory data types, and advancing the perception framework for complex and dynamic maritime environments.

In conclusion, Achelous represents a substantial advance in multi-sensor perception, laying a foundation for further progress in autonomous maritime navigation and unified, multi-task perception frameworks.