Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities (2312.08851v1)

Published 14 Dec 2023 in cs.CV, cs.CE, and cs.RO

Abstract: Urban water-surface robust perception serves as the foundation for intelligent monitoring of aquatic environments and the autonomous navigation and operation of unmanned vessels, especially in the context of waterway safety. It is worth noting that current multi-sensor fusion and multi-task learning models consume substantial power and heavily rely on high-power GPUs for inference. This contributes to increased carbon emissions, a concern that runs counter to the prevailing emphasis on environmental preservation and the pursuit of sustainable, low-carbon urban environments. In light of these concerns, this paper concentrates on low-power, lightweight, multi-task panoptic perception through the fusion of visual and 4D radar data, which is seen as a promising low-cost perception method. We propose a framework named Achelous++ that facilitates the development and comprehensive evaluation of multi-task water-surface panoptic perception models. Achelous++ can simultaneously execute five perception tasks with high speed and low power consumption, including object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation. Furthermore, to meet the demand for developers to customize models for real-time inference on low-performance devices, a novel multi-modal pruning strategy known as Heterogeneous-Aware SynFlow (HA-SynFlow) is proposed. Besides, Achelous++ also supports random pruning at initialization with different layer-wise sparsity, such as Uniform and Erdos-Renyi-Kernel (ERK). Overall, our Achelous++ framework achieves state-of-the-art performance on the WaterScenes benchmark, excelling in both accuracy and power efficiency compared to other single-task and multi-task models. We release and maintain the code at https://github.com/GuanRunwei/Achelous.

Citations (4)

Summary

  • The paper demonstrates a low-power framework that fuses vision and 4D radar to perform five water-surface perception tasks in real-time.
  • It introduces HA-SynFlow, a heterogeneous-aware pruning technique that streamlines model components for edge device efficiency without performance loss.
  • The framework supports flexible CNN-ViT backbones and a specialized radar convolution operator, enabling energy-efficient autonomous navigation.

An Analysis of the Achelous++ Framework for Panoptic Water-Surface Perception on Edge Devices

The paper "Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities" presents a comprehensive framework for panoptic perception of water surfaces, emphasizing power efficiency, inference speed, and the ability to perform multiple perception tasks simultaneously. Its primary contributions fall into several key areas.

Achelous++ addresses the limitations of current multi-sensor fusion and multi-task learning models, which often demand significant power and rely on high-performance GPUs for inference, increasing carbon emissions. In response, it combines vision and 4D radar data in a lightweight, low-power, multi-task framework designed for real-time operation on edge devices.

Technical Contributions

  1. Multi-Task Framework: Achelous++ performs five perception tasks concurrently: object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation. Its modular design allows comprehensive evaluation and real-time inference on low-performance devices.
  2. Pruning Strategy: The framework introduces Heterogeneous-Aware SynFlow (HA-SynFlow), a multi-modal pruning strategy that streamlines model components without degrading performance, enabling real-time operation on edge devices. The technique balances sparsity and accuracy across modalities, and the framework additionally supports random pruning at initialization with Uniform or Erdos-Renyi-Kernel (ERK) layer-wise sparsity.
  3. Flexible Backbones: Achelous++ supports various state-of-the-art CNN-ViT hybrid networks and reparameterized networks, delivering different balances between complexity and performance to suit diverse edge device capabilities.
  4. Radar Convolution Operator: The framework includes a specialized radar convolution operator tailored to process the sparse and irregular nature of radar point clouds, enhancing the efficacy and speed of radar-derived feature modeling.
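To make the pruning idea concrete, here is a minimal, dependency-free sketch of a SynFlow-style saliency score, the single-modality criterion that HA-SynFlow builds on. The function names are hypothetical and this is not the authors' implementation; it only illustrates the data-free idea of scoring each weight at initialization as |w| · |∂R/∂w|, where R is the output of the network for an all-ones input with absolute-valued weights.

```python
# SynFlow-style saliency sketch (illustrative; not the authors' HA-SynFlow).
# For a two-layer linear model y = W2 @ W1 @ x, score each weight, without
# any data, as |w| * |dR/dw| with R = 1^T |W2| |W1| 1.

def synflow_scores_layer1(W1, W2):
    """Score each entry of W1 (h x n); W2 is m x h. Higher score = keep."""
    h = len(W1)
    # dR/d|W1|_{ij} = sum_k |W2|_{ki}, independent of j for an all-ones input.
    col_sums = [sum(abs(W2[k][i]) for k in range(len(W2))) for i in range(h)]
    return [[abs(w) * col_sums[i] for w in W1[i]] for i in range(h)]

def prune_by_score(W, scores, sparsity):
    """Zero out the lowest-scoring fraction `sparsity` of weights in W."""
    flat = sorted(s for row in scores for s in row)
    cut = int(sparsity * len(flat))
    threshold = flat[cut] if cut < len(flat) else float("inf")
    for i, row in enumerate(W):
        for j in range(len(row)):
            if scores[i][j] < threshold:
                W[i][j] = 0.0
    return W
```

For example, with W1 = [[0.5, -0.1], [0.2, 0.9]] and W2 = [[1.0, -2.0]], weights feeding the second hidden unit are amplified by the larger downstream weight, so pruning at 50% sparsity removes the two weakest connections. How HA-SynFlow adapts such scores across the heterogeneous vision and radar branches is specific to the paper and not reproduced here.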

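The abstract also notes support for random pruning at initialization with Uniform or Erdos-Renyi-Kernel (ERK) layer-wise sparsity. The following is a hedged sketch of the ERK allocation rule as popularized in the sparse-training literature (the function name is hypothetical and the paper's exact variant may differ): each convolution layer's keep-density is proportional to (n_out + n_in + k_h + k_w) / (n_out · n_in · k_h · k_w), rescaled to meet a global parameter budget, with layers capped at fully dense.

```python
def erk_densities(layer_shapes, target_density):
    """Allocate per-layer keep-densities with an Erdos-Renyi-Kernel rule.

    layer_shapes: list of (n_out, n_in, k_h, k_w) convolution shapes.
    Returns densities in (0, 1] whose parameter-weighted mean equals
    target_density; layers whose scaled density exceeds 1 stay fully dense.
    """
    params = [n_out * n_in * kh * kw for (n_out, n_in, kh, kw) in layer_shapes]
    raw = [(n_out + n_in + kh + kw) / p
           for (n_out, n_in, kh, kw), p in zip(layer_shapes, params)]
    dense = set()
    while True:
        # Rescale so the non-dense layers absorb the remaining budget.
        budget = target_density * sum(params) - sum(params[i] for i in dense)
        denom = sum(raw[i] * params[i]
                    for i in range(len(params)) if i not in dense)
        scale = budget / denom
        overflow = [i for i in range(len(params))
                    if i not in dense and raw[i] * scale > 1.0]
        if not overflow:
            break
        dense.update(overflow)  # cap saturated layers and re-solve
    return [1.0 if i in dense else raw[i] * scale for i in range(len(raw))]
```

The rule favors small, thin layers (which get high density) over large, wide ones, so most of the pruning budget lands where the parameter count is largest.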
Achelous++ achieves state-of-the-art performance on the WaterScenes benchmark, excelling in both accuracy and power efficiency over competing single-task and multi-task models. The modular nature and energy efficiency underscore its adaptability to real-time applications, providing compelling evidence of its utility for autonomous navigation on water surfaces.
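Contribution 4's radar convolution is described above only at a high level. As a rough illustration of what a neighborhood operator over sparse, irregular points can look like, the toy sketch below (hypothetical; not the paper's operator) gathers each point's k nearest neighbors and mixes their features with a shared weight per neighbor rank.

```python
import math

def radar_neighborhood_conv(points, feats, weights, k=2):
    """Toy convolution over irregular 2D radar points (illustrative only).

    points : list of (x, y) coordinates
    feats  : one scalar feature per point
    weights: k mixing weights, one per neighbor rank (self is rank 0)
    """
    out = []
    for xi, yi in points:
        # Rank all points by distance to the current point, self-first.
        dists = [(math.hypot(xi - xj, yi - yj), j)
                 for j, (xj, yj) in enumerate(points)]
        dists.sort()
        neighbors = [j for _, j in dists[:k]]
        out.append(sum(w * feats[j] for w, j in zip(weights, neighbors)))
    return out
```

Unlike a grid convolution, the receptive field here follows the actual point distribution, which is the basic property any operator for sparse radar point clouds needs; the paper's operator is engineered for speed and accuracy in ways this sketch does not capture.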

Implications and Future Prospects

Achelous++ demonstrates the potential of vision-radar fusion frameworks to contribute to energy-efficient autonomous systems. By optimizing model complexity, the framework facilitates long-duration operations on edge devices, promoting sustainability in computing. Moreover, the real-time capability ensures that Achelous++ serves as a practical tool for autonomous navigation, promising to improve safety and efficiency in aquatic environments.

The authors' open-source commitment through the release of their code could inspire further research and practical advancements, allowing others to build upon their framework for continued innovations in multi-modal perception systems. As autonomous systems become more prevalent, efficient frameworks like Achelous++ stand to play a pivotal role in practical deployments, driving efforts toward sustainable, low-carbon technologies in smart cities and beyond.