- The paper introduces ACNet, an attention-based network that fuses complementary RGB and depth features to improve segmentation accuracy.
- It employs a multi-branch architecture with an Attention Complementary Module that dynamically weighs features, achieving a 48.3% mIoU on NYUDv2.
- The approach addresses the core difficulty of fusing modalities whose feature distributions differ across scenes, a step toward stronger scene understanding and, with efficiency improvements, real-time applications.
ACNet: Enhancing RGBD Semantic Segmentation Through Attention-Based Feature Fusion
Semantic segmentation is a fundamental computer vision task: partitioning an image into coherent, semantically meaningful regions. Augmenting RGB data with depth information, termed RGBD semantic segmentation, improves performance by exploiting the geometric cues that depth images provide. A critical challenge, however, lies in effectively integrating the disparate feature distributions of RGB and depth images across varying scenes. The paper "ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation" by Xinxin Hu et al. introduces ACNet, a framework that uses attention mechanisms to optimize feature extraction from both modalities.
Framework and Methodology
ACNet comprises three parallel branches: one processes RGB images, one processes depth images, and a third fuses their features. The core innovation is the Attention Complementary Module (ACM), which applies channel attention to dynamically weight features drawn from the RGB and depth branches. This lets ACNet selectively harness high-quality features from each modality's channels, adapting to the distinct information each modality offers in different scenes.
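The paper describes the ACM at the level of channel attention; the PyTorch sketch below shows one plausible realization, assuming a squeeze-and-excitation-style gate (global average pooling, a 1x1 convolution, a sigmoid). The class name `ACM` and the specific layer choices here are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class ACM(nn.Module):
    """Channel-attention gate: pool global context, map it through a
    1x1 convolution, and squash it into per-channel weights in (0, 1)."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # (B, C, H, W) -> (B, C, 1, 1)
        self.conv = nn.Conv2d(channels, channels, 1)  # per-channel re-mapping
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(self.conv(self.pool(x)))        # one attention weight per channel
        return x * w                                  # amplify informative channels

# usage: gated = ACM(256)(torch.randn(2, 256, 30, 40))
```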
The architecture uses ResNet as the backbone for primary feature extraction, with separate branches for the RGB and depth inputs so that each modality's original feature flow is preserved. After feature extraction, ACMs integrate these features by weighting channels according to their informativeness, producing a balanced fusion that is pivotal for accurate segmentation.
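Continuing the sketch, a single fusion stage might add the ACM-gated RGB and depth features into the fusion branch. `FusionStage` below is a hypothetical reading of that merge (element-wise addition is assumed), reusing the `ACM` class defined above.

```python
class FusionStage(nn.Module):
    """Hypothetical per-stage fusion: each modality stream is gated by its
    own ACM, then merged into the fusion branch by element-wise addition."""

    def __init__(self, channels: int):
        super().__init__()
        self.acm_rgb = ACM(channels)
        self.acm_depth = ACM(channels)

    def forward(self, rgb_feat, depth_feat, fused_feat):
        # The fusion branch accumulates attention-weighted contributions
        # from both modalities at each backbone stage.
        return fused_feat + self.acm_rgb(rgb_feat) + self.acm_depth(depth_feat)
```

Keeping the two modality streams separate until the gated merge is what preserves each branch's original feature flow while still letting the network emphasize whichever modality is more informative for a given scene.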
Experimental Results
ACNet was evaluated on two well-established datasets, NYUDv2 and SUN-RGBD. On the NYUDv2 test set with a ResNet-50 backbone, ACNet achieved a mean Intersection-over-Union (mIoU) of 48.3%, surpassing contemporary state-of-the-art methods. This result supports the effectiveness of attention-based fusion in the RGBD setting and the advantage of a multi-branch architecture in handling the variability between RGB and depth inputs.
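For reference, the reported metric averages per-class intersection-over-union. A minimal NumPy implementation for a single pair of label maps looks like this; benchmark code typically accumulates intersection and union counts over the whole test set before dividing.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """mIoU: per-class intersection over union, averaged across classes.
    `pred` and `target` are integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```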
Implications and Future Work
Using ACMs to weight RGB and depth features according to their contribution points to a productive direction for semantic segmentation frameworks. It is especially relevant for perception systems in complex, cluttered, or indoor environments, where depth information contributes substantially to scene understanding. By addressing the uneven information distribution between RGB and depth images, ACNet offers a methodology that balances feature extraction and integration.
Moving forward, research could focus on the computational efficiency and real-time applicability of ACNet, broadening its utility across applications such as panoramic and surround-view perception. Progress on these fronts would extend RGBD semantic segmentation to real-world scenarios such as autonomous navigation and augmented reality.