Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation (2007.09183v1)

Published 17 Jul 2020 in cs.CV

Abstract: Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion to obtain better feature representations and achieve more accurate segmentation. This, however, may not lead to satisfactory results, as actual depth data are generally noisy, which might worsen the accuracy as the networks go deeper. In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately. The key to the proposed architecture is a novel Separation-and-Aggregation Gating operation that jointly filters and recalibrates both representations before cross-modality aggregation. Meanwhile, a Bi-direction Multi-step Propagation strategy is introduced, on the one hand, to help propagate and fuse information between the two modalities, and on the other hand, to preserve their specificity along the long-term propagation process. Besides, our proposed encoder can be easily injected into previous encoder-decoder structures to boost their performance on RGB-D semantic segmentation. Our model consistently outperforms state-of-the-art methods on challenging indoor and outdoor datasets. Code of this work is available at https://charlescxk.github.io/

Authors (7)

Xiaokang Chen
Kwan-Yee Lin
Jingbo Wang
Wayne Wu
Chen Qian
Hongsheng Li
Gang Zeng

Citations (258)

Summary

Overview of ECCV Author Guidelines

The provided document sets out guidelines for authors submitting papers to the European Conference on Computer Vision (ECCV). Intended as an exemplar for the submission process, it stresses adherence to ECCV's formatting standards and procedural instructions. The guidelines are structured to ensure consistency, facilitate double-blind reviewing, and maintain the integrity of the conference's dissemination process.

Submission Format and Requirements

Authors are instructed to prepare their manuscripts in English and to follow a specified format with a maximum length of 14 pages, excluding references. Manuscripts that exceed this limit are automatically rejected, so strict adherence to the prescribed format is essential.

Anonymity and Confidentiality

ECCV enforces a double-blind review process, requiring authors to remove any identifying information from their submissions. The document gives specific instructions on how to cite one's own prior work without compromising anonymity: it advises against possessive pronouns when referring to the authors' previous research and instead recommends citing that work in the third person. This helps preserve the impartiality of the review process. Confidentiality is also a core tenet, and reviewers are expected to keep manuscript contents confidential.

Policy on Dual Submissions

The submission policy explicitly prohibits dual submissions, ensuring that submitted work is original and previously unpublished. Authors must acknowledge that substantial overlap with concurrent or prior submissions will result in disqualification. The guidelines also define what constitutes a publication, which settles how arXiv and other non-peer-reviewed works are treated.

Review Process and Responsibilities

Reviewers are assigned through a systematic process powered by the “Toronto system,” which matches submissions with appropriate reviewers. This mechanism aligns reviewer expertise with paper subjects, contributing to a rigorous review environment.

Manuscript Preparation

Submission requires not only adherence to content requirements but also the meticulous preparation of supplementary materials, such as source files and figures. The document specifies technical aspects such as font sizes, typefaces, and layout features to be used throughout the manuscript. Further emphasis is placed on the need for consistent formatting of headings, figures, tables, and equations to retain stylistic uniformity.

Implications for Publication and Presentation

To advance to publication in the ECCV proceedings, at least one author must register and present the paper. The document details the steps required to transition an accepted initial submission into a polished camera-ready version, including compliance with formatting and copyright stipulations.

Concluding Thoughts

For researchers navigating the ECCV submission process, the document provides a detailed road map that emphasizes adherence to protocol, preservation of anonymity, and clear, coherent communication. This structured approach helps maintain the high standards expected of ECCV, thereby strengthening the conference's contributions to the field of computer vision. The guidelines point to a continued emphasis on transparency, originality, and uniformity in scholarly publishing practices within the domain.
