- The paper presents SegMiF, which leverages a hierarchical interactive attention block and dynamic weighting to enhance feature fusion and segmentation.
- It reports an average 7.66% mIoU improvement over state-of-the-art methods while delivering superior visual realism and semantic accuracy.
- The study introduces a comprehensive benchmark of 1500 aligned image pairs with 15 pixel-level annotated categories to advance multi-modality research in autonomous systems.
Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation
The paper "Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation" introduces an architecture, SegMiF, aimed at enhancing tasks related to multi-modality image fusion and segmentation by leveraging dual-task correlation. This paper revisits the joint formulation of image fusion and segmentation to achieve better visual realism and semantic accuracy simultaneously, addressing significant challenges in autonomous systems and robotics operations like autonomous driving.
Architectural Innovations
The SegMiF architecture is a cascade of a fusion network and a segmentation network. The key innovation is a hierarchical interactive attention (HIA) block that enables full bidirectional transfer of semantic- and modality-oriented features between the two networks. This refines the feature maps and bridges the modality gap, ensuring comprehensive feature interaction. In addition, a dynamic weighting factor automatically adjusts the task-related weights, balancing the fusion and segmentation objectives during training. This automated weight tuning removes the laborious manual calibration that dual-task systems typically require.
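To make the interaction and weighting ideas concrete, the sketch below shows one plausible PyTorch realization: a cross-attention step in which modality-oriented features query semantic-oriented features, plus a simple loss-driven weighting rule. The module names, dimensions, and the softmax-based weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn


class InteractiveAttention(nn.Module):
    """Cross-attention where fusion (modality) features attend to
    segmentation (semantic) features. Layer choices and sizes are
    illustrative, not the paper's exact HIA design."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, modality_feat: torch.Tensor,
                semantic_feat: torch.Tensor) -> torch.Tensor:
        # Both inputs are token sequences of shape (B, N, C).
        attended, _ = self.attn(query=modality_feat,
                                key=semantic_feat,
                                value=semantic_feat)
        # Residual connection plus normalization keeps the original
        # modality information while injecting semantic context.
        return self.norm(modality_feat + attended)


def dynamic_task_weights(fusion_loss: torch.Tensor,
                         seg_loss: torch.Tensor,
                         temperature: float = 1.0) -> torch.Tensor:
    """Toy dynamic weighting: the task whose loss is currently larger
    receives a larger weight, so neither objective dominates. This is a
    stand-in for the paper's dynamic weighting factor."""
    losses = torch.stack([fusion_loss.detach(), seg_loss.detach()])
    return torch.softmax(losses / temperature, dim=0)
```

In a training loop, the returned weights would scale the two losses, e.g. `total = w[0] * fusion_loss + w[1] * seg_loss`, replacing hand-tuned fixed coefficients.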
Benchmark Proposition
Beyond the architectural contributions, the paper introduces the Full-time Multi-modality Benchmark (FMB), a resource designed to drive progress in image fusion and segmentation. The benchmark comprises 1500 aligned pairs of infrared and visible images with pixel-level annotations across 15 categories, covering conditions that range from dense fog to low light. Its multi-scene, multi-environment composition gives researchers a rich dataset for testing the scalability and robustness of multi-modality fusion models under diverse and challenging conditions.
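A minimal loader for such aligned pairs might look like the sketch below. The directory layout (Visible/, Infrared/, Label/) and PNG naming are assumptions made for illustration only and should be adapted to the released FMB data.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class FMBPairs(Dataset):
    """Loads aligned infrared/visible image pairs with pixel-level labels.
    Folder and file conventions here are hypothetical."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.names = sorted(p.name for p in (self.root / "Visible").glob("*.png"))

    def __len__(self) -> int:
        return len(self.names)

    def __getitem__(self, idx: int):
        name = self.names[idx]
        vis = np.array(Image.open(self.root / "Visible" / name).convert("RGB"))
        ir = np.array(Image.open(self.root / "Infrared" / name).convert("L"))
        label = np.array(Image.open(self.root / "Label" / name))  # class indices 0..14
        vis_t = torch.from_numpy(vis).permute(2, 0, 1).float() / 255.0  # (3, H, W)
        ir_t = torch.from_numpy(ir).unsqueeze(0).float() / 255.0        # (1, H, W)
        label_t = torch.from_numpy(label).long()                        # (H, W)
        return vis_t, ir_t, label_t
```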
Experimental Validation and Results
The experimental results demonstrate SegMiF's gains in both fusion quality and segmentation accuracy. The paper reports an average improvement of 7.66% in mIoU over current state-of-the-art (SOTA) methods, with superior performance on both synthetic and real-world scenes. Extensive qualitative comparisons against existing methods further show that SegMiF preserves detail fidelity and feature distinctness under varying conditions.
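Since mIoU is the headline metric, a brief reference implementation is included below to make the number concrete. It computes the standard per-class IoU averaged over classes present in the data; it is meant only to clarify the metric, not to reproduce the authors' evaluation code.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 15) -> float:
    """Mean Intersection-over-Union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class absent in both maps, skip it
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```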
Implications and Future Work
From a practical viewpoint, SegMiF has promising implications for scene understanding in domains that depend on multi-sensor integration, such as autonomous driving. Its robust fusion and segmentation capabilities point toward real-time applications that require reliable scene interpretation across platforms and environmental conditions.
From a theoretical perspective, this work opens new avenues in multi-task learning for feature-intensive processes. The inter-network interaction proposed in SegMiF lays a foundation for future research and may influence joint task optimization strategies beyond image processing.
The paper leaves several extensions open for future exploration, particularly the refinement of the dynamic weighting algorithm and the scalability of the hierarchical attention framework to other multi-modality tasks. Such enhancements could translate into further efficiency and performance gains in complex AI systems. Overall, "Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation" is a substantial contribution to the AI community, offering both a valuable architectural advance and a comprehensive benchmark for future investigations.