- The paper introduces a dual graph convolutional framework that models both spatial and channel relationships to improve segmentation accuracy.
- It efficiently captures long-range spatial dependencies using coordinate space graph convolutions with feature downsampling.
- DGCNet achieves state-of-the-art mean IoU on Cityscapes and Pascal Context, validating the effectiveness of its global contextual modeling.
Dual Graph Convolutional Network for Semantic Segmentation
The paper "Dual Graph Convolutional Network for Semantic Segmentation" introduces a novel approach to address the challenges of semantic segmentation, particularly the need for capturing long-range contextual information crucial for pixel-wise predictions. Unlike traditional methods that rely heavily on multi-scale feature fusion or dilated convolutions, the proposed method leverages a graph-convolutional network (GCN) framework designed to efficiently model global context through a dual-graph system.
Core Contributions
The Dual Graph Convolutional Network (DGCNet) comprises two complementary components:
- Coordinate Space Graph Convolution (GCN-S): This component explicitly models spatial relationships between image pixels. By downsampling the input features, it constructs a manageable graph representation. This approach allows for efficient information propagation to capture dependencies within the spatial dimensions of the image.
- Feature Space Graph Convolution (GCN-F): This component focuses on the interdependencies across the channel dimensions of the network's feature map. Since filters in deep layers often respond to complex object parts, modeling the relationships among these channel responses can further enhance segmentation.
These two components collectively improve semantic segmentation by leveraging both spatial and channel-wise contextual relationships; a minimal sketch of both branches is given below.
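The following PyTorch sketch shows one way the two branches could be realized. It is a hypothetical reconstruction, not the authors' implementation: the module names (`CoordinateSpaceGCN`, `FeatureSpaceGCN`), the pooling-based downsampling, the softmax affinities, and the latent node count are illustrative assumptions, and the paper's exact projections, adjacency normalization, and branch fusion differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinateSpaceGCN(nn.Module):
    """Sketch of a spatial-graph (GCN-S-style) branch: downsample the feature
    map, treat each remaining location as a graph node, propagate information
    over a learned affinity graph, then upsample and fuse residually."""
    def __init__(self, channels, downsample=4):
        super().__init__()
        self.downsample = downsample
        self.theta = nn.Conv2d(channels, channels, 1)  # node features
        self.phi = nn.Conv2d(channels, channels, 1)    # features used to build affinities
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xs = F.avg_pool2d(x, self.downsample)                       # fewer spatial nodes
        nodes = self.theta(xs).flatten(2)                           # (B, C, N)
        keys = self.phi(xs).flatten(2)                              # (B, C, N)
        adj = torch.softmax(nodes.transpose(1, 2) @ keys, dim=-1)   # (B, N, N) affinity graph
        propagated = nodes @ adj.transpose(1, 2)                    # each node aggregates its neighbours
        ys = propagated.view(b, c, xs.size(2), xs.size(3))
        y = F.interpolate(ys, size=(h, w), mode="bilinear", align_corners=False)
        return x + self.out(y)                                      # residual fusion

class FeatureSpaceGCN(nn.Module):
    """Sketch of a feature-graph (GCN-F-style) branch: softly assign pixels to a
    small set of latent nodes, reason over node features, then project back."""
    def __init__(self, channels, num_nodes=64):
        super().__init__()
        self.assign = nn.Conv2d(channels, num_nodes, 1)  # pixel-to-node assignment
        self.gcn = nn.Conv1d(channels, channels, 1)      # node-wise feature transform
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        a = torch.softmax(self.assign(x).flatten(2), dim=-1)  # (B, K, HW) soft assignments
        nodes = x.flatten(2) @ a.transpose(1, 2)               # (B, C, K) latent graph nodes
        nodes = F.relu(self.gcn(nodes))                        # reasoning over the feature graph
        y = (nodes @ a).view(b, c, h, w)                       # reproject onto the pixel grid
        return x + self.out(y)
```

In a full model, the outputs of the two branches would be combined (for example, summed or concatenated) before the segmentation head; the choice of fusion is another point where this sketch simplifies the original design.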
Empirical Evidence
The authors provide strong empirical results to validate their approach. DGCNet achieves 82.0% mean Intersection over Union (mIoU) on Cityscapes and 53.7% mIoU on Pascal Context, surpassing the state-of-the-art methods of the time and indicating the efficacy of modeling contextual information through dual graph convolutions.
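For reference, mean IoU is the per-class intersection-over-union averaged over all classes:

```latex
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{|P_c \cap G_c|}{|P_c \cup G_c|}
```

where $P_c$ and $G_c$ are the sets of pixels predicted as and labelled with class $c$, and $C$ is the number of evaluated classes (19 for Cityscapes).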
Theoretical and Practical Implications
The proposed dual graph convolutional framework enriches the theoretical understanding of semantic segmentation by illustrating how orthogonal graph structures can jointly capture different dimensions of contextual information. This advance contributes new insights to the application of GCNs in vision tasks beyond their traditional use cases.
Practically, the framework delivers robust performance improvements while keeping computational demands manageable. Spatial downsampling and feature-space projection keep both graphs small, so the model remains efficient even at the high resolutions typical of segmentation tasks, as the rough arithmetic below illustrates.
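To make the efficiency argument concrete, the back-of-the-envelope numbers below compare graph sizes under illustrative assumptions (a backbone output stride of 8 on a 1024x2048 Cityscapes image, 4x downsampling in the spatial branch, 64 latent nodes in the feature branch); these factors are assumptions for the sake of the example, not the paper's reported configuration.

```python
# Illustrative node/affinity counts; the specific factors are assumptions,
# not the paper's reported configuration.
full_nodes = 128 * 256                   # 32,768 positions at 1/8 resolution
affinity_full = full_nodes ** 2          # ~1.1e9 pairwise affinities if fully connected

spatial_nodes = (128 // 4) * (256 // 4)  # 2,048 positions after 4x downsampling
affinity_spatial = spatial_nodes ** 2    # ~4.2e6 affinities in the coordinate-space graph

latent_nodes = 64                        # assumed size of the feature-space graph
affinity_feature = latent_nodes ** 2     # 4,096 affinities in the feature-space graph

print(affinity_full, affinity_spatial, affinity_feature)
```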
Speculations on Future Developments
Given the success demonstrated by DGCNet, a natural extension of this research lies in its application to other dense prediction tasks such as instance segmentation, depth estimation, and even video segmentation. Additionally, exploring different graph construction and projection strategies within the dual framework could further enhance efficiency and accuracy. There is also potential for integrating these methods with transformer models to capitalize on their strengths in handling global dependencies.
In conclusion, the paper presents a solid advancement in semantic segmentation, elevating the potential for graph-based methods in modeling complex contextual relationships. Its dual graph convolutional approach offers a promising pathway for future research and application in diverse computer vision tasks.