Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery (1709.00179v1)

Published 1 Sep 2017 in cs.CV

Abstract: Thanks to recent advances in CNNs, solid improvements have been made in semantic segmentation of high-resolution remote sensing imagery. However, most previous works have not fully taken into account the specific difficulties of remote sensing tasks. One such difficulty is that objects in remote sensing imagery are small and crowded. To tackle this challenging task, we propose a novel architecture called the local feature extraction (LFE) module, attached on top of a dilated front-end module. The LFE module is based on our finding that aggressively increasing dilation factors fails to aggregate local features due to the sparsity of the kernel and is detrimental to small objects. The proposed LFE module solves this problem by aggregating local features with decreasing dilation factors. We tested our network on three remote sensing datasets and obtained remarkably good results on all of them, especially for small objects.

Summary

The paper introduces an LFE module that progressively reduces dilation factors to enhance local feature aggregation for accurate small object segmentation.
It addresses the main limitation of aggressive dilation, namely that sparse kernels fail to aggregate local features, and reports marked improvements in Average Precision and Average Recall.
The proposed method shows strong potential for practical applications in urban planning, agriculture monitoring, and biomedical imaging.

Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery

This paper addresses the challenge of segmenting small object instances in high-resolution remote sensing imagery using Convolutional Neural Networks (CNNs). The primary difficulty in remote sensing segmentation lies in the small size and dense distribution of objects within the imagery, factors which traditional CNN approaches often fail to account for adequately. The authors propose a novel network architecture featuring a Local Feature Extraction (LFE) module, designed to complement a front-end module based on dilated convolutions.

Core Contributions

The proposed architecture is predicated on two key insights:

Aggressive Dilation Issues: While dilated convolutions are useful for increasing the receptive field without losing resolution, excessively increasing dilation factors spreads the kernel weights sparsely over a wide area. This sparsity undermines the network's ability to aggregate local features, which is particularly detrimental for small objects.
Local Feature Extraction (LFE) Module: To counteract the drawbacks of large dilation factors, the authors introduce the LFE module, which progressively decreases the dilation factor through its layers. The later, less dilated layers aggregate local features more densely, compensating for the sparsity that the preceding dilated convolutions introduce.
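The two insights above can be illustrated with a small 1-D connectivity sketch. It is not the authors' implementation: the 3-tap kernels, the dilation schedules `[2, 4]` and `[2, 4, 2, 1]`, and all function names are assumptions chosen for illustration, and the code tracks only which input positions are connected to an output unit, not learned weights.

```python
def kernel_density(dilation, taps=3):
    """Fraction of positions inside a dilated kernel's 1-D span that carry a weight."""
    span = dilation * (taps - 1) + 1
    return taps / span


def input_offsets(dilations):
    """Input offsets reachable from one output unit through stacked 3-tap dilated convs."""
    offsets = {0}
    for d in dilations:
        offsets = {o + k * d for o in offsets for k in (-1, 0, 1)}
    return offsets


def shared_inputs(dilations):
    """Number of input positions seen by both of two adjacent output units."""
    left = input_offsets(dilations)
    right = {o + 1 for o in left}  # the neighbouring unit sees the same pattern, shifted by one
    return len(left & right)


# Hypothetical schedules, not taken from the paper.
increasing = [2, 4]             # dilation only grows
with_decreasing = [2, 4, 2, 1]  # same stack followed by decreasing dilations

print(kernel_density(4))               # 3 weights spread over a span of 9 positions
print(shared_inputs(increasing))       # 0: neighbours read disjoint input positions
print(shared_inputs(with_decreasing))  # 18: neighbours now share most of their inputs
```

In this toy setting, adjacent output units of the increasing-only stack read entirely separate input positions, while appending layers with decreasing dilation reconnects them. This mirrors the motivation given for the LFE module, under the simplifying assumptions stated above.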

Evaluation and Results

The proposed method was evaluated using three different remote sensing datasets. The results consistently demonstrate that the network architecture outperforms state-of-the-art models such as U-Net and DeepLab, particularly in its ability to handle small objects. For instance, on the Toyota City Dataset, the inclusion of the LFE module provided significant improvements in metrics like Average Precision (AP) and Average Recall (AR) across various object sizes, with a notable enhancement for very small objects. This performance boost underscores the efficacy of the decreasing dilation strategy in preserving spatial consistency and extracting local structures.

Numerical Analysis

Using Effective Receptive Field (ERF) analysis, the authors show that stacks of conventional dilated convolutions produce a grid-like pattern in the ERF. The pattern indicates that an output unit draws on a sparse, regularly spaced subset of input positions, so neighboring units are spatially inconsistent and local features are poorly integrated. Adding the LFE module alleviates these issues, smoothing out the ERF and maintaining consistency across the feature map.

Implications and Future Directions

The findings suggest significant implications for future applications in remote sensing and beyond. The ability to accurately segment small, densely packed objects could enhance diverse tasks such as urban planning, agriculture monitoring, and environmental surveillance. The methodology also holds potential in other domains where small object segmentation is critical, such as biomedical imaging for cell segmentation and pedestrian detection in crowded environments.

This work opens avenues for further exploration into adaptive dilation strategies that could dynamically adjust dilation factors based on contextual understanding of the scene. Additionally, integrating the proposed architecture with other instance-aware segmentation pipelines could offer more robust solutions for complex segmentation tasks.

In conclusion, this paper presents a compelling case for revising conventional dilated convolution techniques, demonstrating through extensive experimentation and rigorous analysis that consideration of local feature density is crucial for effective small object segmentation in high-resolution remote sensing imagery.
