- The paper introduces a knowledge distillation framework that uses a pre-trained autoencoder to compress the teacher's high-resolution feature maps into a compact latent representation that the student is trained to match.
- The approach incorporates an affinity distillation module to capture long-range spatial dependencies, enhancing the student network's understanding.
- Extensive experiments show a 2.5% mIoU gain on Cityscapes while requiring only 8% of the FLOPs of models with comparable accuracy, making the approach well suited to resource-limited settings.
Knowledge Adaptation for Efficient Semantic Segmentation
The paper presents an approach to improving semantic segmentation models through knowledge distillation techniques tailored specifically to this task. The authors address the trade-off between accuracy and computational efficiency that plagues Fully Convolutional Networks (FCNs) used for semantic segmentation. The primary contribution is a customized knowledge distillation framework that transfers knowledge from a large teacher network to a compact student network more effectively, balancing prediction accuracy against computational cost.
The approach introduces several novel components, most notably a pre-trained autoencoder that mediates the distillation process. The autoencoder takes the high-resolution feature maps produced by the teacher network and translates them into a latent space, compressing the complex feature representations into a more compact form that retains the critical information while removing redundancy, which eases the learning burden on the student network. The student is trained to replicate this transformed knowledge rather than to directly mimic the teacher's high-resolution feature maps, which makes adaptation easier despite differences between the two architectures.
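The mechanism can be illustrated with a minimal PyTorch-style sketch. The autoencoder layout, channel counts, adapter module, and MSE matching loss below are illustrative assumptions rather than the paper's exact configuration; the point is that the student matches the autoencoder's compressed latent code instead of the raw teacher feature map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAutoencoder(nn.Module):
    """Pre-trained to reconstruct teacher feature maps; its encoder defines
    the compact latent space the student will match (hypothetical layout)."""
    def __init__(self, in_channels=2048, latent_channels=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, latent_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(latent_channels, latent_channels, kernel_size=3, padding=1),
        )
        self.decoder = nn.Conv2d(latent_channels, in_channels, kernel_size=1)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def adaptation_loss(student_feat, teacher_feat, autoencoder, adapter):
    """Student mimics the compressed teacher representation rather than
    the raw high-resolution feature map."""
    with torch.no_grad():
        _, z_teacher = autoencoder(teacher_feat)   # autoencoder is frozen, pre-trained
    z_student = adapter(student_feat)              # e.g. a 1x1 conv mapping student channels to the latent size
    # Align spatial resolution in case teacher and student strides differ.
    z_student = F.interpolate(z_student, size=z_teacher.shape[-2:],
                              mode='bilinear', align_corners=False)
    return F.mse_loss(z_student, z_teacher)
```

In training, a term like this would be added to the usual per-pixel cross-entropy loss with some weighting factor, so the student learns both the task labels and the adapted teacher knowledge.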
In addition to this knowledge adaptation mechanism, the work proposes an affinity distillation module. This module explicitly captures long-range dependencies that small models often struggle to learn because of their limited receptive fields. By computing non-local pairwise interactions across the input and distilling them from teacher to student, the student network gains a better grasp of spatial dependencies, which improves segmentation performance.
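A hedged sketch of such an affinity term follows. The cosine-similarity affinity matrix and the L2 matching loss are one common way to formalize non-local pairwise dependencies; the exact normalization, resolution handling, and loss weighting in the paper may differ.

```python
import torch
import torch.nn.functional as F

def affinity_matrix(feat):
    """Pairwise (non-local) affinities between all spatial positions.
    feat: (B, C, H, W) -> (B, H*W, H*W) matrix of cosine similarities."""
    b, c, h, w = feat.shape
    f = F.normalize(feat.view(b, c, h * w), dim=1)   # unit-normalize each position's feature vector
    return torch.bmm(f.transpose(1, 2), f)

def affinity_distillation_loss(student_feat, teacher_feat):
    """Encourage the student to reproduce the teacher's long-range spatial dependencies."""
    # Match spatial sizes so both affinity matrices have the same shape
    # (an assumption; both could instead be pooled to a fixed resolution).
    student_feat = F.interpolate(student_feat, size=teacher_feat.shape[-2:],
                                 mode='bilinear', align_corners=False)
    return F.mse_loss(affinity_matrix(student_feat),
                      affinity_matrix(teacher_feat))
```

Because the affinity matrix compares positions rather than channels, the teacher and student need not share channel dimensions, which is convenient when their architectures differ.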
The efficacy of the proposed method is validated through extensive experiments on three popular benchmarks: Pascal VOC, Cityscapes, and Pascal Context. The results demonstrate that the approach significantly boosts the performance of compact models. For instance, the method achieves a 2.5% mean Intersection over Union (mIoU) increase on the Cityscapes test set while using only 8% of the floating-point operations (FLOPs) required by models delivering comparable results. This improvement is achieved without increasing the model's parameter count, which is critical for deploying models in resource-constrained environments.
This research has profound implications for the deployment of semantic segmentation models in real-world applications, such as autonomous driving and video surveillance, where computational resources are limited. By optimizing the trade-off between accuracy and efficiency, the proposed framework extends the practical applicability of semantic segmentation models in various domains.
In terms of future directions, this knowledge distillation technique could be adapted and generalized for other dense prediction tasks beyond semantic segmentation, potentially enhancing model efficiency in domains such as object detection or scene understanding. Additionally, exploring other forms of representation learning for knowledge adaptation, or integrating advanced distillation strategies, may reveal further efficiencies and improvements in model performance.
In summary, this paper offers a targeted solution to one of the critical limitations of semantic segmentation models, making significant strides in improving both computational efficiency and segmentation accuracy through an inventive application of knowledge distillation principles.