Incremental Learning Techniques for Semantic Segmentation
The paper "Incremental Learning Techniques for Semantic Segmentation" by Umberto Michieli and Pietro Zanuttigh tackles the specific problem of incremental learning within the context of semantic segmentation. The paper is significant, considering the existing deep learning frameworks often suffer from catastrophic forgetting when they sequentially learn new tasks. This phenomenon is particularly problematic in dynamic settings where models need to adapt to new data while preserving knowledge about previously learned classes. Unlike conventional approaches that focus predominantly on image classification and object detection, this paper addresses the intricacies and challenges of incrementally learning pixel-wise labeling tasks.
Problem Definition and Challenges
Incremental learning, especially in semantic segmentation, presents distinctive challenges. The crucial difficulty lies in the model's ability to incorporate new classes from incoming data streams without accessing or storing past task data, a constraint essential for applications with privacy concerns or limited storage capabilities. Unlike image classification tasks where images generally contain a single object, semantic segmentation requires dense labeling where an image can encompass multiple classes, including both old and new ones, adding to the complexity of the task.
Methodology
The paper proposes a series of strategies to mitigate catastrophic forgetting in semantic segmentation, implemented on the Deeplab v2 architecture with a ResNet-101 backbone. Two key loss mechanisms are introduced for the incremental semantic segmentation framework:
- Distillation on the Output Layer ($\mathcal{L}_{D}'$): this loss is the cross-entropy between the softmax outputs of the current and previous models, restricted to previously learned classes, thereby ensuring that prior knowledge is retained.
- Distillation on the Intermediate Feature Space ($\mathcal{L}_{D}''$): here, an L2 loss is applied to the intermediate feature representations of the current and previous models, maintaining the stability of learned features across tasks (both losses are sketched in code below).
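As a concrete illustration, the sketch below implements both distillation terms in PyTorch. It is a minimal sketch, not the authors' released code: the tensor shapes (logits and feature maps of shape [B, C, H, W]), the slicing of the first `num_old_classes` channels, and the function names are assumptions made for illustration.

```python
import torch.nn.functional as F

def output_distillation_loss(new_logits, old_logits, num_old_classes):
    """L_D': cross-entropy between the softmax outputs of the current and
    previous models, restricted to the previously learned classes."""
    # Slice the class dimension to the old classes only: [B, C_old, H, W].
    log_p_new = F.log_softmax(new_logits[:, :num_old_classes], dim=1)
    p_old = F.softmax(old_logits[:, :num_old_classes], dim=1)
    # Pixel-wise cross-entropy against the previous model's soft outputs,
    # averaged over all pixels in the batch.
    return -(p_old * log_p_new).sum(dim=1).mean()

def feature_distillation_loss(new_feats, old_feats):
    """L_D'': L2 (mean squared error) loss between the intermediate feature
    maps produced by the current and previous models."""
    return F.mse_loss(new_feats, old_feats)
```

During an incremental step, one of these terms would typically be added, with a weighting factor, to the standard cross-entropy loss on the new classes, while the previous model is kept frozen and used only to generate the distillation targets.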
Additionally, an encoder-freezing approach is proposed, in which the portion of the model responsible for feature extraction is frozen so that it is not altered during incremental updates, reinforcing feature consistency for previously learned classes.
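A minimal sketch of this freezing step is shown below, assuming the model exposes an `encoder` submodule (the ResNet-101 feature extractor); the attribute name and the helper function are illustrative assumptions, not the paper's code.

```python
import torch

def freeze_encoder(model: torch.nn.Module) -> None:
    """Freeze the feature-extraction part of the network before an incremental step."""
    for p in model.encoder.parameters():
        p.requires_grad = False  # encoder weights receive no gradient updates
    model.encoder.eval()         # keep batch-norm running statistics fixed as well
```

Only the parameters that remain trainable (the decoder and classification head) would then be passed to the optimizer for the incremental training step.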
Experimental Evaluation
The methodologies were tested on the Pascal VOC2012 dataset, comparing performance against standard fine-tuning. Fine-tuning showed significant performance drops due to catastrophic forgetting. In contrast, applying the distillation losses, particularly when combined with encoder freezing, better retained prior knowledge and yielded higher mean Intersection over Union (mIoU) scores in the incremental learning setting.
- The $\mathcal{L}_{D}''$ method showed promising results, with improvements of up to 6.5% in mIoU over simple fine-tuning, affirming its utility in preserving the stability of intermediate representations.
- Encoder freezing combined with output distillation ($\mathcal{L}_{D}'$) resulted in even greater retention of previous-task performance while learning new classes, underscoring the benefit of limiting changes to the encoder.
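For reference, a minimal sketch of how the mIoU metric quoted above is typically computed is given below; the per-class confusion-matrix formulation and the handling of void pixels are standard assumptions, not details taken from the paper.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Accumulate a confusion matrix from flattened prediction/label arrays."""
    mask = (target >= 0) & (target < num_classes)  # skip void/ignore pixels
    idx = num_classes * target[mask] + pred[mask]
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """conf[i, j] = number of pixels of ground-truth class i predicted as class j."""
    intersection = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    return float(np.mean(intersection / np.maximum(union, 1)))
```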
Implications and Future Directions
The implications of these findings are twofold. Practically, the approach enables more dynamic and adaptive computer vision systems, which is beneficial in contexts like autonomous driving, where conditions and scenes continually evolve. Theoretically, this research advances the understanding of how to maintain model stability across sequential learning phases, a critical component as artificial intelligence systems are deployed in real-world environments.
Moving forward, future work could build on this foundational paper by incorporating generative approaches, such as GANs, to synthesize pseudo-samples of previously seen classes, thereby improving memory efficiency. Additionally, extending the evaluation to more diverse and larger-scale datasets could further validate and refine these techniques, achieving broader applicability and robustness in incremental learning frameworks for semantic segmentation.