
Prototype Mixture Models for Few-shot Semantic Segmentation (2008.03898v2)

Published 10 Aug 2020 in cs.CV

Abstract: Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose. Using a single prototype acquired directly from the support image to segment the query image causes semantic ambiguity. In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation. Estimated by an Expectation-Maximization algorithm, PMMs incorporate rich channel-wised and spatial semantics from limited support images. Utilized as representations as well as classifiers, PMMs fully leverage the semantics to activate objects in the query image while depressing background regions in a duplex manner. Extensive experiments on Pascal VOC and MS-COCO datasets show that PMMs significantly improve upon state-of-the-arts. Particularly, PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost for model size and inference speed.

Authors (5)
  1. Boyu Yang (10 papers)
  2. Chang Liu (864 papers)
  3. Bohao Li (20 papers)
  4. Jianbin Jiao (51 papers)
  5. Qixiang Ye (110 papers)
Citations (333)

Summary

  • The paper presents a novel PMM approach that replaces single prototypes with multiple localized representations to overcome semantic ambiguity in few-shot segmentation.
  • It leverages an Expectation-Maximization algorithm, using cosine distance under a von Mises-Fisher (VMF) distribution, to iteratively refine prototype vectors from limited support examples.
  • Empirical results on Pascal VOC and MS-COCO demonstrate a significant boost in segmentation accuracy, including an up to 5.82% improvement in 5-shot segmentation on MS-COCO.

Prototype Mixture Models for Few-shot Semantic Segmentation

The paper "Prototype Mixture Models for Few-shot Semantic Segmentation" introduces an approach to the challenges of few-shot semantic segmentation. Few-shot segmentation aims to segment objects of a target class in a query image based on a limited number of labeled examples, called support images. The difficulty arises from significant differences in appearance and pose between objects in the support and query images, which requires models to generalize well beyond the limited examples.

Key Contributions

The authors propose Prototype Mixture Models (PMMs) to address the semantic ambiguity inherent in previous prototype-based models. Traditionally, a single prototype per class is extracted from support images using masked global average pooling, a strategy that often falls short because it discards the spatial diversity of object parts. PMMs improve upon this by employing multiple prototypes, each providing a localized representation of a different image region, thus capturing more detailed semantics. The prototypes are estimated via an Expectation-Maximization (EM) algorithm designed to capture both spatial and channel-wise semantics for better segmentation accuracy.
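For context, the single-prototype baseline that PMMs improve on can be sketched as masked global average pooling over support features. The following is a minimal NumPy sketch; the array shapes and the epsilon constant are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def masked_average_prototype(feat, mask):
    """Baseline single-prototype extraction: masked global average
    pooling of a support feature map.

    feat: (C, H, W) support feature map
    mask: (H, W) binary foreground mask
    returns: (C,) class prototype
    """
    m = mask.astype(feat.dtype)
    denom = m.sum() + 1e-8  # avoid division by zero on empty masks
    return (feat * m[None]).sum(axis=(1, 2)) / denom
```

Because every foreground pixel is averaged into one vector, spatially distinct parts (for example, head versus torso) collapse together; this is the semantic ambiguity that PMMs address by fitting several prototypes instead.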

Methodology

The proposed method consists of several key processes:

  • Training of PMMs: During training, support-image features are split into foreground and background, and PMMs are fitted to them, clustering the features into prototypes that correspond to different object parts.
  • EM Algorithm: The EM algorithm iteratively updates and refines the prototype vectors. Using cosine distance, incorporated through the von Mises-Fisher (VMF) distribution, aligns the clustering objective with the metric-learning framework used for segmentation.
  • Segmentation: PMMs are used in a 'duplex' manner during segmentation. First, prototypes activate relevant spatial locations in the query feature maps, enhancing the corresponding channels (P-Match). Second, prototypes act as classifiers, yielding probabilistic segmentations of the query image (P-Conv).
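The EM estimation and the classifier-style use of prototypes can be sketched in NumPy. The sketch below is an illustration of the general technique rather than the authors' implementation: the number of prototypes `K`, the concentration constant `kappa`, and the iteration count are assumed values, and P-Conv is approximated as per-prototype cosine-similarity maps:

```python
import numpy as np

def pmm_em(features, K=3, kappa=20.0, iters=10, seed=0):
    """Fit K prototypes to (N, C) foreground features with an EM loop
    under a von Mises-Fisher-style likelihood (cosine similarity scaled
    by a concentration parameter kappa). Returns (K, C) unit prototypes."""
    rng = np.random.default_rng(seed)
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init from samples
    for _ in range(iters):
        # E-step: soft responsibilities from scaled cosine similarity
        logits = kappa * (X @ mu.T)                  # (N, K)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted mean direction, back on the sphere
        mu = resp.T @ X
        mu /= np.linalg.norm(mu, axis=1, keepdims=True) + 1e-8
    return mu

def p_conv_maps(query_feat, prototypes):
    """Prototypes as classifiers (P-Conv style): each prototype acts like
    a 1x1 convolution, yielding one cosine-similarity map per prototype.
    query_feat: (C, H, W) -> output: (K, H, W)."""
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, -1)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    return (prototypes @ q).reshape(len(prototypes), H, W)
```

P-Match would use the same prototype-query similarities to reweight query feature channels before the segmentation head; only the classifier-style P-Conv path is shown here.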

Numerical Results and Insights

The empirical results presented in the paper reinforce the capability of PMMs to improve segmentation performance. On the Pascal VOC and MS-COCO datasets, PMMs outperform existing methods by significant margins; for instance, a reported improvement of up to 5.82% in 5-shot segmentation on MS-COCO underscores the potential of PMMs in few-shot contexts. Moreover, a residual stacking strategy for PMMs (termed RPMMs) further improves accuracy through guided residual learning.
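The exact wiring of RPMMs is not detailed in this summary. One generic way to realize residual stacking of segmentation modules, where each stage refines a running activation map and corrections accumulate, might look like the following; the stage interface here is a hypothetical assumption, not the paper's architecture:

```python
import numpy as np

def residual_stack(query_feat, stages):
    """Generic residual stacking: each stage receives the query features
    plus the running activation map and returns a correction, which is
    accumulated residually.

    query_feat: (C, H, W)
    stages: callables (feat, act) -> (H, W) correction map
    returns: (H, W) final activation map
    """
    act = np.zeros(query_feat.shape[1:], dtype=query_feat.dtype)
    for stage in stages:
        act = act + stage(query_feat, act)  # residual refinement
    return act
```

The design intuition is that later stages only need to model what earlier stages got wrong, which keeps each stage's job small.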

Implications and Future Directions

The implications of PMMs in the field of few-shot learning are multifaceted. Firstly, they demonstrate how extending global prototypes to nuanced prototype mixtures can alleviate common issues such as semantic ambiguity and limited representational capacity. Furthermore, the use of the EM algorithm within PMMs establishes a precedent for applying unsupervised estimation techniques inside a supervised learning setup to improve robustness to intra-class variance.

Future directions could explore the scalability of such models in larger datasets and more complex environments. Investigating alternative strategies for mixing prototypes or integrating domain adaptation processes might further enhance segmentation adaptability and accuracy. Additionally, extending the PMM framework to other tasks like few-shot object detection and recognition could tap into novel avenues of application.

In conclusion, this paper delivers a well-motivated framework that leverages prototype mixture modeling for few-shot semantic segmentation, paving the way for future research harnessing more sophisticated prototype learning techniques.