Multi-label Classification with Partial Annotations using Class-aware Selective Loss (2110.10955v1)

Published 21 Oct 2021 in cs.CV

Abstract: Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset's partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. With our novel approach, we achieve state-of-the-art results on OpenImages dataset (e.g. reaching 87.3 mAP on V6). In addition, experiments conducted on LVIS and simulated-COCO demonstrate the effectiveness of our approach. Code is available at https://github.com/Alibaba-MIIL/PartialLabelingCSL.

Citations (34)

Summary

  • The paper proposes a class-aware selective loss that leverages label likelihoods and priors to effectively manage partially annotated multi-label datasets.
  • It employs a temporary model to accurately estimate class distributions, addressing the limitations of naive positive counting.
  • Empirical results on datasets like OpenImages V6 demonstrate that the method outperforms conventional training modes with an mAP of 87.34%.

Multi-label Classification with Partial Annotations using Class-aware Selective Loss

This paper addresses multi-label classification on partially annotated datasets, where only a subset of labels is annotated per sample, an issue particularly prevalent at large scale. The authors propose a class-aware selective loss to improve classification performance on such data.

Key Contributions

  1. Selective Handling of Un-annotated Labels: The paper introduces a method for treating un-annotated labels based on their estimated likelihood and prior probability. This approach involves two main criteria:
    • Label Likelihood: This is the estimated probability of a label being present in a particular image, derived from the model's predictions during training.
    • Label Prior: Represents the prior probability of a label appearing in the dataset, estimated using a temporary model trained in Ignore mode.
  2. Class Distribution Estimation: The authors discuss the difficulty of estimating class distributions in partially annotated datasets and propose using a dedicated temporary model, as opposed to naive counting of positive annotations, which often misrepresents the true distribution.
  3. Partial Asymmetric Loss (P-ASL): The paper introduces an asymmetric loss adapted to the partial-annotation setting. It dynamically controls the contributions of positive and negative samples and decouples the focusing levels applied to annotated versus un-annotated negatives, so that originally un-annotated labels contribute less than annotated ones (see the sketch after this list).
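
To make these pieces concrete, here is a minimal PyTorch-style sketch of how the selective rule and P-ASL could fit together. It is not the authors' implementation (the official code is in the linked repository); the function name, the top-k likelihood rule, the prior threshold, and all hyperparameter values are illustrative assumptions.

```python
import torch

def class_aware_selective_loss(logits, targets, prior, gamma_pos=0.0,
                               gamma_neg=4.0, gamma_unann=7.0,
                               likelihood_topk=0.02, prior_threshold=0.05):
    """Sketch of a class-aware selective loss (not the paper's exact code).

    logits:  raw model outputs, shape [batch, num_classes]
    targets: 1 = annotated positive, 0 = annotated negative,
             -1 = un-annotated (hypothetical convention)
    prior:   per-class prior probabilities, shape [num_classes],
             e.g. estimated by a temporary model trained in Ignore mode
    """
    probs = torch.sigmoid(logits)
    unann = targets == -1

    # Label likelihood: ignore un-annotated labels the model already ranks
    # highly for this image (they are probably missing positives).
    k = max(1, int(likelihood_topk * logits.size(1)))
    topk_floor = logits.topk(k, dim=1).values[:, -1:].detach()
    likely_pos = unann & (logits >= topk_floor)

    # Label prior: ignore un-annotated labels of classes that are
    # frequent in the dataset overall.
    frequent = unann & (prior.unsqueeze(0) >= prior_threshold)
    ignored = likely_pos | frequent

    pos = targets == 1
    neg_ann = targets == 0
    neg_unann = unann & ~ignored

    # P-ASL: separate focusing levels for positives, annotated negatives,
    # and un-annotated negatives; the last gets the strongest down-weighting.
    loss = torch.zeros_like(probs)
    loss[pos] = (1 - probs[pos]).pow(gamma_pos) * torch.log(probs[pos].clamp(min=1e-8))
    loss[neg_ann] = probs[neg_ann].pow(gamma_neg) * torch.log((1 - probs[neg_ann]).clamp(min=1e-8))
    loss[neg_unann] = probs[neg_unann].pow(gamma_unann) * torch.log((1 - probs[neg_unann]).clamp(min=1e-8))
    return -loss.sum()
```

In this sketch, un-annotated labels that the model already ranks highly, or that belong to classes with a high estimated prior, fall back to Ignore mode; the remaining un-annotated labels are trained as negatives under their own, stronger focusing parameter.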

Empirical Results

The proposed method, evaluated on large-scale datasets like OpenImages V6, LVIS, and simulated versions of MS-COCO, showed superior performance compared to previous approaches. Notably, the method achieved an mAP of 87.34% on OpenImages V6, demonstrating its efficacy. Extensive experiments confirmed that their selective approach significantly outperforms conventional Ignore and Negative training modes.

Implications

The class-aware selective approach offers a marked improvement in handling partially annotated data at scale. It alleviates the label noise introduced when un-annotated positives are treated as negatives (as in the Negative mode) and avoids the weakened decision boundaries that result from discarding un-annotated labels entirely (as in the Ignore mode).
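
For context, the two conventional baselines can be written as different masks over the same binary cross-entropy term. This is a schematic sketch rather than code from the paper; `targets` follows the same hypothetical convention as above, with -1 marking un-annotated labels.

```python
import torch
import torch.nn.functional as F

def baseline_partial_bce(logits, targets, mode="negative"):
    """Binary cross-entropy under the two conventional training modes.

    targets: 1 = annotated positive, 0 = annotated negative,
             -1 = un-annotated (hypothetical convention)
    """
    eff_targets = targets.clamp(min=0).float()  # -1 -> 0 for the BCE term
    if mode == "negative":
        # Negative mode: every un-annotated label is assumed negative,
        # injecting label noise whenever a positive is actually missing.
        mask = torch.ones_like(targets, dtype=torch.bool)
    else:
        # Ignore mode: un-annotated labels are dropped from the loss,
        # leaving fewer negatives and weaker decision boundaries.
        mask = targets != -1
    loss = F.binary_cross_entropy_with_logits(logits, eff_targets, reduction="none")
    return loss[mask].sum()
```

The selective approach replaces this fixed, dataset-wide mask with the per-label, per-sample decision sketched earlier.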

Future Directions

The method opens pathways for refining how machine learning models learn from incompletely annotated data. Potential future directions include:

  • Exploring more sophisticated probabilistic models or neural architectures for better estimation of label likelihood and prior.
  • Adapting this methodology to dynamic annotation scenarios in which annotation density changes over time.
  • Integrating this approach with active learning frameworks, potentially guiding annotation processes in large-scale settings more effectively.

In conclusion, the paper provides a robust technique for improving multi-label classification under partial annotation, a pertinent challenge in the era of big data and large-scale AI systems. The method could benefit practical applications involving incomplete annotations, which are prevalent across real-world domains.