Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training (2006.11280v1)

Published 22 Jun 2020 in cs.LG and stat.ML

Abstract: Many real-world applications have to tackle the Positive-Unlabeled (PU) learning problem, i.e., learning binary classifiers from a large amount of unlabeled data and a few labeled positive examples. While current state-of-the-art methods employ importance reweighting to design various risk estimators, they ignored the learning capability of the model itself, which could have provided reliable supervision. This motivates us to propose a novel Self-PU learning framework, which seamlessly integrates PU learning and self-training. Self-PU highlights three "self"-oriented building blocks: a self-paced training algorithm that adaptively discovers and augments confident positive/negative examples as the training proceeds; a self-calibrated instance-aware loss; and a self-distillation scheme that introduces teacher-students learning as an effective regularization for PU learning. We demonstrate the state-of-the-art performance of Self-PU on common PU learning benchmarks (MNIST and CIFAR-10), which compare favorably against the latest competitors. Moreover, we study a real-world application of PU learning, i.e., classifying brain images of Alzheimer's Disease. Self-PU obtains significantly improved results on the renowned Alzheimer's Disease Neuroimaging Initiative (ADNI) database over existing methods. The code is publicly available at: https://github.com/TAMU-VITA/Self-PU.

Citations (74)

Summary

  • The paper presents the Self-PU framework, which enhances PU learning by integrating self-paced training, self-calibrated loss, and self-distillation.
  • It dynamically identifies and augments confident samples while calibrating loss functions to improve supervision from weakly labeled data.
  • Experimental evaluations on standard benchmarks and ADNI brain scans demonstrate significant performance improvements over existing PU methods.

Self-PU: Innovative Enhancements for Positive-Unlabeled Training

The paper proposes a novel framework for Positive-Unlabeled (PU) learning, addressing the challenge of learning binary classifiers when only a small set of labeled positives and a large pool of unlabeled data are available. While traditional approaches leverage importance reweighting to estimate the risk from PU data, they overlook the supervision that the model's own predictions can provide. Self-PU addresses this by integrating self-training with PU learning through three self-oriented components: self-paced training, a self-calibrated loss, and self-distillation.
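
For orientation, the reweighted risk estimators that this prior line of work builds on can be written in the standard non-negative PU form (a common formulation from the PU literature, stated here for context rather than quoted from the paper):

$$\hat{R}_{\mathrm{pu}}(g) \;=\; \pi_p\,\hat{R}_p^{+}(g) \;+\; \max\!\Big(0,\; \hat{R}_u^{-}(g) \;-\; \pi_p\,\hat{R}_p^{-}(g)\Big),$$

where $\pi_p$ is the positive class prior, $\hat{R}_p^{+}$ and $\hat{R}_p^{-}$ are the empirical risks of the labeled positives scored against the positive and negative labels respectively, and $\hat{R}_u^{-}$ treats all unlabeled data as negative. Self-PU's contributions sit on top of an estimator of this kind rather than replacing it.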

Core Contributions

  1. Self-Paced Training: This component dynamically identifies and pseudo-labels confident positive and negative samples as training progresses, so that easy samples are exploited early and harder ones are incorporated gradually. This adaptive discovery and labeling process improves classification robustness and reduces the bias inherent in PU data.
  2. Self-Calibrated Loss: The paper introduces an innovative meta-learning approach for loss reweighting, allowing fine-grained calibration of loss functions applied to unconfident examples. This calibration ensures better supervision from weakly labeled data, enhancing the model's ability to distinguish between positive and negative classes.
  3. Self-Distillation: A teacher-student learning paradigm serves as an effective regularizer, where collaborative training between multiple networks introduces consistency constraints that stabilize learning and boost predictive accuracy. A simplified sketch of how the three components combine in a training step follows this list.
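
The way these pieces can fit together in one training step is sketched below in PyTorch. This is an illustrative reconstruction under stated assumptions, not the authors' released implementation: the helper names (select_confident, calibrated_weights, consistency_loss, training_step), the top-k pseudo-labeling rule, the softmax-based weighting, and the injected pu_risk estimator are all placeholders. In the paper the calibration weights are learned with a meta-objective, and the teacher arises from collaborative training between networks.

```python
# Minimal PyTorch sketch of the three "self" components described above.
# Illustrative only: function names, the top-k selection rule, the softmax
# weighting, and the injected `pu_risk` estimator are assumptions.

import torch
import torch.nn.functional as F


def select_confident(model, unlabeled_x, top_frac=0.1):
    """Self-paced step: pseudo-label the most confident unlabeled examples.

    `top_frac` controls how many examples are promoted per class and would
    typically grow over training (the self-paced schedule).
    """
    with torch.no_grad():
        scores = torch.sigmoid(model(unlabeled_x)).squeeze(1)  # P(y = +1 | x)
    k = max(1, int(top_frac * scores.numel()))
    pos_idx = torch.topk(scores, k).indices    # most positive-looking
    neg_idx = torch.topk(-scores, k).indices   # most negative-looking
    confident_x = torch.cat([unlabeled_x[pos_idx], unlabeled_x[neg_idx]])
    confident_y = torch.cat([
        torch.ones(k, device=unlabeled_x.device),
        torch.zeros(k, device=unlabeled_x.device),
    ])
    return confident_x, confident_y


def calibrated_weights(per_example_loss, temperature=1.0):
    """Self-calibrated step (simplified): down-weight unconfident examples.

    The paper learns instance-aware weights with a meta-objective; a softmax
    over negative losses is used here only as a cheap stand-in.
    """
    w = F.softmax(-per_example_loss / temperature, dim=0)
    return w * per_example_loss.numel()  # keep the average weight near 1


def consistency_loss(student_logits, teacher_logits):
    """Self-distillation step: consistency between the student and a
    (detached) teacher network on the same unlabeled inputs."""
    return F.mse_loss(torch.sigmoid(student_logits),
                      torch.sigmoid(teacher_logits).detach())


def training_step(model, teacher, pu_risk, pos_x, unlabeled_x, lam=1.0):
    """One illustrative step: base PU risk + weighted supervised loss on
    self-paced pseudo-labels + a distillation consistency term."""
    conf_x, conf_y = select_confident(model, unlabeled_x)
    logits = model(conf_x).squeeze(1)
    per_example = F.binary_cross_entropy_with_logits(logits, conf_y,
                                                     reduction="none")
    weighted_sup = (calibrated_weights(per_example.detach()) * per_example).mean()

    return (pu_risk(model, pos_x, unlabeled_x)   # e.g. an nnPU-style estimator
            + weighted_sup
            + lam * consistency_loss(model(unlabeled_x),
                                     teacher(unlabeled_x)))
```

The paper describes collaborative training between multiple networks for the distillation term; the detached teacher used above is just one simple way to instantiate that idea, and in practice the confident-sample fraction would follow a self-paced schedule rather than staying fixed.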

Experimental Evaluation

The Self-PU framework was evaluated on standard PU learning benchmarks (MNIST and CIFAR-10), where it surpassed existing methods by significant margins, indicating that self-training offers substantial benefits beyond conventional reweighting strategies. A real-world application was also explored: classifying brain images from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Here, Self-PU achieved marked improvements over prior methods, supporting its practicality on complex and noisy data.

Implications and Future Directions

The findings carry promising implications for PU learning, advocating self-training approaches that better exploit the supervisory signal already present in PU datasets. The success of Self-PU on ADNI data sets a precedent for applying PU learning to critical healthcare tasks such as early Alzheimer's disease diagnosis, where accurate classification is paramount yet labels exist for only a small fraction of cases.

Future research may explore more sophisticated self-training and self-supervised techniques to further improve PU learning. Additional real-world applications in scientific and medical domains could validate and broaden the practical impact of Self-PU, and integrating methods that address the class imbalance inherent in PU scenarios could yield further gains in classification accuracy.
