Yes–No Bundle with Negative Risk (YNB-NR)
- YNB-NR is a phenomenon in positive-unlabeled learning where negative empirical risks lead to overfitting in highly flexible models.
- The non-negative risk estimator replaces the negative risk term with its non-negative counterpart, ensuring the empirical risk remains bounded and reliable.
- Empirical results on deep models and large-scale datasets demonstrate that nnPU significantly improves classifier robustness compared to unbiased estimators.
A Yes–No Bundle with Negative Risk (abbreviated as YNB-NR, Editor's term) refers to the situation in positive-unlabeled (PU) learning where a binary classifier, trained to separate "yes" (positive) from "no" (negative) examples using only positive and unlabeled data, suffers from negative-valued empirical risk due to the structure of standard unbiased PU risk estimators. This negative-risk phenomenon is especially prevalent when employing highly flexible models (e.g., deep neural networks) and can lead to severe overfitting, undermining the reliability of the resulting classifier. The non-negative risk estimator (nnPU) was proposed to address the YNB-NR pathology by replacing the core risk term with its non-negative part, restoring robustness without loss of statistical consistency (Kiryo et al., 2017).
1. PU Classification and the Structure of Risk Estimators
In the PU learning framework, the objective is to learn a binary decision function $g$ from positive ($\mathcal{X}_p = \{x_i^p\}_{i=1}^{n_p}$) and unlabeled ($\mathcal{X}_u = \{x_i^u\}_{i=1}^{n_u}$) data, given the class prior $\pi_p = p(Y = +1)$ (assumed known or estimated separately). The standard risk for binary classification is decomposed as

$$R(g) = \pi_p R_p^+(g) + (1 - \pi_p) R_n^-(g),$$

where $R_p^+(g) = \mathbb{E}_{X \sim p_p}[\ell(g(X), +1)]$ and $R_n^-(g) = \mathbb{E}_{X \sim p_n}[\ell(g(X), -1)]$; $\ell$ is the loss function.
In practice, only positive and unlabeled samples are available. Since the unlabeled data follow the marginal $p(x) = \pi_p p_p(x) + (1 - \pi_p) p_n(x)$, one has $(1 - \pi_p) R_n^-(g) = R_u^-(g) - \pi_p R_p^-(g)$, where $R_u^-(g) = \mathbb{E}_{X \sim p}[\ell(g(X), -1)]$ and $R_p^-(g) = \mathbb{E}_{X \sim p_p}[\ell(g(X), -1)]$. The unbiased PU empirical risk is constructed as

$$\hat{R}_{\mathrm{pu}}(g) = \pi_p \hat{R}_p^+(g) - \pi_p \hat{R}_p^-(g) + \hat{R}_u^-(g),$$

with the empirical averages $\hat{R}_p^+(g)$, $\hat{R}_p^-(g)$, and $\hat{R}_u^-(g)$ calculated from the positive and unlabeled samples as described in the original paper. Crucially, the subtraction of $\pi_p \hat{R}_p^-(g)$ introduces the possibility for $\hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g)$, and hence $\hat{R}_{\mathrm{pu}}(g)$, to become negative and arbitrarily small, especially with expressive models and unbounded loss functions.
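As a concrete illustration, the following minimal Python sketch computes the unbiased empirical risk from fixed classifier scores; the sigmoid surrogate loss, the function names, and the sample sizes are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sigmoid_loss(z, y):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), bounded in [0, 1]."""
    return 1.0 / (1.0 + np.exp(y * z))

def unbiased_pu_risk(scores_p, scores_u, pi_p, loss=sigmoid_loss):
    """Unbiased PU empirical risk: pi_p*R_p^+(g) - pi_p*R_p^-(g) + R_u^-(g)."""
    r_p_plus  = loss(scores_p, +1).mean()   # positive sample evaluated with label +1
    r_p_minus = loss(scores_p, -1).mean()   # positive sample evaluated with label -1
    r_u_minus = loss(scores_u, -1).mean()   # unlabeled sample evaluated with label -1
    return pi_p * r_p_plus - pi_p * r_p_minus + r_u_minus

# Example with random scores; the class prior pi_p is assumed to be known.
rng = np.random.default_rng(0)
scores_p = rng.normal(loc=+1.0, scale=1.0, size=100)    # scores g(x) on positive data
scores_u = rng.normal(loc= 0.0, scale=1.0, size=1000)   # scores g(x) on unlabeled data
print(unbiased_pu_risk(scores_p, scores_u, pi_p=0.4))
```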
2. Negative Risk and Overfitting Pathology
The capacity for $\hat{R}_{\mathrm{pu}}(g)$ to attain negative values (i.e., the negative-risk issue) is central to the YNB-NR pathology. Unlike the true risk, which satisfies $R(g) \ge 0$, the empirical risk can become negative, and for unbounded losses arbitrarily negative, with flexible models: a sufficiently expressive $g$ can memorize the positive training points, making the subtracted term $\pi_p \hat{R}_p^-(g)$ large, while scoring the remaining unlabeled points negatively so that $\hat{R}_u^-(g)$ stays small, sometimes at the expense of generalization. This leads to severe overfitting, wherein the classifier fits noise or outliers in the positive set, driving the empirical risk below its meaningful range.
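To see how negativity arises in finite samples, consider the following toy sketch (an illustrative construction, not an experiment from the paper): a classifier that memorizes the positive training points while scoring everything else negatively drives the unbiased estimate below zero even under a bounded loss.

```python
import numpy as np

def sigmoid_loss(z, y):
    # Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), bounded in [0, 1].
    return 1.0 / (1.0 + np.exp(y * z))

pi_p = 0.4
# A memorizing classifier: the positive training points receive confident positive
# scores, while every other point (here: the whole unlabeled sample) receives a
# confident negative score.
scores_p = np.full(100,  +10.0)
scores_u = np.full(1000, -10.0)

r_p_plus  = sigmoid_loss(scores_p, +1).mean()   # ~0
r_p_minus = sigmoid_loss(scores_p, -1).mean()   # ~1
r_u_minus = sigmoid_loss(scores_u, -1).mean()   # ~0

r_pu = pi_p * r_p_plus - pi_p * r_p_minus + r_u_minus
print(r_pu)   # ~ -0.4: below zero, although such a classifier generalizes poorly
```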
Empirical results from deep models (e.g., multi-layer perceptrons with ReLU or Softsign activations, CNNs) demonstrate that, under the unbiased estimator, the training loss can decrease below zero while the test loss increases, exposing the overfitting permitted by the unbiased (and hence possibly negative) risk estimator.
3. Non-Negative Risk Estimator
To address the negative-risk phenomenon, Kiryo et al. propose the non-negative risk estimator:

$$\tilde{R}_{\mathrm{pu}}(g) = \pi_p \hat{R}_p^+(g) + \max\bigl\{0,\ \hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g)\bigr\}.$$

Here, the term $\hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g)$ estimates the contribution of the unseen negative data; by thresholding it at zero, the estimator ensures the empirical risk remains non-negative, matching the theoretical lower bound of the true risk. This modification both enforces non-negativity and regularizes against the excessive negative bias responsible for overfitting.
The intuition is that the negative part of the empirical risk does not contain meaningful information about classifier quality but rather reflects over-exploitation of finite-sample noise and model flexibility. The non-negative correction removes this spurious signal without sacrificing estimator consistency.
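A minimal sketch of the non-negative estimator under the same illustrative conventions as above (sigmoid surrogate loss, hypothetical function names) shows how the thresholding removes the spurious negative part while leaving well-behaved estimates untouched.

```python
import numpy as np

def sigmoid_loss(z, y):
    return 1.0 / (1.0 + np.exp(y * z))

def nonnegative_pu_risk(scores_p, scores_u, pi_p, loss=sigmoid_loss):
    """Non-negative PU risk: pi_p*R_p^+(g) + max(0, R_u^-(g) - pi_p*R_p^-(g))."""
    r_p_plus = loss(scores_p, +1).mean()
    neg_part = loss(scores_u, -1).mean() - pi_p * loss(scores_p, -1).mean()
    return pi_p * r_p_plus + max(0.0, neg_part)

# The memorizing classifier from the earlier example no longer attains a negative
# objective: its estimated risk is clamped at pi_p * R_p^+(g) >= 0 (here ~0).
print(nonnegative_pu_risk(np.full(100, +10.0), np.full(1000, -10.0), pi_p=0.4))
```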
4. Theoretical Guarantees
Bias and Consistency
Under mild conditions (a loss bounded by $C_\ell$ and a true negative-risk term bounded away from zero, $(1 - \pi_p) R_n^-(g) \ge \alpha > 0$), the bias of the non-negative estimator decays exponentially with sample size:

$$0 \le \mathbb{E}\bigl[\tilde{R}_{\mathrm{pu}}(g)\bigr] - R(g) \le C_\ell \Delta,$$

where $C_\ell$ bounds the loss and $\Delta$ is exponentially small in the sample sizes $n_p$ and $n_u$. Moreover, the deviation from the true risk satisfies, with probability at least $1 - \delta$,

$$\bigl|\tilde{R}_{\mathrm{pu}}(g) - R(g)\bigr| \le C \sqrt{\ln(1/\delta)}\left(\frac{\pi_p}{\sqrt{n_p}} + \frac{1}{\sqrt{n_u}}\right) + C_\ell \Delta$$

for a constant $C$ depending only on the loss bound, so consistency is retained at the same rate as for the unbiased estimator.
Mean-Squared Error
The non-negative estimator achieves strictly lower mean-squared error (MSE) than the unbiased estimator when, for some tolerance $\alpha > 0$, the negative-risk event $\{\hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g) \le -\alpha\}$ has positive probability, under other mild assumptions (a non-negative loss and a fixed $g$). Quantitatively, the reduction $\mathrm{MSE}(\hat{R}_{\mathrm{pu}}(g)) - \mathrm{MSE}(\tilde{R}_{\mathrm{pu}}(g))$ is lower-bounded by a positive quantity that increases with the tolerance $\alpha$ and with the probability of this event, as illustrated numerically in the sketch below.
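Both the bias and MSE statements can be checked numerically. The following Monte Carlo sketch uses an assumed synthetic setup (Gaussian class conditionals, a fixed scorer $g(x) = 5x$, and small samples $n_p = 10$, $n_u = 50$ chosen so that the negative-risk event actually occurs); it typically shows near-zero bias for the unbiased estimator, a small positive bias for the non-negative estimator, and a lower MSE for the latter.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid_loss(z, y):
    return 1.0 / (1.0 + np.exp(y * z))

def g(x):
    return 5.0 * x   # fixed scorer; sharper scores make negative dips more likely

# Synthetic 1-D problem (illustrative): positives ~ N(+1, 1), negatives ~ N(-1, 1).
pi_p = 0.5

# True risk R(g) = pi_p * R_p^+(g) + (1 - pi_p) * R_n^-(g), via a large sample.
xp_big, xn_big = rng.normal(+1, 1, 10**6), rng.normal(-1, 1, 10**6)
R_true = (pi_p * sigmoid_loss(g(xp_big), +1).mean()
          + (1 - pi_p) * sigmoid_loss(g(xn_big), -1).mean())

n_p, n_u, trials = 10, 50, 20000
upu, nnpu = np.empty(trials), np.empty(trials)
for t in range(trials):
    xp = rng.normal(+1, 1, n_p)                                    # positive sample
    xu = np.where(rng.random(n_u) < pi_p,                          # unlabeled sample
                  rng.normal(+1, 1, n_u), rng.normal(-1, 1, n_u))  # drawn from the mixture
    r_pp = sigmoid_loss(g(xp), +1).mean()
    neg  = sigmoid_loss(g(xu), -1).mean() - pi_p * sigmoid_loss(g(xp), -1).mean()
    upu[t]  = pi_p * r_pp + neg              # unbiased estimator
    nnpu[t] = pi_p * r_pp + max(0.0, neg)    # non-negative estimator

print("true risk     :", R_true)
print("bias uPU/nnPU :", upu.mean() - R_true, nnpu.mean() - R_true)  # ~0 vs. small positive
print("MSE  uPU/nnPU :", ((upu - R_true)**2).mean(), ((nnpu - R_true)**2).mean())
```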
Generalization and Estimation Error
Let $g^*$ denote the true risk minimizer and $\tilde{g}_{\mathrm{pu}}$ the empirical minimizer of $\tilde{R}_{\mathrm{pu}}$ over a hypothesis class $\mathcal{G}$. With standard complexity controls (a Lipschitz loss and bounded Rademacher complexities $\mathfrak{R}_{n_p}(\mathcal{G})$ and $\mathfrak{R}_{n_u}(\mathcal{G})$ over the positive and unlabeled samples), an estimation error bound of the form

$$R(\tilde{g}_{\mathrm{pu}}) - R(g^*) = O\!\left(\pi_p \mathfrak{R}_{n_p}(\mathcal{G}) + \mathfrak{R}_{n_u}(\mathcal{G}) + C_\ell \sqrt{\ln(1/\delta)}\left(\frac{\pi_p}{\sqrt{n_p}} + \frac{1}{\sqrt{n_u}}\right)\right)$$

holds with probability at least $1 - \delta$, i.e., the usual $O(1/\sqrt{n_p} + 1/\sqrt{n_u})$ rate for typical model classes, demonstrating that the learning rate is unaffected by the non-negative correction.
5. Algorithmic Implementation
In large-scale scenarios, the non-negativity enforcement is incorporated into minibatch-based stochastic optimization. Within each minibatch, the procedure checks whether the estimated negative-risk term has fallen below a small threshold $-\beta$; if it has, the usual update is replaced by a corrective gradient step with a discounted step size $\gamma\eta$ that pushes the term back toward the non-negative region, supporting efficient execution with deep nets.
PU-ERM Algorithm (Minibatch-based):
```
# One epoch over minibatches (P^i, U^i), i = 1..N; prior p, thresholds β >= 0, 0 <= γ <= 1
for i in 1..N:
    r_i = U_minus(g; U^i) - p * R_minus(g; P^i)     # minibatch estimate of the negative-risk term
    if r_i >= -β:
        grad = ∇θ [ p * R_plus(g; P^i) + r_i ]      # ordinary (non-negative) PU objective
        step_size = η
    else:
        grad = ∇θ [ -r_i ]                          # defensive step: push r_i back toward 0
        step_size = γ * η                           # discounted learning rate
    θ = optimizer_update(θ, grad, step_size)
```
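For reference, the following PyTorch sketch implements one such update; it is an illustrative rendering rather than the authors' released code, and the model, data, and hyperparameter values are placeholders. Scaling the defensive loss by $\gamma$ mimics the discounted step size $\gamma\eta$ of the pseudocode (exact for plain SGD, approximate for adaptive optimizers such as Adam).

```python
import torch

def sigmoid_loss(z, y):
    # Sigmoid surrogate loss l(z, y) = sigmoid(-y * z), bounded in [0, 1].
    return torch.sigmoid(-y * z)

def nnpu_minibatch_step(model, optimizer, x_p, x_u, pi_p, beta=0.0, gamma=1.0):
    """One nnPU update in the spirit of the minibatch procedure above (sketch)."""
    optimizer.zero_grad()
    g_p, g_u = model(x_p).squeeze(-1), model(x_u).squeeze(-1)

    r_p_plus = pi_p * sigmoid_loss(g_p, +1).mean()
    r_neg    = sigmoid_loss(g_u, -1).mean() - pi_p * sigmoid_loss(g_p, -1).mean()

    if r_neg.item() >= -beta:
        loss = r_p_plus + r_neg   # ordinary PU objective on this minibatch
    else:
        loss = -gamma * r_neg     # defensive step: increase the negative-risk term
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch (hypothetical model and data).
model = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_p, x_u = torch.randn(64, 2) + 1.0, torch.randn(256, 2)
nnpu_minibatch_step(model, optimizer, x_p, x_u, pi_p=0.4, beta=0.0, gamma=1.0)
```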
6. Experimental Evaluation and Empirical Findings
Evaluation was conducted on diverse datasets: MNIST (even vs. odd digits), epsilon (LIBSVM), 20 Newsgroups (binary subset), and CIFAR-10 (vehicles vs. animals). The positive class prior $\pi_p$ ranged from 0.4 to 0.5; models included 6-layer MLPs, embedding-based text networks, and deep CNNs with over 13 layers, trained with the sigmoid surrogate loss and regularization. Both Adam and AdaGrad optimizers were utilized, with typical minibatch sizes (e.g., 128).
Key empirical observations:
- The unbiased PU estimator ("uPU") produces negative and unbounded empirical risks, resulting in pronounced overfitting as model flexibility increases.
- The non-negative estimator ("nnPU") consistently maintains the training risk above zero, with no observed overfitting, even for deep networks or limited positive samples.
- nnPU often yields lower test errors compared to both the unbiased PU estimator and traditional positive–negative (PN) learners, particularly when negative labels are scarce.
- nnPU is robust to mild over-estimation of the class prior $\pi_p$, whereas under-estimation degrades performance more significantly.
7. Implications and Practical Significance
The introduction of the non-negative risk estimator fundamentally addresses the risk negativity inherent in standard PU learning workflows, especially in deep learning contexts. This enables the design of robust YNB classifiers in settings where negative-labeled data is unavailable or expensive to acquire, extending the practical applicability of PU learning. The consistent empirical improvements and preserved theoretical guarantees underline the efficacy and generalizability of this approach. A plausible implication is that the non-negativity principle may serve as a general regularization tool in other semi-supervised or weakly supervised risk estimation settings.