Single Positive Multi-Label Learning
- SPMLL is a weak supervision paradigm where each instance has one confirmed positive label and all other labels remain unobserved, reducing annotation costs.
- It employs tailored risk estimators and loss functions to mitigate challenges from extreme label sparsity, noise, and class imbalance.
- Innovations such as pseudo-labeling, entropy maximization, and bias-aware calibration enable performance that approaches fully supervised multi-label models.
Single Positive Multi-Label Learning (SPMLL) is a structured weak supervision paradigm for multi-label classification in which each training instance is annotated with exactly one confirmed positive label and all other potential labels remain unobserved. This scenario, representing an extreme case of missing labels, is especially pertinent in domains where exhaustive multi-label annotation is prohibitively expensive or practically infeasible. SPMLL raises critical challenges regarding noise, bias, and imbalance, but has enabled a wide array of algorithmic innovations that approach the utility of fully supervised multi-label models while greatly reducing the annotation burden.
1. Problem Definition and Motivation
In standard multi-label classification, an instance is annotated with a binary label vector $\mathbf{y} \in \{0,1\}^{C}$, each entry indicating the presence or absence of one of $C$ classes. By contrast, SPMLL restricts supervision to a single confirmed positive label per instance; the remainder are unobserved (denoted $\emptyset$). No explicit negatives are ever provided. Such label sparsity is encountered in settings with limited annotation budgets, combinatorial label spaces, species distribution modeling (presence-only data), or human-centric tasks where annotators reliably report only the most salient object or category per instance.
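To fix notation, the toy sketch below (NumPy-based; the encoding of unobserved entries as `-1` is an arbitrary choice, not a convention from the cited papers) builds an SPMLL training signal from a hidden multi-label ground truth: each row of the observed matrix contains exactly one confirmed positive and no confirmed negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

num_instances, num_classes = 4, 6          # toy sizes (hypothetical)
UNOBSERVED = -1                            # encodes the unknown entries

# Full (hidden) ground-truth multi-label matrix: several positives per row.
y_true = (rng.random((num_instances, num_classes)) < 0.3).astype(int)
y_true[np.arange(num_instances), rng.integers(num_classes, size=num_instances)] = 1  # ensure >= 1 positive

# SPMLL observation: reveal exactly one positive per instance, hide everything else.
z_observed = np.full_like(y_true, UNOBSERVED)
for i in range(num_instances):
    positives = np.flatnonzero(y_true[i])
    z_observed[i, rng.choice(positives)] = 1   # single confirmed positive

print(z_observed)   # each row: one entry equal to 1, all others -1 (unobserved)
```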
The primary challenges in SPMLL arise from (i) extreme label sparsity, (ii) an absence of confirmed negatives, making naive extensions of classical partial-label methods degenerate, (iii) severe class imbalance, and (iv) unreliable inference of inter-class correlations. Overcoming these obstacles requires both theoretical innovations in risk estimation and practical techniques for mitigating supervision noise and bias (Cole et al., 2021, Zhou et al., 2022, Xu et al., 2022, Arroyo et al., 2023).
2. Algorithmic Approaches and Loss Functions
SPMLL methods can be broadly categorized by their treatment of unobserved labels and their strategies for risk estimation:
Label Handling Strategies:
- Assume-Negative (AN): All unobserved labels are treated as negatives, leading to the loss
$$\mathcal{L}_{\mathrm{AN}}(x) = -\frac{1}{C}\Big[\log f_{p}(x) + \sum_{c \neq p}\log\big(1 - f_{c}(x)\big)\Big],$$
where $f_c(x)$ denotes the predicted probability for class $c$ and $p$ is the annotated positive. This approach is prone to a high rate of false negatives and is especially damaging in SPMLL due to the prevalence of unobserved positives (Cole et al., 2021); a combined code sketch of the AN, WAN, AN-LS, and EM losses appears after this list.
- Weak Assume-Negative and Label Smoothing: To mitigate the harshness of the AN assumption, the negative loss term is down-weighted by a factor $\gamma < 1$ (WAN; e.g., $\gamma = 1/(C-1)$), or soft targets (label smoothing toward a small $\epsilon > 0$) are used for missing labels (AN-LS) (Cole et al., 2021).
- Treat as Unknown (Entropy Maximization): Instead of assigning hard pseudo-labels, all unannotated labels are treated as unknown. The Entropy-Maximization (EM) loss adds a regularizing term that maximizes the entropy of predictions for missing labels, yielding low-gradient, deliberately ambiguous predictions:
$$\mathcal{L}_{\mathrm{EM}}(x) = -\frac{1}{C}\Big[\log f_{p}(x) + \alpha \sum_{c \neq p} H\big(f_{c}(x)\big)\Big], \qquad H(q) = -\big[q\log q + (1-q)\log(1-q)\big],$$
where $H(\cdot)$ is the binary entropy, $\alpha$ a weighting hyperparameter, and $p$ is the annotated class (Zhou et al., 2022).
- Pseudo-Labeling and Label Enhancement: Approaches such as ROLE, SMILE, and AEVLP iteratively estimate pseudo-labels. The most advanced methods refine soft pseudo-label estimates using strategies such as regularized online label estimation (ROLE), variational inference with latent label modeling (SMILE), and dynamic CLIP-based pseudo-label generators (AEVLP) (Cole et al., 2021, Xu et al., 2022, Tran et al., 28 Aug 2025). The Generalized Pseudo-Label Robust (GPR) loss (Tran et al., 28 Aug 2025) and Generalized Robust Loss (Chen et al., 6 May 2024) subsume and extend earlier methods by weighting loss terms according to pseudo-label confidence and prior expected label counts, and by adapting dynamically to pseudo-label quality.
- Regularization with Expected Positives and High-Rank Priors: Batch-level constraints match the expected number of positive labels per instance to a known scalar or regularize with a high-rankness term to encourage diversity among label predictions (Cole et al., 2021, Li et al., 2023).
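As a rough illustration of the first three strategies above, the following PyTorch sketch implements the AN, WAN, AN-LS, and EM loss variants; the function name, the smoothing constant `eps`, and the entropy weight `alpha` are illustrative defaults rather than values taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def spmll_loss(logits, pos_idx, variant="AN", gamma=None, eps=0.1, alpha=0.1):
    """Illustrative AN / WAN / AN-LS / EM losses for SPMLL.

    logits : (B, C) raw scores; pos_idx : (B,) index of the single observed positive.
    """
    B, C = logits.shape
    p = torch.sigmoid(logits)                       # per-class probabilities
    pos_mask = F.one_hot(pos_idx, C).bool()         # observed positive entries
    neg_mask = ~pos_mask                            # unobserved entries

    pos_term = -torch.log(p[pos_mask] + 1e-12)      # BCE on the confirmed positive

    if variant == "AN":                             # assume every unobserved label is negative
        neg_term = -torch.log(1 - p[neg_mask] + 1e-12)
    elif variant == "WAN":                          # down-weight the assumed negatives
        gamma = gamma if gamma is not None else 1.0 / (C - 1)
        neg_term = -gamma * torch.log(1 - p[neg_mask] + 1e-12)
    elif variant == "AN-LS":                        # smooth the assumed-negative targets toward eps
        q = p[neg_mask]
        neg_term = -(eps * torch.log(q + 1e-12) + (1 - eps) * torch.log(1 - q + 1e-12))
    elif variant == "EM":                           # maximize entropy on unobserved labels
        q = p[neg_mask]
        entropy = -(q * torch.log(q + 1e-12) + (1 - q) * torch.log(1 - q + 1e-12))
        neg_term = -alpha * entropy                 # negative sign: higher entropy lowers the loss
    else:
        raise ValueError(variant)

    return (pos_term.sum() + neg_term.sum()) / (B * C)
```

The batch-level expected-positives constraint from the final bullet can be appended as a simple penalty, e.g. `((torch.sigmoid(logits).sum(dim=1).mean() - k) ** 2)` for an assumed expected label count `k`.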
Unified Frameworks and Extensions:
Numerous strategies are unified via risk decoupling frameworks that distinguish observed positives from missing/unobserved labels through estimated confidence-weighted losses, yielding flexible coordination of the trade-off between false positives and false negatives (Chen et al., 6 May 2024). Notably, many classic and modern SPMLL loss functions emerge as special cases of such general frameworks.
3. Theoretical Risk Estimation and Guarantees
A central goal in SPMLL is to develop estimators that, despite training with only single positive labels, achieve empirical risk minimization consistent with the fully supervised multi-label risk. Several papers formalize unbiased risk estimators and provide convergence guarantees:
- The general approach is to decompose the risk over the observed (instance, positive-label) pairs and to employ estimated soft labels to recover the contribution of unobserved labels. Schematically, the estimator takes the form
$$\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n}\Big[\ell\big(f^{p_i}(x_i),1\big) + \sum_{j \neq p_i}\Big(\tilde{y}_i^{\,j}\,\ell\big(f^{j}(x_i),1\big) + \big(1-\tilde{y}_i^{\,j}\big)\,\ell\big(f^{j}(x_i),0\big)\Big)\Big],$$
with $p_i$ the annotated positive label of instance $x_i$ and $\tilde{y}_i^{\,j}\in[0,1]$ the estimated soft label (the recovered posterior $p(y^j=1\mid x_i)$) for class $j$ (Xu et al., 2022); a minimal computational sketch follows this list.
- When soft labels are inferred from data and feature-space geometry (e.g., via graph-structured variational inference in SMILE), the procedure preserves risk consistency; with high probability, the excess risk of the empirical minimizer $\hat{f}$ over the optimal $f^{*}$ satisfies a bound of the form
$$R(\hat{f}) - R(f^{*}) \le \mathcal{O}\!\Big(\hat{\mathfrak{R}}_n(\mathcal{F}) + \sqrt{\tfrac{\log(1/\delta)}{n}}\Big),$$
where $\hat{\mathfrak{R}}_n(\mathcal{F})$ is the empirical Rademacher complexity of the hypothesis class $\mathcal{F}$ (Xu et al., 2022).
- CRISP (Liu et al., 2023) introduces class-prior estimation with theorems bounding the estimation error of the class-priors and showing that the empirical risk minimizer using these priors converges to the fully supervised minimizer. The empirical risk incorporates the estimated priors and aligns expected outputs for unlabeled data with these calibrated priors.
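To make the soft-label estimator concrete, here is a minimal PyTorch sketch under the assumption that estimated soft labels `y_tilde` are supplied by some label-enhancement module (in SMILE, a graph-based variational estimator); names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def soft_label_risk(logits, pos_idx, y_tilde):
    """Empirical risk with one confirmed positive and estimated soft labels.

    logits : (B, C) scores; pos_idx : (B,) observed positive index;
    y_tilde: (B, C) estimated soft labels in [0, 1] for every class.
    """
    B, C = logits.shape
    p = torch.sigmoid(logits)
    pos_mask = F.one_hot(pos_idx, C).bool()

    # Observed positives contribute an ordinary positive BCE term.
    risk_pos = -torch.log(p[pos_mask] + 1e-12).sum()

    # Unobserved labels contribute in expectation under the estimated soft labels.
    q, t = p[~pos_mask], y_tilde[~pos_mask]
    risk_unobs = -(t * torch.log(q + 1e-12) + (1 - t) * torch.log(1 - q + 1e-12)).sum()

    return (risk_pos + risk_unobs) / (B * C)
```

In practice the soft labels are themselves refined during training, so this risk is typically alternated with an update of the label-enhancement module.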
4. Impact of Data Bias and Empirical Evaluation Protocols
A notable methodological advance is the introduction of explicit bias models for simulating the selection of the single positive label in an otherwise multi-positive ground-truth setting (Arroyo et al., 2023). Instead of selecting a positive uniformly at random, models such as size bias, location bias, and semantic bias define sampling distributions over the set of true positives $\mathcal{P}(x)$ according to object area, centrality, or empirical mention frequency (a sampling sketch follows this list):
- Uniform: $p(c \mid x) = 1/|\mathcal{P}(x)|$
- Size-based: $p(c \mid x) \propto \mathrm{area}_x(c)$, the relative area of the object of class $c$
- Location-based: $p(c \mid x) \propto \mathrm{centrality}_x(c)$, favoring objects near the image center
- Semantic-based: $p(c \mid x) \propto \mathrm{freq}(c)$, the empirical mention frequency of class $c$
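These bias models can be simulated in a few lines. The sketch below assumes that, for each instance, the set of true positives and a per-object score (area, centrality, or corpus frequency) are available; function and variable names are illustrative, not taken from the benchmark code.

```python
import numpy as np

def sample_single_positive(true_positives, scores=None, rng=None):
    """Pick the one positive label to reveal under a (possibly biased) distribution.

    true_positives : list of class indices that are actually positive for the instance.
    scores         : optional per-positive weights (object area, centrality, frequency, ...);
                     None reproduces the uniform benchmark setting.
    """
    rng = rng or np.random.default_rng()
    if scores is None:                       # uniform: p(c) = 1 / |P(x)|
        probs = np.full(len(true_positives), 1.0 / len(true_positives))
    else:                                    # biased: p(c) proportional to its score
        scores = np.asarray(scores, dtype=float)
        probs = scores / scores.sum()
    return rng.choice(true_positives, p=probs)

# Example: a size-biased annotator almost always reports the largest object.
print(sample_single_positive([2, 5, 9], scores=[0.60, 0.05, 0.35]))  # areas as image fractions
```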
Empirical findings show that absolute method performance can drop substantially (notably, a drop of 7.3 mAP for size bias relative to uniform sampling), but the relative ranking of algorithms (e.g., ROLE and EM outperforming AN/AN-LS) is often stable, suggesting that uniform benchmarks are a reasonable, but potentially optimistic, surrogate for real-world bias scenarios (Arroyo et al., 2023).
5. Applications, Extensions, and Practical Considerations
SPMLL is directly motivated by applications where dense annotation is impossible: large-scale image or video tagging, context recognition (where verbs are semantically ambiguous and only one is given, as in imSitu situation recognition (Lin et al., 29 Aug 2025)), species distribution modeling, and many web-mined datasets. The methodology is relevant for transfer to:
- Generative modeling: S2M sampling enables conditional GANs trained under SPML to produce multi-label outputs using joint density estimation via MCMC (Cho et al., 2022).
- Zero-shot recognition: Vision-language networks, graph-based label correlations, and pseudo-labeling (e.g., SigRL (Zhang et al., 4 Apr 2025), VLPL (Xing et al., 2023)) allow generalization to unseen classes by infusing external semantic priors.
- Patch-based architectures: Lightweight models leveraging spatial self-similarity and local attention can be trained from scratch on SPML annotations, matching or approaching larger pre-trained models (Jouanneau et al., 2022).
- Structured output tasks: In situation recognition, the SPMLL formulation better reflects the natural ambiguity of verb descriptions, and models such as GE-VerbMLP use GCNs to capture label correlations and adversarial training for robust separation (Lin et al., 29 Aug 2025).
Annotation bias, label noise introduced by over-reliance on negative assumptions, severe class or instance imbalance, and unreliable pseudo-labeling are all recurring issues. Empirical studies show that techniques which either model the annotation bias (CRISP, bias-aware benchmarks), use robust risk estimators and batch-level regularization, or rely on dynamic (epoch-wise) and multi-focus pseudo-labeling (DAMP) (Tran et al., 28 Aug 2025) are superior with respect to both mean average precision and stability under varied data and supervision regimes.
6. Future Directions
Prominent topics for further research include:
- Bias-aware SPMLL: Explicit modeling of human annotation bias and its correction during training or sampling (Arroyo et al., 2023, Liu et al., 2023); exploring priors or reweighting schemes to handle non-uniform label frequencies.
- Robust pseudo-labeling: Enhanced dynamic strategies for pseudo-label assignment from multi-modal (vision-language) sources, especially methods that avoid confirmation bias or compounding of errors across epochs (Xing et al., 2023, Tran et al., 28 Aug 2025).
- Unified frameworks: Continued formalization of risk minimization, with tractable estimators and calibration for class imbalance and noise; theoretical analyses of convergence under dynamic pseudo-labels or external semantic priors (Xu et al., 2022, Chen et al., 6 May 2024, Tran et al., 28 Aug 2025).
- Cross-modal and zero-shot learning: Application of language-driven models and ensemble semantic guidance for unseen label generalization or transfer learning (Xing et al., 2023, Zhang et al., 4 Apr 2025).
- Scalable, architecture-agnostic solutions: Efficient techniques deployable without extensive pretraining, leveraging graph-based label interaction modules or light, patch-based encoders (Jouanneau et al., 2022, Zhang et al., 4 Apr 2025).
Broader impacts include the construction of fairer benchmarks, accurate handling of ambiguous or overlapping labels, and the potential to transform data annotation strategies in large-scale machine perception and recognition systems.
7. Summary Table of Main SPMLL Algorithms
| Method/Class | Key Principle | Characteristic Innovation |
|---|---|---|
| AN (Assume-Negative) | Treat unobserved labels as negative | Simple, but high false-negative rate |
| WAN / AN-LS | Re-weight or smooth assumed negatives | Reduces overconfidence by penalizing unknowns less |
| ROLE | Online label estimation + batch regularization | Alternating updates of labels and model |
| EM Loss | Entropy maximization on unknowns | Encourages deliberately ambiguous predictions |
| SMILE | Unbiased risk estimator + label enhancement | Variational inference of latent soft labels |
| OPML | One pair of labels per update | Margin-based, robust to label noise |
| CRISP | Class-prior estimation + unbiased risk | Addresses class imbalance and bias |
| GPR Loss | Robust loss over diverse pseudo-labels | Adaptively weights label types |
| AEVLP (DAMP + GPR) | Multi-focus, CLIP-based pseudo-labels | Dynamic, patch-based, robust to noise |
| SigRL | Graph-based label correlation + visual reconstruction | Semantic/visual alignment with label graphs |
| GE-VerbMLP | GCN for verb ambiguity + adversarial training | Robust multi-label situation recognition |
Each of these methods represents a distinct approach to handling missing labels, annotation sparsity, and supervision noise inherent to SPMLL.
SPMLL has matured rapidly, yielding methods that not only mitigate the damage caused by missing supervision and annotation bias but also, in many cases, reach the performance of fully supervised baselines. Robust risk estimators, label enhancement frameworks, dynamic pseudo-labeling strategies, and graph-based correlation models are now foundational tools for practitioners seeking to build scalable, efficient, and reliable multi-label models under extreme annotation sparsity.