Partial Label Learning
- Partial Label Learning is a weakly supervised approach where each instance is paired with a candidate set containing the unknown true label.
- It employs probabilistic, optimization-based, and representation-learning techniques to define surrogate risk and handle annotation ambiguity.
- Advanced methods include EM-style disambiguation and contrastive representation learning to refine labels and improve robustness under noise.
Partial label learning (PLL) encompasses a class of weakly supervised learning protocols in which, for each training instance, only a set of candidate labels is observed, exactly one of which is the true (unknown) label. Unlike traditional supervised learning, which requires fully specified ground-truth labels, PLL accommodates low-quality annotation, ambiguous labeling, programmatic weak supervision, and settings where label curation is severely cost-constrained. In the classical regime, the candidate set is guaranteed to contain the true label, but modern extensions permit noisy or unreliable candidate sets. PLL has become a central framework for robust learning under annotation ambiguity, giving rise to a range of probabilistic, optimization-based, and representation-learning approaches.
1. Core Problem Definition and Regimes
Formally, in the standard multi-class PLL setting, each input $x_i \in \mathcal{X}$ is associated with a candidate set $S_i \subseteq \mathcal{Y} = \{1, \dots, K\}$, which contains the unknown true label $y_i$. The learner receives only $(x_i, S_i)$ pairs, not $(x_i, y_i)$. The objective is to learn a classifier $f: \mathcal{X} \to \mathcal{Y}$ with low misclassification error with respect to the ground-truth label.
Several variants exist:
- Classical PLL: The true label $y_i$ is always in $S_i$. Each $S_i$ is a non-singleton subset, introducing ambiguity but not outright error.
- Noisy/unreliable PLL: The true label may be omitted from $S_i$ with nonzero probability, generalizing to settings of partial annotation with false negatives (Shi et al., 2023, Shi et al., 2023).
- Partial label regression: The true label is an unknown member of a set of candidate real values $S_i \subset \mathbb{R}$, extending PLL to structured outputs and continuous targets (Cheng et al., 2023).
- Partial multi-label learning: Each instance is associated with a candidate set $S_i$ in which several labels, not necessarily one, may be relevant. Some labels in $S_i$ are spurious, but more than one label may be positive (Yu et al., 2020).
This taxonomy distinguishes PLL conceptually from multi-label, complementary-label, or standard semi-supervised learning.
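To make the classical and unreliable regimes concrete, the following is a minimal sketch of how candidate sets are often simulated from fully labeled data. The function name `make_candidate_sets` and the parameters `flip_prob` and `omit_prob` are illustrative assumptions, not taken from any cited work. Candidate sets are stored as a binary mask, and the sketch does not enforce the non-singleton condition.

```python
import numpy as np

def make_candidate_sets(y, num_classes, flip_prob=0.3, omit_prob=0.0, seed=0):
    """Return a binary (n, K) mask; mask[i, k] = 1 means label k is a candidate for i.

    flip_prob: probability that each incorrect label joins the candidate set.
    omit_prob: probability that the true label is dropped (0.0 gives classical PLL).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    mask = (rng.random((n, num_classes)) < flip_prob).astype(np.int64)
    mask[np.arange(n), y] = 1              # classical PLL: true label always present
    drop = rng.random(n) < omit_prob       # unreliable PLL: occasionally remove it
    mask[drop, y[drop]] = 0
    # Keep every candidate set non-empty by adding one random label where needed.
    empty = mask.sum(axis=1) == 0
    filler = rng.integers(0, num_classes, size=n)
    mask[empty, filler[empty]] = 1
    return mask

y = np.array([0, 2, 1, 1, 3])
classical = make_candidate_sets(y, num_classes=4, flip_prob=0.5)
unreliable = make_candidate_sets(y, num_classes=4, flip_prob=0.5, omit_prob=0.2)
```

With `omit_prob=0.0` the true label is always a candidate. Any positive value yields the unreliable setting, in which some rows of the mask no longer cover the true label.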
2. Principled Loss Functions and Probabilistic Foundations
The fundamental methodological challenge in PLL is how to define a surrogate risk that leverages only the ambiguous candidate sets without bias toward any particular candidate.
Marginal Likelihood-Based Approaches
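Probabilistic models formalize the marginal likelihood of the observed candidate event. The ProPaLL method introduces a direct marginalization over all plausible true labels in the candidate set $S$ by defining
$$p(S \mid x) = \sum_{y \in S} p(y \mid x),$$
which for a Bernoulli output architecture becomes
$$p(S \mid x) = \sum_{i \in S} p_i \prod_{j \neq i} (1 - p_j),$$
with $p_i = \sigma(f_i(x))$. The negative log-likelihood objective $\mathcal{L} = -\sum_{n} \log p(S_n \mid x_n)$ is minimized via standard stochastic optimization (Struski et al., 2022).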
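As a hedged illustration of a marginal-likelihood loss with Bernoulli outputs, the sketch below scores the event "exactly one label is active and it lies in the candidate set", i.e. $\sum_{i \in S} p_i \prod_{j \neq i} (1 - p_j)$ with one independent sigmoid per class. This specific form is an assumption made for illustration and may differ from the published ProPaLL objective; the function name is hypothetical. The computation uses the identity $p_i / (1 - p_i) = e^{z_i}$ to stay in log space.

```python
import numpy as np

def exactly_one_in_set_nll(logits, mask):
    """Mean NLL of 'exactly one label is on, and it is a candidate'.

    logits: (n, K) real scores, one independent sigmoid per class (assumption).
    mask:   (n, K) binary candidate indicators, each row non-empty.
    """
    # log prod_j (1 - p_j) = -sum_j softplus(z_j), softplus(z) = log(1 + e^z)
    log_all_off = -np.logaddexp(0.0, logits).sum(axis=1)
    # p_i / (1 - p_i) = exp(z_i), so the candidate sum is a masked log-sum-exp
    masked = np.where(mask > 0, logits, -np.inf)
    m = masked.max(axis=1, keepdims=True)
    log_odds_sum = m[:, 0] + np.log(np.exp(masked - m).sum(axis=1))
    return -(log_all_off + log_odds_sum).mean()

logits = np.array([[2.0, -1.0, 0.5], [0.0, 1.5, -2.0]])
mask = np.array([[1, 0, 1], [0, 1, 1]])
loss = exactly_one_in_set_nll(logits, mask)
```

The log-space form avoids multiplying many small probabilities, which matters when the number of classes is large.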
Properness and Consistent Risk Estimation
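Risk-rewriting under properness provides a sufficient condition for unbiased risk estimation in PLL: $R(f) = \mathbb{E}_{(x, S)}\left[\sum_{y \in S} w(y \mid x, S)\, \ell(f(x), y)\right]$, where $w(y \mid x, S)$ represents the normalized posterior $p(y \mid x) / \sum_{y' \in S} p(y' \mid x)$, structurally satisfying the marginalization constraint over $S$ (Wu et al., 2021).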
These probabilistic and risk-consistent estimators unify and generalize classic PLL surrogates such as average loss, min-loss identification, and progressive identification via soft weighting (Cheng et al., 2023).
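The three classic surrogates named above can be compared side by side. The sketch below is illustrative only: the function names are hypothetical, and the candidate weights use the model's own softmax output renormalized over the candidate set as a stand-in for the unknown posterior. In actual training these weights are usually held constant when differentiating.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def candidate_weights(logits, mask):
    """Softmax probabilities renormalized over each candidate set."""
    probs = np.exp(log_softmax(logits)) * mask
    return probs / probs.sum(axis=1, keepdims=True)

def pll_surrogates(logits, mask):
    """Return (average loss, min loss, weighted loss), each averaged over the batch."""
    ce = -log_softmax(logits)                       # per-class cross-entropy, (n, K)
    avg = (ce * mask).sum(axis=1) / mask.sum(axis=1)
    min_loss = np.where(mask > 0, ce, np.inf).min(axis=1)
    weighted = (candidate_weights(logits, mask) * ce).sum(axis=1)
    return avg.mean(), min_loss.mean(), weighted.mean()

logits = np.array([[2.0, 0.5, -1.0], [0.0, 1.0, 3.0]])
mask = np.array([[1, 1, 0], [0, 1, 1]])
avg, min_loss, weighted = pll_surrogates(logits, mask)
```

The weighted loss sits between the other two: uniform weights recover the average loss, while weights concentrated on the best-explained candidate recover the min loss.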
3. Algorithms: From Classic to Deep and Structured PLL
PLL algorithms are designed to resolve label ambiguity through direct risk minimization, EM-style disambiguation, or contrastive and representation learning.
Direct Optimization and Label-Refinement
Some algorithms treat the identity of the true label as a latent variable, repeatedly refining soft label assignments and updating the classifier. Examples include:
- Self-guided retraining (SURE): Alternates between model fitting and confidence-disambiguation via infinity-norm regularization, producing strictly one-hot pseudo-labels (Feng et al., 2019).
- Progressive identification: Reweights candidate labels via softmaxed loss-based weighting, concentrating mass onto the best-explained candidate label (Cheng et al., 2023).
- Conformal candidate cleaning: Iteratively prunes candidate sets using conformal-prediction-based coverage guarantees (Fuchs et al., 11 Feb 2025).
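The latent-variable view of label refinement can be sketched as an alternating loop: fit the classifier to the current soft labels, then renormalize its predictions over each candidate set. The code below is a minimal sketch using a linear softmax classifier without a bias term; the function name, learning rate, and epoch count are illustrative assumptions rather than any cited method's settings.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_progressive(X, mask, epochs=200, lr=0.1):
    """Linear softmax classifier trained with iteratively refined soft labels."""
    n, d = X.shape
    K = mask.shape[1]
    W = np.zeros((d, K))
    weights = mask / mask.sum(axis=1, keepdims=True)   # start uniform over candidates
    for _ in range(epochs):
        probs = softmax(X @ W)
        # Model update: gradient step on the weighted cross-entropy, weights fixed.
        W -= lr * X.T @ (probs - weights) / n
        # Label update: renormalize predictions over each candidate set.
        probs = softmax(X @ W) * mask
        weights = probs / probs.sum(axis=1, keepdims=True)
    return W, weights

rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-1.5, 2.6], [-1.5, -2.6]])
y = rng.integers(0, 3, size=300)
X = centers[y] + 0.5 * rng.normal(size=(300, 2))
mask = (rng.random((300, 3)) < 0.3).astype(float)
mask[np.arange(300), y] = 1.0                          # classical PLL candidate sets
W, weights = train_progressive(X, mask)
```

Because the gradient of the weighted cross-entropy with respect to the logits is `probs - weights`, the model update is identical in form to ordinary supervised softmax regression with soft targets.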
Probabilistic and Generative Models
- Variational inference: Amortized VI directly models $p(y \mid x, S)$, parameterizing a Dirichlet posterior over true labels, and trains via a $\beta$-ELBO comprising CVAE reconstruction, candidate-set regularization, and prior-KL (Fuchs et al., 24 Oct 2025).
- Multi-level adversarial models: MGPLL employs bi-directional mappings between label/feature spaces, adversarial denoising of label candidates, and reconstruction constraints (Yan et al., 2020).
- Noisy labeler models: Extend Snorkel-style weak supervision, learning accuracies of partial-labelers and identifying true labels via an EM framework, achieving identifiability up to permutation (Yu et al., 2021).
Contrastive Representation Learning
Contemporary methods such as PiCO develop jointly contrastive and prototype-based modules, aligning feature embeddings for the same class and iteratively refining pseudo-labels via prototype similarity. Theoretical analysis interprets this as an EM lower-bound on the PLL likelihood under von Mises–Fisher mixture assumptions (Wang et al., 2022). Extensions address noisy PLL through clean-sample selection and semi-supervised contrastive objectives (PiCO+) (Wang et al., 2022).
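The prototype-based refinement described above can be sketched as two moving-average updates: one for the class prototypes and one for the pseudo-labels. The sketch below is written in the spirit of PiCO but is not its reference implementation; the function names and the momentum values `gamma` and `phi` are illustrative assumptions.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)

def prototype_step(emb, pred, mask, prototypes, pseudo, gamma=0.99, phi=0.9):
    """One refinement step on a mini-batch.

    emb:        (n, d) L2-normalized embeddings
    pred:       (n,) classifier predictions assigning each embedding to a prototype
    mask:       (n, K) binary candidate indicators
    prototypes: (K, d) unit-norm class prototypes
    pseudo:     (n, K) current soft pseudo-labels
    """
    prototypes = prototypes.copy()
    for q, c in zip(emb, pred):                    # moving-average prototype update
        prototypes[c] = l2_normalize(gamma * prototypes[c] + (1.0 - gamma) * q)
    sims = emb @ prototypes.T                      # cosine similarity to prototypes
    sims = np.where(mask > 0, sims, -np.inf)       # only candidates can be selected
    nearest = np.eye(mask.shape[1])[sims.argmax(axis=1)]
    pseudo = phi * pseudo + (1.0 - phi) * nearest  # moving-average label update
    return prototypes, pseudo

rng = np.random.default_rng(0)
emb = l2_normalize(rng.normal(size=(4, 8)))
mask = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
pseudo = mask / mask.sum(axis=1, keepdims=True)    # uniform initialization
prototypes = l2_normalize(rng.normal(size=(3, 8)))
pred = pseudo.argmax(axis=1)
prototypes, pseudo = prototype_step(emb, pred, mask, prototypes, pseudo)
```

Restricting the nearest-prototype search to the candidate set keeps all pseudo-label mass on candidates, provided the initialization does the same.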
Semi-Supervised and Meta-Learning Perspectives
Several methods transfer advances from deep semi-supervised learning and meta-learning to PLL:
- PLSP: Combines high-confidence pseudo-labeled sets, strong consistency on ambiguous samples, and complementary regularization to suppress non-candidates, transferring paradigms from FlexMatch and ISDA into PLL (Li et al., 2022).
- Few-shot PLL: Embedding-prototype rectification with iterative smoothing permits PLL under hard support-set constraints, greatly improving few-shot transfer (Zhao et al., 2021).
- Momentum curriculum learning: PLMCL unifies pseudo-label momentum updates and label-wise curriculum to stabilize multi-label PLL in sparse annotation regimes (Abdelfattah et al., 2022).
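A complementary regularizer that suppresses non-candidate labels can be sketched as a penalty on the probability mass assigned outside the candidate set. The form below, $-\sum_{k \notin S} \log(1 - p_k)$, is a common complementary-label loss used here as an assumption; the exact regularizer in PLSP is not specified in this text, and the function name is hypothetical.

```python
import numpy as np

def non_candidate_penalty(logits, mask, eps=1e-12):
    """Batch mean of -sum_{k not in S} log(1 - p_k), with p = softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    log_not = np.log(np.clip(1.0 - probs, eps, 1.0))   # clip guards against log(0)
    return -(log_not * (1 - mask)).sum(axis=1).mean()

mask = np.array([[1, 1, 0], [0, 1, 1]])
on_candidates = np.array([[5.0, 0.0, -5.0], [-5.0, 0.0, 5.0]])
off_candidates = np.array([[-5.0, 0.0, 5.0], [5.0, 0.0, -5.0]])
low = non_candidate_penalty(on_candidates, mask)
high = non_candidate_penalty(off_candidates, mask)
```

The penalty uses only the certain information in a partial label (that non-candidates are wrong), so it can be added to any disambiguation loss without committing to a particular candidate.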
4. Noisy and Instance-Dependent Partial Label Learning
In real-world PLL, candidate sets arise from complex, possibly instance-dependent or error-prone processes:
- Instance-dependent noise: VALEN models ambiguous label distributions as variational Dirichlets, leverages GCNs over feature and label graphs, and iteratively enhances label quality (Xu et al., 2021).
- Unreliable PLL (UPLL): Allows for omission of the true label with nonzero probability. UPLLRS recursively separates highly unreliable samples, exploits reliable subsets for EM-style disambiguation, and applies semi-supervised consistency training to improve robustness at high noise rates (Shi et al., 2023). URRL further integrates unreliability-robust contrastive learning, KNN-based set correction, and consistency regularization, with explicit EM-theoretic justification and state-of-the-art results under high unreliability (Shi et al., 2023).
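One way to picture KNN-based set correction is as a neighbourhood vote: when an instance's nearest neighbours strongly agree on a label, that label is added to its candidate set in case the true label was omitted. The sketch below is a hypothetical illustration of this idea, not the URRL procedure; the function name, `k`, and `threshold` are assumptions.

```python
import numpy as np

def knn_correct_candidates(features, pseudo, mask, k=5, threshold=0.8):
    """Add a label to a candidate set when the k nearest neighbours agree on it.

    features: (n, d) embeddings, pseudo: (n, K) soft pseudo-labels,
    mask: (n, K) binary candidate indicators.
    """
    sq = (features ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # (n, k) indices of nearest points
    votes = pseudo[neighbours].mean(axis=1)        # (n, K) averaged neighbour beliefs
    top = votes.argmax(axis=1)
    confident = votes[np.arange(len(top)), top] >= threshold
    corrected = mask.copy()
    corrected[confident, top[confident]] = 1       # labels are only ever added
    return corrected

features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
mask = np.array([[0, 1], [1, 0], [1, 0], [1, 0],
                 [0, 1], [0, 1], [0, 1], [0, 1]])  # first set omits label 0
pseudo = mask.astype(float)
corrected = knn_correct_candidates(features, pseudo, mask, k=3)
```

Because the correction only adds labels, it cannot remove a true label that was already present. The trade-off is that it enlarges candidate sets and so increases ambiguity for the affected instances.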
5. Theoretical Guarantees, Consistency, and Open Challenges
Multiple frameworks establish theoretical guarantees for PLL:
- Risk-consistency: Under properness, the expected PLL risk of an unbiased estimator equals the true classification risk, with generalization error bounded in terms of the Rademacher complexity of the function class (Wu et al., 2021).
- Model-consistency and convergence: Progressive identification and min-loss selectors achieve model-consistency (recovery of the oracle supervised regressor/classifier), together with convergence-rate guarantees (Cheng et al., 2023).
- Identifiability: Probabilistic models for multiple partial labelers are generically identifiable (up to label permutation), provided labeler accuracy exceeds random expectation and nondegeneracy holds (Yu et al., 2021).
However, formal convergence, consistency under adversarial or highly instance-dependent noise, and the extension of loss-corrected risk consistency to deep architectures and structured output remain open research topics (Wang et al., 2022, Struski et al., 2022).
6. Applications, Extensions, and Impact
PLL has immediate utility in domains such as:
- Automated theorem proving: Alternative proofs of a theorem can be framed as a partial-label set over possible derivations; PLL-guided loss functions significantly improve proof search coverage and sample efficiency (Zombori et al., 4 Jul 2025).
- Multi-label image classification: Sparse or ambiguous annotation is handled via momentum and pseudo-label curricula (Abdelfattah et al., 2022).
- Programmatic and crowdsourced annotation: Weak supervision via partial labelers (zero-shot, attribute-based, or heuristic) enables scaling supervision without hand-labels (Yu et al., 2021, Saravanan et al., 2024).
The continued development of robust, scalable, and theoretically grounded PLL approaches is likely to impact large-scale, ambiguous, or weakly labeled data settings across natural language processing, vision, scientific learning, and beyond. Extensions to unsupervised, transductive, multi-task, structured-output, and high-class-cardinality regimes remain active research frontiers.