Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting (2505.04733v2)

Published 7 May 2025 in cs.LG

Abstract: We introduce a framework for robust uncertainty quantification in situations where labeled training data are corrupted, through noisy or missing labels. We build on conformal prediction, a statistical tool for generating prediction sets that cover the test label with a pre-specified probability. The validity of conformal prediction, however, holds under the i.i.d. assumption, which does not hold in our setting due to the corruptions in the data. To account for this distribution shift, the privileged conformal prediction (PCP) method proposed leveraging privileged information (PI) -- additional features available only during training -- to re-weight the data distribution, yielding valid prediction sets under the assumption that the weights are accurate. In this work, we analyze the robustness of PCP to inaccuracies in the weights. Our analysis indicates that PCP can still yield valid uncertainty estimates even when the weights are poorly estimated. Furthermore, we introduce uncertain imputation (UI), a new conformal method that does not rely on weight estimation. Instead, we impute corrupted labels in a way that preserves their uncertainty. Our approach is supported by theoretical guarantees and validated empirically on both synthetic and real benchmarks. Finally, we show that these techniques can be integrated into a triply robust framework, ensuring statistically valid predictions as long as at least one underlying method is valid.

Summary

  • The paper studies how to generate valid conformal prediction sets when training labels are corrupted, analyzing the robustness of Privileged Conformal Prediction (PCP) to inaccurate re-weighting and introducing Uncertain Imputation (UI), a new method that imputes corrupted labels while preserving their uncertainty.
  • Both methods are shown, theoretically and empirically, to maintain valid prediction coverage despite inaccurate weight estimates or label corruption, and they can be combined into a triply robust framework that stays valid as long as at least one underlying method is valid.
  • These methods improve the robustness of conformal prediction in non-ideal, real-world settings with suboptimal data quality, enabling more reliable uncertainty quantification in applications like healthcare or finance.

Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting

The paper "Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting" presents an advanced methodology for generating predictive uncertainty estimates in the presence of corrupted training data. In contexts where label corruption is prevalent, often due to noise or incomplete information, ensuring reliable prediction sets becomes challenging. This work proposes a dual approach leveraging conformal prediction methodologies to construct valid prediction intervals despite label imperfections.

Problem Addressed

The standard formulation of conformal prediction assumes the data points are i.i.d. (or at least exchangeable), a condition that fails when training labels are corrupted. The authors address the resulting distribution shift and the added uncertainty through two techniques: privileged conformal prediction (PCP), whose robustness to misestimated weights they analyze, and uncertain imputation (UI), a new method they introduce to handle noisy or missing labels without weight estimation.
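
To ground the discussion, here is a minimal sketch of standard split conformal prediction for regression, the baseline whose exchangeability (i.i.d.) assumption is broken by corrupted labels. The absolute-residual score and the function names are illustrative choices, not the paper's notation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(X_train, y_train, X_calib, y_calib, X_test, alpha=0.1):
    """Standard split conformal prediction for regression.

    Marginal coverage >= 1 - alpha requires the calibration and test points
    to be exchangeable, which is exactly the assumption broken by corrupted labels.
    """
    model = LinearRegression().fit(X_train, y_train)

    # Nonconformity scores on the calibration set: absolute residuals.
    scores = np.abs(y_calib - model.predict(X_calib))

    # Finite-sample-corrected (1 - alpha) empirical quantile of the scores.
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")

    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat  # interval endpoints per test point
```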

Methodological Contributions

  1. Privileged Conformal Prediction (PCP): This method uses privileged information, auxiliary features available only at training time, to re-weight the data distribution and correct the shift introduced by label corruption. PCP yields valid prediction sets under the assumption that these weights are accurately estimated. The authors analyze its robustness to weight misestimation, showing that validity is retained under certain perturbations of the estimated weights.
  2. Uncertain Imputation (UI): This new conformal method avoids weight estimation entirely, imputing corrupted labels in a way that deliberately preserves their uncertainty. Unlike PCP, UI does not rely on estimating likelihood-ratio weights; it instead builds on an uncertainty-preserving imputation step. The paper provides theoretical guarantees for UI's validity, supported by empirical evaluations on both synthetic and real-world benchmarks. A schematic sketch of both the PCP-style re-weighting and the UI-style imputation follows this list.
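
The sketch below illustrates, under simplifying assumptions, the two ingredients named above: a weighted conformal quantile in the spirit of PCP's re-weighting, and an imputation step that adds noise to corrupted labels in the spirit of UI. The weight model, the Gaussian noise model, and the helper names are hypothetical stand-ins, not the paper's exact constructions.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha=0.1):
    """Quantile of calibration scores under a re-weighted empirical distribution,
    in the style of weighted conformal prediction.

    `weights` plays the role of the likelihood-ratio weights PCP derives from
    privileged information; `test_weight` is the mass placed on the test point,
    treated as a point mass at +infinity.
    """
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    p = np.append(w, test_weight)
    p = p / p.sum()
    cdf = np.cumsum(p[:-1])              # cumulative mass on the finite scores
    idx = np.searchsorted(cdf, 1 - alpha)
    return np.inf if idx >= len(s) else s[idx]

def uncertain_impute(y_observed, is_corrupted, imputed_values, noise_scale, rng=None):
    """Replace corrupted labels with point imputations plus sampled noise.

    `imputed_values` has one entry per corrupted label. Adding Gaussian noise
    here is purely illustrative: the point is to keep the imputed labels
    uncertain rather than treating the imputer's guess as exact, which is the
    idea behind UI, not its exact procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y_observed, dtype=float).copy()
    y[is_corrupted] = imputed_values + rng.normal(0.0, noise_scale, size=int(np.sum(is_corrupted)))
    return y
```

With all weights equal, the weighted quantile reduces to the standard split conformal quantile, which gives some intuition for why mild weight misestimation need not destroy coverage.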

Key Insights and Results

The paper establishes theoretical conditions under which both PCP and UI attain the desired coverage level despite inaccurate weight estimates or label corruption. Empirically, UI proves robust, producing valid prediction intervals across the label-noise settings considered. Integrating PCP and UI into a triply robust framework guarantees that validity is preserved as long as at least one of the underlying methods satisfies its assumptions, letting practitioners hedge against a range of corruption scenarios.
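
One simple way to see why such a combination can preserve validity: if at least one of several prediction intervals covers the test label with probability at least 1 - alpha, then any set containing all of them, such as their union, inherits that guarantee at the cost of wider intervals. The snippet below illustrates this "at least one valid" logic for interval predictions; it is an illustrative combination rule, not necessarily the construction used in the paper.

```python
import numpy as np

def combine_by_union(lowers, uppers):
    """Combine interval predictions from several methods via their union.

    `lowers` and `uppers` are (n_methods, n_test) arrays. The returned interval
    contains every individual interval, so it covers the true label whenever
    any single method does: validity of one method suffices, at the price of
    wider intervals. (Illustrative rule only; the paper's triply robust
    construction may differ.)
    """
    lowers, uppers = np.asarray(lowers), np.asarray(uppers)
    return lowers.min(axis=0), uppers.max(axis=0)
```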

Implications and Future Directions

This work highlights the practical and theoretical implications of handling label corruption when deploying predictive models. By explicitly addressing the distribution shift and uncertainty introduced by corrupted labels, the paper offers concrete strategies for using machine learning models in environments with suboptimal data quality. Future work could focus on adapting these methods to diverse datasets with heterogeneous noise characteristics.

Overall, the paper's contributions lie in strengthening the validity of conformal prediction methodologies in non-ideal environments. The proposed methodologies represent a significant step forward in the domain of uncertainty quantification, paving the way for more robust applications in real-world systems, including medical, financial, and sociotechnical domains.