
How Does Unlabeled Data Provably Help Out-of-Distribution Detection? (2402.03502v1)

Published 5 Feb 2024 in cs.LG and stat.ML

Abstract: Using unlabeled data to regularize the machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds from the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at https://github.com/deeplearning-wisc/sal.

Authors (4)
  1. Xuefeng Du (26 papers)
  2. Zhen Fang (58 papers)
  3. Ilias Diakonikolas (160 papers)
  4. Yixuan Li (183 papers)
Citations (15)

Summary

An Analysis of Unlabeled Data in Out-of-Distribution Detection

The paper "How Does Unlabeled Data Provably Help Out-of-Distribution Detection?" by Du et al. examines the role of unlabeled data in enhancing out-of-distribution (OOD) detection. With the growing importance of deploying machine learning models in real-world applications, ensuring their robustness against OOD inputs is increasingly critical. The authors propose a novel framework, named SAL (Separate And Learn), which leverages such unlabeled data to improve the reliability and effectiveness of OOD detection systems.

Framework Overview

The central contribution of this work is the introduction of SAL, a two-step learning framework designed to handle unlabeled data for OOD detection. The framework first separates candidate outliers from the unlabeled dataset and then trains a binary classifier using these outliers alongside labeled in-distribution (ID) data. SAL operates under the premise that unlabeled data, typically a mixture of ID and OOD samples, can be a powerful resource if properly disentangled.

  1. Separation of Candidate Outliers: The separation step uses a singular value decomposition (SVD)-based filtering procedure. Using a model trained on the labeled ID data, SAL computes per-sample gradients on the unlabeled wild data and scores each sample by the magnitude of its gradient's projection onto the top singular vector(s) of the gradient matrix; high-scoring samples are retained as candidate outliers. This procedure is underpinned by theoretical guarantees on the separability of ID and OOD samples, which account for the gradient norms and their alignment with the top singular vectors.
  2. Learning with Filtered Outliers: After identifying candidate OOD samples, the subsequent learning stage optimizes a binary classifier. This classifier is trained to distinguish between ID and the separated outlier set, effectively incorporating the diverse OOD data into the training process. The formulation is backed by rigorous theoretical analysis, providing error bounds that assure the learnability and generalization capability of the classifier.
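The separation step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes per-sample gradient vectors have already been extracted, and the function name, the fixed selection fraction, and the use of only the single top singular vector are simplifications of ours.

```python
import numpy as np

def separate_candidate_outliers(wild_grads, id_mean_grad, frac=0.1):
    """Score wild samples by projecting their reference-centered gradients
    onto the top singular vector of the gradient matrix (a sketch of SAL's
    separation step). Returns indices of the `frac` highest-scoring samples
    as candidate outliers."""
    # Center the wild gradients by the average gradient on labeled ID data.
    G = wild_grads - id_mean_grad
    # Top right singular vector = direction of largest gradient variation.
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    v = vt[0]
    # Score each sample by the magnitude of its projection onto v.
    scores = np.abs(G @ v)
    k = max(1, int(frac * len(scores)))
    return np.argsort(scores)[-k:]

# Toy illustration: ID-like gradients cluster near the ID mean gradient,
# while OOD samples shift the gradient along some direction.
rng = np.random.default_rng(0)
id_mean = np.zeros(16)
wild_id = rng.normal(0.0, 0.1, size=(90, 16))   # ID portion of wild data
wild_ood = rng.normal(0.0, 0.1, size=(10, 16))
wild_ood[:, 0] += 3.0                           # OOD samples deviate
wild = np.vstack([wild_id, wild_ood])
cands = separate_candidate_outliers(wild, id_mean, frac=0.1)
print(sorted(cands))  # candidates concentrate in the OOD block (indices 90-99)
```

The selected candidates would then be paired with the labeled ID data to train the binary OOD classifier in step 2; in practice the threshold (here a fixed fraction) is chosen more carefully.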

Theoretical Insights

The authors delve into the theoretical foundations of their method, presenting a comprehensive analysis from the lens of distribution discrepancy and learnability. The framework is underpinned by several key theoretical results:

  • Separability Bounds: The paper establishes conditions under which the OOD samples can be effectively separated with minimal error rates. These bounds depend on the discrepancy between distributions and the size of the ID data, ensuring that, with sufficient samples, the model can achieve low misclassification rates.
  • Generalization Error: By framing the learning task within the context of a binary classification problem, the authors quantify the generalization error bound of the trained OOD detector. The results highlight the dependency of error bounds on the quality of the separation, which is determined by the effectiveness of the filtering mechanism in differentiating ID from OOD data.
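Schematically (our notation, not the paper's exact statements), guarantees of this type decompose the risk of the learned detector into a standard learning-theoretic term plus the cost of imperfect filtering:

```latex
\mathrm{err}\big(\hat{g}\big)
\;\lesssim\;
\underbrace{\widehat{\mathrm{err}}\big(\hat{g}\big)}_{\text{empirical risk}}
\;+\;
\underbrace{O\!\Big(\sqrt{\tfrac{\mathrm{complexity}}{n}}\Big)}_{\text{estimation error}}
\;+\;
\underbrace{\varepsilon_{\mathrm{filt}}}_{\text{separation error}}
```

where $n$ is the sample size and $\varepsilon_{\mathrm{filt}}$ captures the fraction of wild samples mislabeled by the separation step; the paper's bounds make the first two terms precise and show that $\varepsilon_{\mathrm{filt}}$ shrinks as the distribution discrepancy grows and more data is available.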

Empirical Evaluation

The empirical results corroborate the theoretical claims, with SAL achieving state-of-the-art performance on common benchmarks, including challenging datasets such as CIFAR-100, across a range of OOD detection tasks. The performance gains underscore the efficacy of utilizing unlabeled wild data, which, when appropriately leveraged, significantly reduces false positive rates without compromising ID classification accuracy.

Implications and Future Directions

This work opens avenues for further exploration into leveraging unlabeled data in various AI applications, particularly where data labeling is impractical or cost-prohibitive. Future research could extend SAL to other domains, incorporate more sophisticated machine learning models, or explore semi-supervised extensions that harness partial label information.

In summary, "How Does Unlabeled Data Provably Help Out-of-Distribution Detection?" introduces a compelling framework with strong theoretical guarantees that significantly advances the understanding of how unlabeled data can be used for OOD detection. The paper provides a robust foundation for developing more resilient AI systems capable of operating reliably in dynamic and unpredictable environments.