On Orderings of Probability Vectors and Unsupervised Performance Estimation
Abstract: Unsupervised performance estimation, that is, evaluating how well models perform on unlabeled data, is a difficult task. Recently, Garg et al. [2022] proposed a method that performs much better than previous methods. Their method relies on a score function, satisfying certain properties, that maps the probability vectors output by the classifier to the reals, but which score function is best remains an open problem. We explore this problem by first showing that their method fundamentally relies on the ordering induced by the score function. Thus, under monotone transformations of the score function, their method yields the same estimate. Next, we show that in the binary classification setting, nearly all common score functions (the $L^\infty$ norm; the $L^2$ norm; negative entropy; and the $L^2$, $L^1$, and Jensen-Shannon distances to the uniform vector) induce the same ordering over probability vectors. However, this does not hold in higher-dimensional settings. We conduct numerous experiments on well-known NLP data sets and rigorously explore the performance of different score functions. We conclude that the $L^\infty$ norm is the most appropriate.
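The ordering claim can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the grid of binary vectors, and the two three-class vectors `p` and `q` are illustrative assumptions. It ranks binary probability vectors under each score function named in the abstract and then shows one three-class pair on which the $L^\infty$ norm disagrees with the $L^2$ norm and negative entropy.

```python
import numpy as np

def _uniform(p):
    # Uniform probability vector of the same length as p.
    return np.full_like(p, 1.0 / len(p))

def _kl(p, q):
    # KL divergence with the convention 0 * log(0) = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_distance_to_uniform(p):
    # Jensen-Shannon distance = square root of the JS divergence.
    u = _uniform(p)
    m = 0.5 * (p + u)
    js = 0.5 * _kl(p, m) + 0.5 * _kl(u, m)
    return float(np.sqrt(max(js, 0.0)))

# Score functions from the abstract (names are illustrative).
SCORES = {
    "linf_norm": lambda p: float(np.max(p)),
    "l2_norm": lambda p: float(np.sqrt(np.sum(p ** 2))),
    "neg_entropy": lambda p: float(np.sum(p[p > 0] * np.log(p[p > 0]))),
    "l2_to_uniform": lambda p: float(np.linalg.norm(p - _uniform(p))),
    "l1_to_uniform": lambda p: float(np.sum(np.abs(p - _uniform(p)))),
    "js_to_uniform": js_distance_to_uniform,
}

def ranking(score, vectors):
    """Indices of `vectors` sorted from lowest to highest score."""
    return np.argsort([score(v) for v in vectors], kind="stable").tolist()

# Binary case: vectors (a, 1 - a) on a well-separated grid (an assumption
# made to avoid floating-point ties).
binary = [np.array([a, 1.0 - a]) for a in np.linspace(0.5, 0.99, 50)]
binary_rankings = {name: ranking(s, binary) for name, s in SCORES.items()}
binary_agree = all(
    r == binary_rankings["linf_norm"] for r in binary_rankings.values()
)

# Three-class case: a hand-picked pair on which the orderings differ.
p = np.array([0.50, 0.49, 0.01])
q = np.array([0.55, 0.225, 0.225])
linf_prefers_q = SCORES["linf_norm"](q) > SCORES["linf_norm"](p)
l2_prefers_p = SCORES["l2_norm"](p) > SCORES["l2_norm"](q)
entropy_prefers_p = SCORES["neg_entropy"](p) > SCORES["neg_entropy"](q)

print("binary rankings agree:", binary_agree)
print("3-class: L-inf prefers q:", linf_prefers_q)
print("3-class: L2 prefers p:", l2_prefers_p)
print("3-class: negative entropy prefers p:", entropy_prefers_p)
```

Here `p` places almost all of its mass on two classes, so it has the smaller maximum probability but the larger $L^2$ norm and negative entropy; in three or more classes the score functions can therefore rank the same pair of vectors differently.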
- Leveraging unlabeled data to predict out-of-distribution performance. In International Conference on Learning Representations (ICLR), 2022.
- A theory of learning from different domains. Machine learning, 79(1):151–175, 2010a.
- Impossibility theorems for domain adaptation. In Yee Whye Teh and Mike Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 129–136, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010b. PMLR. URL https://proceedings.mlr.press/v9/david10a.html.
- Predicting with confidence on unseen distributions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1134–1144, 2021.
- Are labels always necessary for classifier accuracy evaluation? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15069–15078, 2021.
- What does rotation prediction tell us about classifier accuracy under varying testing environments? In International Conference on Machine Learning, pages 2579–2589. PMLR, 2021.
- Assessing generalization of SGD via disagreement. In International Conference on Learning Representations, 2022. URL https://iclr.cc/virtual/2022/spotlight/6301.
- Mandoline: Model evaluation under distribution shift. In International Conference on Machine Learning, pages 1617–1629. PMLR, 2021.
- Estimating accuracy from unlabeled data: A Bayesian approach. In International Conference on Machine Learning, pages 1416–1425. PMLR, 2016.
- Neural unsupervised domain adaptation in NLP—a survey. arXiv preprint arXiv:2006.00632, 2020.
- Stephen A Rhoades. The Herfindahl-Hirschman index. Fed. Res. Bull., 79:188, 1993.
- Edward H Simpson. Measurement of diversity. Nature, 163(4148):688–688, 1949.
- CARER: Contextualized affect representations for emotion recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3687–3697, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1404. URL https://www.aclweb.org/anthology/D18-1404.
- Semeval 2018 task 2: Multilingual emoji prediction. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 24–33, 2018.
- Efficient intent detection with dual sentence encoders. In Proceedings of the 2nd Workshop on NLP for ConvAI - ACL 2020, March 2020. URL https://arxiv.org/abs/2003.04807. Data available at https://github.com/PolyAI-LDN/task-specific-datasets.
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
- Jacob Cohen. Statistical power analysis for the behavioral sciences. Routledge, 2013.