Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data (2310.09926v2)
Abstract: Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we perform zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <category>", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in the retrieved web data. We evaluate the utility of the proposed method on biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.
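To make the calibration step concrete, the following is a minimal sketch of split conformal prediction with a (web-sourced) calibration set, assuming the zero-shot class probabilities have already been computed by a CLIP-style model. The arrays and the simple `1 - p(true class)` conformity score are illustrative assumptions for exposition; they are not the paper's error-aware conformity score.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Threshold from calibration conformity scores (1 - prob. of true class)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level for target coverage 1 - alpha.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q_level, 1.0), method="higher")

def prediction_set(test_probs, qhat):
    """All classes whose conformity score is within the calibrated threshold."""
    return np.where(1.0 - test_probs <= qhat)[0]

# Toy example: 5 calibration images over 3 classes.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=5)   # stand-in for model softmax outputs
cal_labels = rng.integers(0, 3, size=5)          # stand-in for web-derived labels
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.1)
print(prediction_set(np.array([0.7, 0.2, 0.1]), qhat))
```

With a web-based calibration set, the labels above would come from the search query (the prompt template used to retrieve each image), which is exactly where retrieval errors can enter and why the paper modifies the conformity score.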