Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data (2310.09926v2)

Published 15 Oct 2023 in cs.AI

Abstract: Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <category>", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method on biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.
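
The pipeline the abstract describes can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions: it uses the standard split-conformal score (one minus the softmax probability of the calibration label) rather than the paper's noise-aware conformity score, and `retrieve_web_images` is a hypothetical placeholder for whichever web image-search API is used to source calibration data.

```python
import numpy as np

# Hypothetical placeholder: in the approach described above, the same prompt
# template used for zero-shot classification ("an image of a <category>") is
# issued as a web search query to retrieve calibration images for each class.
def retrieve_web_images(query, k=50):
    raise NotImplementedError("stand-in for a web image search API")

def conformal_sets_from_web(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction on web-retrieved calibration data.

    cal_probs:  (n, K) zero-shot softmax scores for the calibration images
    cal_labels: (n,)   class labels implied by each search query (possibly noisy)
    test_probs: (m, K) zero-shot softmax scores for the test images

    Uses the standard score 1 - p(label); the paper's conformity score
    additionally accounts for retrieval errors, which is omitted here.
    """
    n = len(cal_labels)
    # Nonconformity of each calibration image under its query-implied label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile for marginal coverage >= 1 - alpha.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    # Every class whose nonconformity clears the threshold joins the set.
    return [np.flatnonzero(1.0 - p <= q_hat) for p in test_probs]
```

With exchangeable, correctly labeled calibration data, this construction guarantees that the true class falls in the returned set with probability at least 1 - alpha; the paper's contribution is a conformity score that remains useful when the web-retrieved labels are only approximately correct.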
