Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning (2405.12217v2)

Published 20 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Recent studies indicate that large multimodal models (LMMs) potentially act as general-purpose assistants and are highly robust to distribution shifts. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Because fine-tuning LMMs is impractical given their vast parameter space, this work investigates in-context learning (ICL) as an effective alternative for enhancing LMMs' adaptability. Our study addresses this by evaluating an unsupervised ICL method that selects in-context examples through nearest-example search based on feature similarity. We find that its effectiveness is limited by the deficiencies of pre-trained vision encoders under distribution shift. To address these challenges, we propose InvariantSelectPR, a novel method leveraging Class-conditioned Contrastive Invariance (CCI) for more robust demonstration selection. Specifically, CCI enhances pre-trained vision encoders by improving their discriminative capabilities across classes and ensuring invariance to domain-specific variations. This enhancement allows the encoders to identify and retrieve the most informative examples, which are then used to guide LMMs in adapting to new query samples under varying distributions. Our experiments show that InvariantSelectPR substantially improves the adaptability of LMMs, achieving significant performance gains on benchmark datasets: a 34.2% accuracy increase in the 7-shot setting on Camelyon17 and a 16.9% increase in the 7-shot setting on HAM10000, compared to baseline zero-shot performance.
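The abstract describes two components: retrieving in-context demonstrations by nearest-example search over encoder features, and a class-conditioned contrastive objective (CCI) that makes those features class-discriminative yet domain-invariant. The sketch below is a minimal illustration of both ideas, not the authors' implementation; the function names (`select_demonstrations`, `class_conditioned_contrastive_loss`), tensor shapes, and the assumption of a generic vision encoder producing fixed-size features are all hypothetical.

```python
# Illustrative sketch only: similarity-based demonstration selection for
# multimodal ICL, plus a class-conditioned contrastive loss in the spirit of
# the CCI idea summarized above. Feature tensors, labels, and domain ids are
# hypothetical placeholders produced by some vision encoder.
import torch
import torch.nn.functional as F


def select_demonstrations(query_feat, support_feats, k=7):
    """Return indices of the k support examples most similar to the query.

    query_feat:    (d,) feature of the test image.
    support_feats: (n, d) features of the labeled candidate pool.
    """
    sims = F.cosine_similarity(query_feat.unsqueeze(0), support_feats, dim=-1)
    return sims.topk(k).indices  # these k examples form the ICL prompt


def class_conditioned_contrastive_loss(feats, labels, domains, tau=0.1):
    """Pull together same-class features drawn from *different* domains and
    push apart different-class features, encouraging representations that are
    class-discriminative but invariant to domain-specific variation.

    feats: (b, d) encoder features, labels: (b,) class ids, domains: (b,) domain ids.
    """
    feats = F.normalize(feats, dim=-1)
    logits = feats @ feats.t() / tau                      # pairwise similarities
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    diff_domain = domains.unsqueeze(0) != domains.unsqueeze(1)
    pos_mask = (same_class & diff_domain).float()         # cross-domain positives
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    logits = logits.masked_fill(self_mask, -1e9)          # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = pos_mask.sum(dim=1).clamp(min=1)
    return -(pos_mask * log_prob).sum(dim=1).div(denom).mean()
```

In this sketch, the encoder would first be adapted with the contrastive objective on labeled multi-domain data; at test time, the k retrieved examples (images plus labels) would be serialized into the LMM prompt ahead of the query image.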

Authors (8)
  1. Guanglin Zhou (9 papers)
  2. Zhongyi Han (17 papers)
  3. Shiming Chen (29 papers)
  4. Biwei Huang (54 papers)
  5. Liming Zhu (101 papers)
  6. Salman Khan (244 papers)
  7. Xin Gao (208 papers)
  8. Lina Yao (194 papers)