
Towards Global Optimal Visual In-Context Learning Prompt Selection (2405.15279v2)

Published 24 May 2024 in cs.CV

Abstract: Visual In-Context Learning (VICL) is a prevailing way to transfer visual foundation models to new tasks by leveraging the contextual information contained in in-context examples to enhance learning and prediction on query samples. The fundamental problem in VICL is how to select the best prompt to activate its power as much as possible, which is equivalent to a ranking problem: testing the in-context behavior of each candidate in the alternative set and selecting the best one. To employ a more appropriate ranking metric and to leverage more comprehensive information across the alternative set, we propose a novel in-context example selection framework that approximately identifies the globally optimal prompt, i.e., it chooses the best-performing in-context examples from all alternatives for each query sample. Our method, dubbed Partial2Global, adopts a transformer-based list-wise ranker to provide a more comprehensive comparison within several alternatives, and a consistency-aware ranking aggregator to generate a globally consistent ranking. The effectiveness of Partial2Global is validated through experiments on foreground segmentation, single object detection and image colorization, demonstrating that Partial2Global consistently selects better in-context examples than other methods and thus establishes a new state of the art.
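The abstract describes a two-stage pipeline: a list-wise ranker scores several small subsets of candidate in-context examples, and an aggregator then fuses those partial rankings into one globally consistent order. As a rough illustration of that second step only, here is a minimal Borda-count-style aggregator. This is a simplified sketch, not the paper's consistency-aware aggregator, and the function name and inputs are hypothetical:

```python
# Sketch: fuse several partial rankings of candidate in-context examples
# into one global ranking via pairwise voting (Borda-count style).
# NOTE: this is an illustrative simplification, not Partial2Global itself.
from collections import defaultdict
from itertools import combinations

def aggregate_partial_rankings(partial_rankings, num_candidates):
    """Combine partial rankings (each a list of candidate indices,
    best first) into a single global ranking, best first."""
    wins = defaultdict(int)
    for ranking in partial_rankings:
        # Every earlier element beats every later element in its list.
        for better, worse in combinations(ranking, 2):
            wins[better] += 1
            wins[worse] -= 1
    # Higher net win count => better global rank.
    return sorted(range(num_candidates), key=lambda c: -wins[c])

# Hypothetical example: three partial rankings over 4 candidates.
partials = [[0, 2, 1], [0, 1, 3], [2, 0, 3]]
print(aggregate_partial_rankings(partials, 4))  # → [0, 2, 1, 3]
```

In the paper's setting, the partial rankings would come from the transformer-based list-wise ranker evaluated on sampled subsets of the alternative set; the aggregation step is what makes the final choice approximately globally optimal rather than dependent on any single subset comparison.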
