
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models (2404.00884v1)

Published 1 Apr 2024 in cs.CL and cs.AI

Abstract: LLMs have shown promising in-context learning (ICL) abilities, adapting swiftly to new tasks from only a few demonstrations. However, current few-shot methods heavily depend on high-quality, query-specific demos, which are often lacking. When faced with out-of-demonstration (OOD) queries, methods that rely on hand-crafted demos or external retrievers might fail. To bridge the gap between limited demos and OOD queries, we propose Self-Demos, a novel prompting method that elicits the inherent generalizability in LLMs through query-aware demo generation. The generated demos strategically interpolate between existing demos and the given query, transforming the query from OOD to ID. To evaluate the effectiveness of our approach, we manually constructed OOD-Toolset, a dataset for the tool-use scenario with over 300 real-world APIs and 1000 instances, each consisting of three tool-use cases as demos and an OOD query. Thorough experiments on our dataset and two public math benchmarks show that our method outperforms state-of-the-art baselines in the OOD setting. Moreover, we conduct a range of analyses to validate the generalization of Self-Demos and provide further insights.
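
To make the two-stage idea in the abstract concrete, here is a minimal sketch of what a Self-Demos-style pipeline might look like, assuming only a generic `llm(prompt) -> completion` callable. The function name `self_demos_answer`, the prompt wording, and the number of generated demos are illustrative assumptions, not the paper's exact prompts or procedure.

```python
# Minimal sketch of a Self-Demos-style prompting loop (names and prompt
# wording are illustrative assumptions, not the paper's implementation).

from typing import Callable, List


def self_demos_answer(
    llm: Callable[[str], str],   # hypothetical LLM call: prompt string -> completion string
    seed_demos: List[str],       # the available demonstrations (not covering the query)
    query: str,                  # the out-of-demonstration (OOD) query
    n_generated: int = 2,        # how many query-aware demos to generate
) -> str:
    """Generate query-aware demos that bridge the seed demos and the query,
    then answer the query with the combined in-context examples."""
    demo_block = "\n\n".join(seed_demos)

    # Step 1: ask the model to write new demos that interpolate between the
    # existing demos and the given query, moving the query from OOD to ID.
    gen_prompt = (
        "Here are some example problem-solution pairs:\n\n"
        f"{demo_block}\n\n"
        f"Target query: {query}\n\n"
        f"Write {n_generated} new example problem-solution pairs in the same "
        "style as the examples above, but closer to the target query."
    )
    generated_demos = llm(gen_prompt)

    # Step 2: answer the target query with standard few-shot ICL, using both
    # the seed demos and the freshly generated, query-aware demos.
    answer_prompt = (
        f"{demo_block}\n\n{generated_demos}\n\n"
        f"Problem: {query}\nSolution:"
    )
    return llm(answer_prompt)
```

The key structural point this sketch captures is that demo generation is conditioned on the incoming query before any standard few-shot answering takes place; everything else (prompt phrasing, number of demos, any filtering of generated demos) is left as an assumption here.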
