Retrieved In-Context Principles from Previous Mistakes (2407.05682v1)

Published 8 Jul 2024 in cs.CL

Abstract: In-context learning (ICL) has been instrumental in adapting LLMs to downstream tasks using correct input-output examples. Recent advances have attempted to improve model performance through principles derived from mistakes, yet these approaches suffer from a lack of customization and inadequate error coverage. To address these limitations, we propose Retrieved In-Context Principles (RICP), a novel teacher-student framework. In RICP, the teacher model analyzes the student model's mistakes to generate reasons and insights for preventing similar errors. These mistakes are clustered by their underlying reasons to develop task-level principles, improving the error coverage of the principles. During inference, the most relevant mistakes for each question are retrieved to create question-level principles, improving the customization of the provided guidance. RICP is orthogonal to existing prompting methods and requires no intervention from the teacher model during inference. Experimental results across seven reasoning benchmarks show that RICP effectively enhances performance when applied to various prompting strategies.
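The abstract's pipeline has three mechanical steps: the teacher turns each student mistake into a reason and an insight, mistakes are clustered by reason to yield task-level principles, and at inference the mistakes most similar to the incoming question are retrieved to form question-level principles. The sketch below illustrates those steps under stated assumptions: the Mistake record, the TF-IDF embedder, k-means clustering, and the prompt layout are illustrative stand-ins, since the abstract does not specify the paper's actual embedding model, clustering setup, or prompt format.

```python
# A minimal sketch of an RICP-style pipeline, assuming the teacher has already
# produced a (question, reason, insight) record for each student mistake.
# TF-IDF embeddings and k-means are stand-ins for whatever embedding model
# and clustering method the paper actually uses.
from dataclasses import dataclass

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


@dataclass
class Mistake:
    question: str  # question the student model answered incorrectly
    reason: str    # teacher's analysis of why the student erred
    insight: str   # teacher's advice for avoiding similar errors


def task_level_principles(mistakes: list[Mistake], n_clusters: int = 3) -> list[str]:
    """Cluster mistakes by their underlying reasons; keep one insight per cluster."""
    embs = TfidfVectorizer().fit_transform([m.reason for m in mistakes])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embs)
    principles: dict[int, str] = {}
    for label, m in zip(labels, mistakes):
        principles.setdefault(label, m.insight)  # first insight represents its cluster
    return list(principles.values())


def question_level_principles(query: str, mistakes: list[Mistake], k: int = 2) -> list[str]:
    """Retrieve insights from the k past mistakes most similar to the new question."""
    vec = TfidfVectorizer().fit([m.question for m in mistakes] + [query])
    scores = cosine_similarity(
        vec.transform([query]), vec.transform([m.question for m in mistakes])
    )[0]
    return [mistakes[i].insight for i in scores.argsort()[::-1][:k]]


def build_prompt(query: str, task_ps: list[str], question_ps: list[str]) -> str:
    """Prepend both principle levels to the student's prompt (layout is hypothetical)."""
    lines = ["Task-level principles:"] + [f"- {p}" for p in task_ps]
    lines += ["Question-level principles:"] + [f"- {p}" for p in question_ps]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)
```

Note that only retrieval and string assembly happen at inference time; the teacher model is consulted once, offline, to analyze mistakes, consistent with the abstract's claim that RICP needs no teacher intervention during inference.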

Authors (7)
  1. Hao Sun
  2. Yong Jiang
  3. Bo Wang
  4. Yingyan Hou
  5. Yan Zhang
  6. Pengjun Xie
  7. Fei Huang
Citations (1)