SAIL: Search-Augmented Instruction Learning (2305.15225v2)

Published 24 May 2023 in cs.CL

Abstract: LLMs have been significantly improved by instruction fine-tuning, but they still lack transparency and the ability to utilize up-to-date knowledge and information. In this work, we propose search-augmented instruction learning (SAIL), which grounds language generation and instruction following in complex search results produced by in-house and external search engines. Starting from an instruction-tuning corpus, we collect search results for each training case from different search APIs and domains, and construct a new search-grounded training set of (instruction, grounding information, response) triplets. We then fine-tune the LLaMA-7B model on the constructed training set. Because the collected results contain unrelated and contradictory passages, the model must learn to ground its responses in trustworthy search results, filter out distracting passages, and generate the target response. This search-result denoising entails explicit selection of trustworthy information and multi-hop reasoning, since the retrieved passages may be informative yet not contain the instruction-following answer. Experiments show that the fine-tuned SAIL-7B model has strong instruction-following ability and performs significantly better on transparency-sensitive tasks, including open-ended question answering and fact checking.
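The data-construction step described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `search` function here is a hypothetical placeholder for the in-house and external search APIs, and the prompt format is an assumption for demonstration only.

```python
# Sketch of SAIL-style training-data construction: for each
# (instruction, response) pair in an instruction-tuning corpus,
# attach retrieved passages to form an
# (instruction, grounding information, response) triplet.

def search(query, top_k=3):
    # Placeholder retriever. A real pipeline would query one or more
    # search engines and return top-k passages, which may include
    # unrelated or contradictory text the model must learn to filter.
    return [f"passage {i} for: {query}" for i in range(top_k)]

def build_sail_example(instruction, response, top_k=3):
    passages = search(instruction, top_k=top_k)
    # Place grounding passages before the instruction so the model is
    # trained to select trustworthy evidence before answering.
    grounding = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    prompt = f"Search results:\n{grounding}\n\nInstruction: {instruction}"
    return {"prompt": prompt, "response": response}

example = build_sail_example("Who wrote 'On the Origin of Species'?",
                             "Charles Darwin")
```

Fine-tuning then proceeds as standard supervised instruction tuning on the resulting prompt/response pairs; the denoising behavior emerges because the grounding passages mix relevant and distracting content.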
