Reducing hallucination in structured outputs via Retrieval-Augmented Generation (2404.08189v1)
Abstract: A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLMs) have taken the world by storm, real-world GenAI systems may struggle with user adoption unless hallucinations are eliminated or at least reduced. In the process of deploying an enterprise application that produces workflows from natural-language requirements, we devised a system leveraging Retrieval-Augmented Generation (RAG) to greatly improve the quality of the structured output that represents such workflows. Our RAG implementation significantly reduces hallucinations in the output and improves the LLM's generalization in out-of-domain settings. In addition, we show that a small, well-trained retriever encoder can reduce the size of the accompanying LLM, making deployments of LLM-based systems less resource-intensive.
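Below is a minimal sketch of the kind of pipeline the abstract describes: a compact retriever encoder selects the workflow steps relevant to a natural-language requirement, and only those steps are placed in the LLM prompt, so the generator is grounded in components that actually exist. The embedding model, the step catalog, and the prompt template are all illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of retrieval-augmented structured-output generation.
# Assumptions (not from the paper): the encoder choice, the step catalog,
# and the prompt format are placeholders for illustration only.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical catalog of valid workflow steps the generator may use.
STEP_CATALOG = [
    "create_record: insert a new record into a table",
    "update_record: modify fields of an existing record",
    "send_email: notify a user or group by email",
    "approval: route a record for manager approval",
    "wait_for_condition: pause until a field reaches a value",
]

# A small retriever encoder; the paper's point is that a compact,
# well-trained encoder lets the downstream LLM be smaller.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

# Index normalized step embeddings; inner product then equals cosine similarity.
step_vecs = encoder.encode(STEP_CATALOG, normalize_embeddings=True)
index = faiss.IndexFlatIP(step_vecs.shape[1])
index.add(np.asarray(step_vecs, dtype=np.float32))

def build_prompt(requirement: str, k: int = 3) -> str:
    """Retrieve the k most relevant steps and inline them into the prompt,
    so the LLM cannot invent step names that do not exist."""
    q = encoder.encode([requirement], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype=np.float32), k)
    allowed = "\n".join(STEP_CATALOG[i] for i in ids[0])
    return (
        "Generate a JSON workflow using ONLY these steps:\n"
        f"{allowed}\n\nRequirement: {requirement}\nWorkflow JSON:"
    )

print(build_prompt("When a ticket is resolved, email the requester."))
```

Restricting the prompt to retrieved, verified step names is what reduces hallucination here: the LLM composes a workflow from a known-valid vocabulary rather than generating identifiers freely, and the retrieval step generalizes to out-of-domain requirements because the catalog, not the LLM's weights, defines what is available.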
Authors: Patrice Béchard, Orlando Marquez Ayala