Reducing hallucination in structured outputs via Retrieval-Augmented Generation (2404.08189v1)

Published 12 Apr 2024 in cs.LG, cs.AI, cs.CL, and cs.IR

Abstract: A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While Large Language Models (LLMs) have taken the world by storm, without eliminating or at least reducing hallucinations, real-world GenAI systems may face challenges in user adoption. In the process of deploying an enterprise application that produces workflows based on natural language requirements, we devised a system leveraging Retrieval-Augmented Generation (RAG) to greatly improve the quality of the structured output that represents such workflows. Thanks to our implementation of RAG, our proposed system significantly reduces hallucinations in the output and improves the generalization of our LLM in out-of-domain settings. In addition, we show that using a small, well-trained retriever encoder can reduce the size of the accompanying LLM, thereby making deployments of LLM-based systems less resource-intensive.

Reducing Hallucination in Generative AI through Retrieval-Augmented Generation for Structured Output Tasks

Introduction to Retrieval-Augmented Generation (RAG) in Workflow Generation

LLMs are pivotal in transforming natural language inputs into structured outputs such as workflows, which are executed automatically under specified conditions. These advancements are crucial in automating repetitive tasks and improving productivity within enterprise systems. However, the effectiveness of Generative AI (GenAI) applications is marred by the propensity of LLMs to produce hallucinated outputs—generating incorrect or non-existent elements in the structured output. Addressing this challenge, the integration of Retrieval-Augmented Generation (RAG) with LLMs presents a promising solution. By retrieving and incorporating external knowledge before generation, RAG significantly mitigates the occurrence of hallucination, thereby enhancing the trustworthiness and applicability of GenAI systems in real-world settings.
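As a concrete illustration of the retrieve-then-generate idea, the minimal sketch below ranks a catalog of existing workflow steps against a user request with an off-the-shelf sentence encoder before any generation happens. The encoder name, the example catalog, and the query are placeholders for illustration; they stand in for, and are not, the paper's trained retriever.

```python
# Minimal retrieve-before-generate sketch (illustrative; not the paper's trained retriever).
# Assumes the sentence-transformers package; "all-MiniLM-L6-v2" is a generic stand-in encoder.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical catalog of real workflow steps the system is allowed to use.
step_catalog = [
    "Create record",
    "Update record",
    "Send email notification",
    "Look up user in table",
    "Wait for approval",
]

query = "When a ticket is opened, email the assignee and wait for manager sign-off"

# Embed the catalog and the query, then rank steps by cosine similarity.
step_vecs = encoder.encode(step_catalog, normalize_embeddings=True)
query_vec = encoder.encode([query], normalize_embeddings=True)
scores = step_vecs @ query_vec.T  # cosine similarity, since vectors are normalized

top_k = np.argsort(-scores[:, 0])[:3]
retrieved_steps = [step_catalog[i] for i in top_k]
print(retrieved_steps)  # grounded step names to pass to the generator
```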

Methodological Overview

The RAG framework introduced in this paper leverages a dual-component approach comprising a retriever model and a generative model. The retriever model is trained to map natural language queries to relevant structured information, such as the steps and tables required in the workflow generation task. This mapping reduces hallucinated content by ensuring that the generated outputs are grounded in existing, real-world entities. The generative model, the LLM, is fine-tuned to condition on the retrieved content and produce the final structured output in JSON format. This methodology not only curtails hallucination but also allows smaller LLMs to be used without compromising performance, presenting a cost-effective solution for deploying GenAI systems.
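The sketch below shows how such a pipeline might assemble the retrieved step and table names into the generator's prompt and then parse and ground-check the JSON it returns. The prompt format, the `llm_generate` stub, and the field names are assumptions chosen for illustration, not the paper's actual implementation.

```python
import json

def build_prompt(requirement: str, steps: list[str], tables: list[str]) -> str:
    """Assemble a prompt that grounds generation in retrieved entities (illustrative format)."""
    return (
        "Available steps:\n- " + "\n- ".join(steps) + "\n"
        "Available tables:\n- " + "\n- ".join(tables) + "\n"
        f"Requirement: {requirement}\n"
        "Return the workflow as JSON with a 'trigger' and a list of 'steps'."
    )

def llm_generate(prompt: str) -> str:
    """Stub standing in for the fine-tuned LLM; a real system would call the model here."""
    return json.dumps({
        "trigger": {"table": "incident", "condition": "state == 'open'"},
        "steps": [{"name": "Send email notification"}, {"name": "Wait for approval"}],
    })

def parse_workflow(raw: str, allowed_steps: set[str]) -> dict:
    """Parse the model output and flag any step not grounded in the retrieved catalog."""
    workflow = json.loads(raw)
    for step in workflow["steps"]:
        step["hallucinated"] = step["name"] not in allowed_steps
    return workflow

prompt = build_prompt(
    "When a ticket is opened, email the assignee and wait for manager sign-off",
    steps=["Send email notification", "Wait for approval", "Look up user in table"],
    tables=["incident", "sys_user"],
)
allowed = {"Send email notification", "Wait for approval", "Look up user in table"}
print(parse_workflow(llm_generate(prompt), allowed))
```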

Results and Contributions

Applying RAG to workflow generation yields significant improvements in reducing hallucinations, with a marked decrease in non-existent steps and tables in the generated output. Notably, fine-tuning the LLM on retrieved context enables the deployment of a smaller LLM alongside a compact retriever model, managing resource consumption efficiently without detracting from model performance. This approach delineates a practical pathway for employing GenAI systems within enterprise applications, ensuring both reliability and scalability.
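One simple way to quantify the kind of reduction reported here is to count how many generated step and table names do not exist in the known catalogs. The sketch below is an illustrative metric in that spirit, with hypothetical data; it is not the paper's exact evaluation code.

```python
def hallucination_rate(workflows: list[dict], known_steps: set[str], known_tables: set[str]) -> float:
    """Fraction of generated step/table references that do not exist in the catalogs."""
    total, hallucinated = 0, 0
    for wf in workflows:
        for step in wf.get("steps", []):
            total += 1
            hallucinated += step["name"] not in known_steps
        for table in wf.get("tables", []):
            total += 1
            hallucinated += table not in known_tables
    return hallucinated / total if total else 0.0

# Example: one of three references ("archive_table") is not in the catalogs.
example = [{"steps": [{"name": "Send email notification"}, {"name": "Close ticket"}],
            "tables": ["archive_table"]}]
print(hallucination_rate(example, {"Send email notification", "Close ticket"}, {"incident"}))  # ~0.33
```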

Implications and Future Directions

The practical implications of this research are manifold, addressing both the theoretical and commercial challenges in the deployment of GenAI systems. The reduction of hallucination in structured output tasks not only enhances the trustworthiness of GenAI applications but also broadens their potential for adoption across various domains. Moreover, the efficiency gains from utilizing smaller models underscore the feasibility of deploying sophisticated AI solutions in resource-constrained settings. Future explorations will focus on refining the synergy between the retriever and the LLM, possibly through joint training or innovative architectural designs, to further optimize the generation process and reduce hallucinations.

Ethical Considerations

While this paper takes significant strides in mitigating the risks associated with hallucination in GenAI systems, it does not eliminate them. The deployed system therefore incorporates additional safeguards, such as flagging potentially unreliable steps to users, which underscores the importance of human oversight of AI-generated outputs. Continuous efforts to understand and address the limitations of GenAI systems are crucial to ensuring their ethical and responsible application.

Conclusion

The integration of Retrieval-Augmented Generation with LLMs presents a compelling approach to reducing hallucinations in structured output tasks, paving the way for more reliable and scalable GenAI systems in enterprise settings. Through methodological innovation and pragmatic application, this paper contributes valuable insights into the ongoing development and deployment of AI technologies, with a clear path forward for future research and implementation.

Authors (2)
  1. Patrice Béchard (3 papers)
  2. Orlando Marquez Ayala (5 papers)
Citations (15)