Reinforcement Learning for Optimizing RAG for Domain Chatbots (2401.06800v1)

Published 10 Jan 2024 in cs.CL and cs.AI

Abstract: With the advent of Large Language Models (LLMs), conversational assistants have become prevalent for domain use cases. LLMs acquire the ability to perform contextual question answering through training, and Retrieval Augmented Generation (RAG) further enables the bot to answer domain-specific questions. This paper describes a RAG-based approach for building a chatbot that answers users' queries using Frequently Asked Questions (FAQ) data. We train an in-house retrieval embedding model using the InfoNCE loss, and experimental results demonstrate that the in-house model works significantly better than a well-known general-purpose public embedding model, both in terms of retrieval accuracy and Out-of-Domain (OOD) query detection. As the LLM, we use an open API-based paid ChatGPT model. We noticed that previously retrieved context could be reused to generate an answer for specific patterns/sequences of queries (e.g., follow-up queries). Hence, there is scope to optimize the number of LLM tokens and the associated cost. Assuming a fixed retrieval model and a fixed LLM, we optimize the number of LLM tokens using Reinforcement Learning (RL). Specifically, we propose a policy-based model external to the RAG pipeline, which interacts with the pipeline through policy actions and updates its policy to optimize the cost. The policy model can perform two actions: fetch FAQ context or skip retrieval. We use the open API-based GPT-4 as the reward model and train the policy model with policy gradient over multiple training chat sessions. As the policy model, we experimented with a public GPT-2 model and an in-house BERT model. With the proposed RL-based optimization combined with a similarity threshold, we achieve significant cost savings while slightly improving accuracy. Though we demonstrate results for the FAQ chatbot, the proposed RL approach is generic and can be applied to any existing RAG pipeline.
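The abstract mentions two concrete mechanisms worth unpacking. First, the in-house retrieval embedding model is trained with the InfoNCE contrastive objective. The sketch below shows the standard in-batch-negatives form of that loss; the encoder, batch construction, and temperature value are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, pos_doc_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives: row i of query_emb is paired with
    row i of pos_doc_emb; every other row in the batch serves as a negative."""
    q = F.normalize(query_emb, dim=-1)        # (B, d)
    d = F.normalize(pos_doc_emb, dim=-1)      # (B, d)
    logits = q @ d.T / temperature            # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```

Second, the cost optimization trains a small policy, external to the RAG pipeline, with policy gradients: at each turn it decides whether to fetch FAQ context or skip retrieval and reuse the previous context, and a GPT-4-based judge supplies the reward. A minimal REINFORCE-style sketch follows; the `RetrievalPolicy` head, its dimensions, the `rewards_fn` interface, and the mean-reward baseline are assumptions for illustration rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

FETCH, SKIP = 0, 1  # the two policy actions described in the abstract

class RetrievalPolicy(nn.Module):
    """Tiny classification head over a query encoding; in the paper this role
    is played by a GPT-2 or in-house BERT policy model (dims assumed here)."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.head = nn.Linear(dim, 2)

    def forward(self, query_encoding: torch.Tensor) -> torch.Tensor:  # (T, dim)
        return torch.log_softmax(self.head(query_encoding), dim=-1)   # (T, 2)

def reinforce_step(policy, optimizer, query_encodings, rewards_fn):
    """One REINFORCE update over a chat session of T turns.
    rewards_fn(actions) is assumed to return a (T,) tensor of scores, e.g.
    a GPT-4 judge's answer rating minus a token-cost penalty for FETCH."""
    log_probs = policy(query_encodings)
    dist = torch.distributions.Categorical(logits=log_probs)
    actions = dist.sample()                       # (T,) of FETCH / SKIP
    rewards = rewards_fn(actions)                 # (T,) external reward signal
    baseline = rewards.mean()                     # simple variance reduction
    loss = -(dist.log_prob(actions) * (rewards - baseline)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```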

Authors (4)
  1. Mandar Kulkarni (13 papers)
  2. Praveen Tangarajan (1 paper)
  3. Kyung Kim (2 papers)
  4. Anusua Trivedi (8 papers)
Citations (13)
