
Emotional RAG LLMs: Reading Comprehension for the Open Internet (2408.11189v2)

Published 20 Aug 2024 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: Queries to LLMs can be divided into two parts: the instruction/question and the accompanying context. The context for retrieval-augmented generation (RAG) systems in most benchmarks comes from Wikipedia-like texts written in a neutral and factual tone. However, real-world RAG applications often retrieve internet-based text with diverse tones and linguistic styles, posing challenges for downstream tasks. This paper introduces (a) a dataset that transforms RAG-retrieved passages into emotionally inflected and sarcastic text, (b) an emotion translation model for adapting text to different tones, and (c) a prompt-based method to improve LLMs' pragmatic interpretation of retrieved text.

Summary

  • The paper introduces a novel prompt-based approach that enhances retrieval-augmented generation systems by accurately processing sarcastic text.
  • It details a methodology that integrates a sarcasm-poisoned retrieval corpus created from the Natural Questions dataset using dense retrieval and synthetic data generation.
  • Experimental results show that the intent prompt significantly boosts model performance across various LLMs, even though sarcastic passages slightly affect recall metrics.

Reading with Intent

Overview

The paper "Reading with Intent" addresses how Retrieval-Augmented Generation (RAG) systems, which rely on external information sources, handle emotionally inflected text, with a particular focus on sarcasm. The authors identify a significant shortcoming in RAG systems: even when sarcasm is detected, that detection is not consistently leveraged in subsequent text processing. To improve comprehension and response generation, they develop synthetic sarcastic passages and introduce a prompting system designed to help models interpret sarcasm and generate accurate responses in its presence.

Methodology

Dataset Creation

The authors construct a sarcasm-poisoned retrieval corpus based on the Natural Questions (NQ) dataset. This involves several steps:

  1. Retrieval: Using an off-the-shelf dense retrieval method (GPL), the authors retrieve the top 200 passages for each query in the NQ validation set.
  2. Sarcasm Poisoning: Synthetic sarcasm is generated using the Llama3-70B-Instruct model. This model generates two types of sarcastic passages: factually correct but sarcastic, and sarcastic with fact distortions.
  3. Integration: Three test datasets are created. The first has all passages sarcastically rephrased but factually consistent. The second replaces parts of the retrieval results with fact-distorted sarcastic and correct sarcastic pairs. In the third, the retrieval corpus is updated with sarcastic passages, allowing the retrieval method to determine the new top-10 passages.
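The first two variants amount to simple substitutions over the retrieved passage lists, and can be sketched as below. All names, passage texts, and rewrite templates are invented placeholders; in the paper the sarcastic rewrites come from Llama3-70B-Instruct, and the third variant simply re-runs retrieval over the sarcasm-augmented corpus.

```python
# Placeholder passages keyed by id; real passages come from NQ retrieval.
passages = {
    1: "The Nile is the longest river in Africa.",
    2: "Water boils at 100 degrees Celsius at sea level.",
}
# Factually correct but sarcastic rewrites (template stands in for an LLM).
sarcastic = {pid: "Oh, what a shock: " + t for pid, t in passages.items()}
# Sarcastic rewrites with distorted facts (again, a stand-in template).
distorted = {pid: "Everyone pretends otherwise, but: " + t
             for pid, t in passages.items()}

def fully_sarcastic(top_k):
    """Variant 1: every retrieved passage sarcastically rephrased,
    facts preserved."""
    return [sarcastic[pid] for pid in top_k]

def pair_poisoned(top_k, n_pairs):
    """Variant 2: swap the first n_pairs results for (fact-distorted,
    factually correct) sarcastic pairs; keep the rest neutral."""
    out = []
    for pid in top_k[:n_pairs]:
        out += [distorted[pid], sarcastic[pid]]
    out += [passages[pid] for pid in top_k[n_pairs:]]
    return out
```

Variant 3 has no list-level sketch: the sarcastic passages are inserted into the corpus itself, and the dense retriever re-ranks to produce a new top-10.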

Reading with Intent

The core of the proposed solution is a prompt-based approach that explicitly instructs the model to consider the emotional intent of the texts it processes. The authors further train a smaller model to generate binary intent tags marking passages as sarcastic or not; these tags help the main model interpret each passage's connotation more accurately.
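A minimal sketch of how such intent tags might be woven into the reader prompt; the instruction wording and tag format below are assumptions, not the paper's exact prompt.

```python
def build_intent_prompt(question, passages, sarcasm_tags):
    """Prepend an intent instruction and tag each passage with the
    binary sarcasm label produced by the small classifier."""
    lines = ["When answering, account for the emotional intent of each "
             "passage, especially sarcasm."]
    for i, (text, is_sarcastic) in enumerate(zip(passages, sarcasm_tags), 1):
        tag = "[sarcastic]" if is_sarcastic else "[not sarcastic]"
        lines.append(f"Passage {i} {tag}: {text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The tags let the reader model condition on connotation without having to detect sarcasm itself.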

Experimental Setup

The experiments tested various LLMs, including the Llama2, Mistral, Phi-3, and Qwen2 series, across four datasets (NQ, FS NQ, PS-M NQ, and PS-A NQ). Models were evaluated on whether their responses contained the correct answer. An intent classifier trained on the SARC dataset supplied the non-oracle sarcasm tags. Performance was measured using a combination of answer recall and sarcasm retrieval metrics.
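The evaluation criterion, whether a response contains the correct answer, can be sketched as a simple answer-recall metric; case-insensitive substring matching is an assumed simplification of how the paper checks inclusion.

```python
def answer_recall(responses, gold_answers):
    """Fraction of responses containing at least one gold answer string.
    gold_answers is a list of acceptable answers per question."""
    hits = sum(
        any(gold.lower() in response.lower() for gold in golds)
        for response, golds in zip(responses, gold_answers)
    )
    return hits / len(responses)
```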

Results

The results showed that:

  • The "Reading with Intent" prompting system consistently boosted the performance of the Llama2 and Mistral models across all datasets.
  • Smaller models using the enhanced prompt performed comparably to larger models with the base prompt.
  • With non-oracle intent tags, the approach still outperformed the baseline, though with more moderate improvements.
  • Ablation studies indicated that the intent prompt had a more significant impact on performance than intent tags alone.
  • The retrieval experiments revealed that adding sarcastic passages to the corpus caused a small decline in Recall@K, while the sarcastic passages were significantly over-represented among the top retrievals.
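For reference, the Recall@K metric mentioned above can be computed as follows. This is the standard definition; `gold_ids` stands in for the answer-bearing passages of a query and is not a name from the paper.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold (answer-bearing) passages found in the top-K
    retrieved results."""
    return len(set(ranked_ids[:k]) & set(gold_ids)) / len(gold_ids)
```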

Discussion

The findings suggest that instructing models explicitly to consider the connotative intent of text significantly enhances their understanding and generation abilities, especially when dealing with emotionally inflected content. The research also highlights the persistent challenge of sarcasm in NLP and suggests that improvements in this area can be both practically and theoretically impactful for future developments in AI.

One notable limitation is that sarcasm in the synthetic dataset is relatively easy to detect, implying that future work could focus on generating more nuanced sarcastic content. Another is the prompt-based nature of the solution: instruction-tuning the model to integrate reading intent directly might offer further gains.

Conclusion

"Reading with Intent" is an early effort at the intersection of sentiment analysis and reading comprehension within RAG systems. By constructing a dataset of sarcasm-poisoned passages and implementing a context-aware prompting system, the authors address a gap in how AI systems handle emotionally inflected text. Their work lays the groundwork for more nuanced deployments that can better interpret and generate text reflective of the complex, often emotional nature of human communication. Future improvements in sarcasm detection and model instruction-tuning could further refine these systems.