Influence of Emotional Payload in Prompt Tone on LLM Behavior

Determine whether the emotional payload carried by polite or rude phrasing in a prompt affects the behavior or accuracy of large language models such as ChatGPT-4o, over and above the models’ token-level processing of the text.

Background

The paper prepends explicit polite or rude prefixes to its prompts and reports higher accuracy for ruder tones. The authors ask whether this difference reflects sensitivity to emotional valence or arises solely from token-level statistical properties of the prefixes.

They state explicitly that it is unknown whether LLMs process, or are influenced by, the emotional content of prompt tone, which frames a concrete question about the role of affective semantics in model outputs.
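The comparison behind this question can be illustrated with a minimal sketch, assuming per-question correctness has already been recorded for each tone condition. The prefix strings, the function names, and the sample outcomes below are hypothetical and are not taken from the paper. The sketch only measures an accuracy gap between tones; on its own it cannot tell an emotional-payload effect apart from a token-level one.

```python
from statistics import mean

# Hypothetical tone prefixes for illustration only; the paper's actual
# wording is not reproduced here.
TONE_PREFIXES = {
    "polite": "Could you please answer the following question?",
    "neutral": "",
    "rude": "Figure this out if you can:",
}


def build_prompt(question: str, tone: str) -> str:
    """Prepend the tone prefix (if any) to the question text."""
    prefix = TONE_PREFIXES[tone]
    return f"{prefix}\n{question}".strip()


def accuracy_by_tone(results: dict[str, list[bool]]) -> dict[str, float]:
    """Fraction of correct answers per tone.

    `results` maps a tone name to one boolean per question, with the
    questions in the same order for every tone.
    """
    return {
        tone: mean(1.0 if ok else 0.0 for ok in outcomes)
        for tone, outcomes in results.items()
    }


def paired_difference(results: dict[str, list[bool]], tone_a: str, tone_b: str) -> float:
    """Mean per-question difference in correctness (tone_a minus tone_b)."""
    a, b = results[tone_a], results[tone_b]
    if len(a) != len(b):
        raise ValueError("both tones must cover the same questions")
    return mean(float(x) - float(y) for x, y in zip(a, b))


# Made-up outcomes for four questions, purely to show the call pattern.
example_results = {
    "polite": [True, False, True, False],
    "rude": [True, True, True, False],
}
example_accuracy = accuracy_by_tone(example_results)
example_gap = paired_difference(example_results, "rude", "polite")
```

Because the same questions are used under every tone, the paired difference isolates the effect of the prefix from question difficulty. Separating affective meaning from surface token statistics would need additional control conditions that this sketch does not attempt.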

References

After all, the politeness phrase is just a string of words to the LLM, and we don't know if the emotional payload of the phrase matters to the LLM (Bos, 2024).

— Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper) (2510.04950 - Dobariya et al., 6 Oct 2025) in Section 5, Discussion and conclusions
