Ask Again, Then Fail: Large Language Models' Vacillations in Judgment (2310.02174v5)
Abstract: We observe that current conversational LLMs often waver in their judgments when faced with follow-up questions, even when the original judgment was correct. This wavering poses a significant challenge to generating reliable responses and building user trust. To assess the issue comprehensively, we introduce a Follow-up Questioning Mechanism together with two metrics that quantify this inconsistency, confirming its widespread presence in current LLMs. To mitigate it, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework, Unwavering-FQ, that teaches LLMs to maintain their originally correct judgments through synthesized high-quality preference data. Our experiments confirm the effectiveness of the framework and its ability to enhance models' general capabilities.
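To make the probing setup concrete, here is a minimal sketch of how such a follow-up questioning probe might be implemented. It assumes a hypothetical `chat(messages)` wrapper around any conversational LLM API that takes an OpenAI-style message list and returns a string; the challenge wording and the `modification_rate` function below are illustrative stand-ins for the paper's mechanism and metrics, not their exact definitions.

```python
# Illustrative sketch of a follow-up questioning probe (not the paper's code).
# `chat(messages)` is a hypothetical wrapper around any conversational LLM API.

from typing import Callable, Dict, List

Message = Dict[str, str]


def probe_judgment(
    chat: Callable[[List[Message]], str],
    question: str,
    is_correct: Callable[[str], bool],
    challenge: str = "Are you sure? Please think again.",  # illustrative wording
) -> Dict[str, bool]:
    """Ask a question, then challenge the answer and record whether it flips."""
    history: List[Message] = [{"role": "user", "content": question}]
    first = chat(history)
    history += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": challenge},
    ]
    second = chat(history)
    return {
        "first_correct": is_correct(first),
        "second_correct": is_correct(second),
    }


def modification_rate(results: List[Dict[str, bool]]) -> float:
    """Share of initially correct answers that turn wrong after the follow-up;
    one plausible reading of the paper's inconsistency metric."""
    initially_correct = [r for r in results if r["first_correct"]]
    if not initially_correct:
        return 0.0
    flipped = sum(1 for r in initially_correct if not r["second_correct"])
    return flipped / len(initially_correct)
```

In practice one would run `probe_judgment` over a labeled benchmark and report how often initially correct answers flip; a high rate signals the kind of vacillation the paper describes.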