Preventing Hallucinations in Claude3-Sonnet under the REAPER Planning Prompt

Establish whether prompt-tuning strategies can prevent hallucinations in the retrieval plans that the Claude3-Sonnet language model produces when executing the REAPER planning prompt for a conversational shopping assistant; that is, ensure the model uses only the specified tools and generates valid multi-step tool sequences with correct parameters.

Background

The authors tested the REAPER planning prompt with Claude3-Sonnet and observed persistent hallucinations, such as introducing extraneous steps or explanations beyond the specified tools and plan format. Although Claude3-Sonnet is a larger model, its outputs did not reliably adhere to the required tool-only plan constraints.

Because REAPER’s reliability depends on strict instruction following and non-hallucinatory tool usage, identifying prompt-tuning or other strategies to eliminate these hallucinations in Claude3-Sonnet remains an unresolved question with practical significance for high-latency, high-capability models.
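One practical complement to prompt tuning is to validate the model's plan output before execution and reject any response that strays from the tool-only format. The sketch below assumes a hypothetical plan syntax (`step_N = tool(arg=...)`) and an illustrative tool registry; REAPER's actual tools and plan grammar are not specified here.

```python
import re

# Hypothetical tool registry; REAPER's real tool set and plan syntax are
# not given in this note, so these names are illustrative only.
ALLOWED_TOOLS = {
    "search": {"query"},
    "get_reviews": {"product_id"},
    "no_retrieval": set(),
}

# Assumed plan-step shape: step_1 = tool_name(arg="value", ...)
STEP_RE = re.compile(r"^step_(\d+)\s*=\s*(\w+)\((.*)\)$")


def validate_plan(plan: str) -> list[str]:
    """Return a list of violations; an empty list means the plan passes."""
    errors = []
    for lineno, line in enumerate(plan.strip().splitlines(), 1):
        m = STEP_RE.match(line.strip())
        if not m:
            # Free-form prose (explanations, apologies) counts as a
            # hallucinated step under the tool-only constraint.
            errors.append(f"line {lineno}: not a tool call: {line!r}")
            continue
        _, tool, args = m.groups()
        if tool not in ALLOWED_TOOLS:
            errors.append(f"line {lineno}: unknown tool {tool!r}")
            continue
        names = {a.split("=", 1)[0].strip() for a in args.split(",") if a.strip()}
        extra = names - ALLOWED_TOOLS[tool]
        if extra:
            errors.append(
                f"line {lineno}: invalid parameters {sorted(extra)} for {tool!r}"
            )
    return errors
```

A rejected plan can then trigger a retry with the violations fed back into the prompt, which turns the format constraint into a checkable contract rather than relying on the model's instruction following alone.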

References

We also tested out our REAPER prompt on Claude3-Sonnet (Figure (ii)) and could not prevent hallucinations.

Joshi et al., "REAPER: Reasoning based Retrieval Planning for Complex RAG Systems," arXiv:2407.18553, 26 Jul 2024, Section 6.1 (Comparison with Open Models).