Optimal tradeoff between critique comprehensiveness and hallucinations for annotator performance
Determine the precision–recall balance between the comprehensiveness of large language model–generated code critiques and their rate of hallucinations or nitpicks that maximizes the performance of human annotators within a reinforcement learning from human feedback (RLHF) pipeline evaluating model-written code, including how to select inference-time length controls in Force Sampling Beam Search (FSBS) to achieve this optimum.
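The question can be framed as a sweep over the FSBS length control: generate critiques at several settings, measure precision (fraction of critique claims that are not hallucinations or nitpicks) and recall (fraction of real bugs caught), and keep the setting that maximizes whatever annotator-performance metric the pipeline tracks. Below is a minimal sketch of such a sweep; the `evaluate` callback, the toy precision/recall curves, and the F1-style annotator proxy are hypothetical stand-ins for illustration, not the paper's actual measurements or scoring function.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CritiqueEval:
    """Evaluation of critiques generated at one length-control setting."""
    length_penalty: float   # hypothetical FSBS-style length control (lambda)
    recall: float           # fraction of known bugs the critiques catch
    precision: float        # fraction of claims that are not hallucinations/nitpicks
    annotator_score: float  # proxy for downstream annotator performance


def select_length_penalty(
    penalties: Sequence[float],
    evaluate: Callable[[float], CritiqueEval],
) -> CritiqueEval:
    """Sweep candidate length-control values and keep the one that maximizes
    the annotator-performance proxy. `evaluate` stands in for running the
    critic at that setting and scoring the results on a validation set."""
    evals = [evaluate(p) for p in penalties]
    return max(evals, key=lambda e: e.annotator_score)


if __name__ == "__main__":
    # Toy stand-in: larger penalties yield longer, more comprehensive critiques
    # (higher recall) but more hallucinated or nitpick claims (lower precision).
    def toy_evaluate(penalty: float) -> CritiqueEval:
        recall = min(1.0, 0.4 + 0.15 * penalty)
        precision = max(0.0, 0.95 - 0.12 * penalty)
        # Hypothetical annotator proxy rewarding both, penalizing hallucinations.
        annotator_score = 2 * precision * recall / (precision + recall)
        return CritiqueEval(penalty, recall, precision, annotator_score)

    best = select_length_penalty([0.0, 1.0, 2.0, 3.0, 4.0], toy_evaluate)
    print(f"best length penalty: {best.length_penalty} "
          f"(recall={best.recall:.2f}, precision={best.precision:.2f})")
```

In practice the annotator proxy would come from human evaluations (e.g. critique Elo or bug-catch rates with critic assistance) rather than a closed-form F1, which is precisely the unknown the open question targets.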
References
Using [FSBS] we can trade off comprehensiveness and hallucinations, though we do not currently know what balance is optimal for improving the performance of annotators in an RLHF pipeline.
— LLM Critics Help Catch LLM Bugs
(arXiv:2407.00215, McAleese et al., 28 Jun 2024), in the caption of Figure “nitpick_comprehensiveness_elo_pareto”, Results, subsection “Tradeoffs”