Optimal tradeoff between critique comprehensiveness and hallucinations for annotator performance
Determine the precision–recall balance between the comprehensiveness of large language model–generated code critiques and their rate of hallucinations or nitpicks that maximizes the performance of human annotators within a reinforcement learning from human feedback (RLHF) pipeline evaluating model-written code, including how to select inference-time length controls in Force Sampling Beam Search (FSBS) to achieve this optimum.
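The question can be framed as a sweep over the FSBS length control: generate critiques at several settings, measure precision (fraction of critique claims that are not hallucinations or nitpicks) and recall (fraction of real bugs caught), and keep the setting that maximizes whatever annotator-performance metric the pipeline tracks. Below is a minimal sketch of such a sweep; the `evaluate` callback, the toy precision/recall curves, and the F1-style annotator proxy are hypothetical stand-ins for illustration, not the paper's actual measurements or scoring function.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CritiqueEval:
    """Evaluation of critiques generated at one length-control setting."""
    length_penalty: float   # hypothetical FSBS-style length control (lambda)
    recall: float           # fraction of known bugs the critiques catch
    precision: float        # fraction of claims that are not hallucinations/nitpicks
    annotator_score: float  # proxy for downstream annotator performance


def select_length_penalty(
    penalties: Sequence[float],
    evaluate: Callable[[float], CritiqueEval],
) -> CritiqueEval:
    """Sweep candidate length-control values and keep the one that maximizes
    the annotator-performance proxy. `evaluate` stands in for running the
    critic at that setting and scoring the results on a validation set."""
    evals = [evaluate(p) for p in penalties]
    return max(evals, key=lambda e: e.annotator_score)


if __name__ == "__main__":
    # Toy stand-in: larger penalties yield longer, more comprehensive critiques
    # (higher recall) but more hallucinated or nitpick claims (lower precision).
    def toy_evaluate(penalty: float) -> CritiqueEval:
        recall = min(1.0, 0.4 + 0.15 * penalty)
        precision = max(0.0, 0.95 - 0.12 * penalty)
        # Hypothetical annotator proxy rewarding both, penalizing hallucinations.
        annotator_score = 2 * precision * recall / (precision + recall)
        return CritiqueEval(penalty, recall, precision, annotator_score)

    best = select_length_penalty([0.0, 1.0, 2.0, 3.0, 4.0], toy_evaluate)
    print(f"best length penalty: {best.length_penalty} "
          f"(recall={best.recall:.2f}, precision={best.precision:.2f})")
```

In practice the annotator proxy would come from human evaluations (e.g. critique Elo or bug-catch rates with critic assistance) rather than a closed-form F1, which is precisely the unknown the open question targets.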
References
Using [FSBS] we can trade off comprehensiveness and hallucinations, though we do not currently know what balance is optimal for improving the performance of annotators in an RLHF pipeline.
— LLM Critics Help Catch LLM Bugs
(arXiv:2407.00215, McAleese et al., 28 Jun 2024), in the caption of Figure “nitpick_comprehensiveness_elo_pareto”, Results, subsection “Tradeoffs”