Explain Claude-3-Sonnet’s parity with Claude-3-Opus on long-form factuality
Determine why Claude-3-Sonnet achieves long-form factuality similar to that of Claude-3-Opus, despite being a smaller model, when both are evaluated on LongFact-Objects using the SAFE (Search-Augmented Factuality Evaluator) pipeline and aggregated with F1@K.
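For readers unfamiliar with the metric, the following is a minimal sketch of F1@K as the cited paper defines it: precision is the fraction of a response's facts that SAFE rates as supported, recall is the number of supported facts divided by K (capped at 1), and the score is their harmonic mean, or 0 when no fact is supported. The function and parameter names (`f1_at_k`, `mean_f1_at_k`, `supported`, `not_supported`) are illustrative assumptions, not taken from the paper's code.

```python
from typing import Iterable, Tuple


def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    """F1@K for a single response (sketch; names are hypothetical).

    supported / not_supported: counts of facts rated as supported or
    not supported by the evaluator. k: the number of supported facts
    at which recall is treated as complete.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if supported < 0 or not_supported < 0:
        raise ValueError("fact counts must be non-negative")
    if supported == 0:
        # No supported facts: the metric is defined as 0.
        return 0.0
    precision = supported / (supported + not_supported)
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(counts: Iterable[Tuple[int, int]], k: int) -> float:
    """Average F1@K over (supported, not_supported) pairs, one per response."""
    scores = [f1_at_k(s, n, k) for s, n in counts]
    return sum(scores) / len(scores) if scores else 0.0
```

Because recall saturates once a response contains K supported facts, two models with different numbers of supported facts can receive close scores. Comparing precision and recall separately for each model is therefore one way to begin investigating the observed parity.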
References
Notably, we found that Claude-3-Sonnet achieves similar long-form factuality as Claude-3-Opus despite being a smaller model, but without access to further details about these models, it is unclear why this was the case.
— Long-form factuality in large language models
(2403.18802 - Wei et al., 27 Mar 2024) in Section 6, "Larger LLMs are more factual"