Verify GPQA Diamond training exposure of gpt-4o-mini and gpt-4o
Determine whether OpenAI’s gpt-4o-mini and gpt-4o were exposed during training to GPQA Diamond multiple-choice questions in their default, unshuffled form. Answering this would help assess potential positional-answer biases and interpret the reported performance differences between the shuffled and unshuffled settings.
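The shuffled versus unshuffled comparison can be illustrated with a minimal Python sketch. It is not the cited paper's evaluation code. The names `shuffle_options`, `accuracy`, and `shuffle_gap` are hypothetical, and `predict` stands in for any callable that takes a list of answer options and returns the index it selects (for example, a wrapper around a model call). Each question is assumed to be a pair of an option list and the index of the correct option.

```python
import random


def shuffle_options(options, correct_index, rng):
    """Permute the options and return them with the correct answer's new index."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # order[j] is the original index now sitting at position j.
    return shuffled, order.index(correct_index)


def accuracy(questions, predict, rng=None):
    """Fraction answered correctly; options are shuffled only if rng is given."""
    hits = 0
    for options, correct in questions:
        if rng is not None:
            options, correct = shuffle_options(options, correct, rng)
        if predict(options) == correct:
            hits += 1
    return hits / len(questions)


def shuffle_gap(questions, predict, seed=0, trials=20):
    """Return (unshuffled accuracy, mean shuffled accuracy, their difference)."""
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    unshuffled = accuracy(questions, predict)
    shuffled = sum(accuracy(questions, predict, rng) for _ in range(trials)) / trials
    return unshuffled, shuffled, unshuffled - shuffled
```

A predictor that relies only on option position loses accuracy once the options are shuffled, while one that relies only on option content is unaffected. A large gap is therefore consistent with positional bias or memorized answer positions, but it does not by itself establish that the questions appeared in the training data.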
References
However, due to the proprietary nature of the model, we can not verify conclusively whether gpt-4o-mini or gpt-4o was exposed to GPQA Diamond questions (in their default, unshuffled state) during training.
— Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures
(2502.05078 - Pandey et al., 7 Feb 2025) in Discussion