Reasons for the o3 preview–release accuracy discrepancy on ARC-AGI-1

Identify the causes of the large discrepancy in ARC-AGI-1 accuracy between the o3-preview (pre-release) model and the subsequently released o3 model.

Background

The paper cites external observations that o3-preview's ARC-AGI-1 accuracy differs substantially from that of the released o3. This gap matters for how progress and reproducibility in ARC-like reasoning are interpreted.

The authors explicitly state that the reasons for this discrepancy are unknown. Explaining it would require a technical analysis of differences in the model, the data, or the evaluation setup.

References

The reasons for this discrepancy are not known.

Beger et al., "Do AI Models Perform Human-like Abstract Reasoning Across Modalities?", arXiv:2510.02125, 2 Oct 2025. Section: Discussion (footnote).