Quantify the magnitude of knowledge-dependent overfitting on ARC-AGI

Quantify how much knowledge-dependent benchmark overfitting contributes to model performance on ARC-AGI-1 and ARC-AGI-2, in order to separate genuine generalization from gains due to pretraining exposure and related data leakage.

Background

The authors contend that a new form of overfitting—arising from strong prior exposure to domain knowledge—now assists models in solving ARC tasks. While they present circumstantial evidence for this effect, they explicitly state that they cannot currently measure its size.

Understanding the size of this effect is crucial for interpreting ARC-AGI performance, distinguishing true reasoning generalization from contamination-driven gains, and informing future benchmark design (e.g., ARC-AGI-3).
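One way to approximate the size of this effect, assuming access to both a public evaluation split (plausibly seen during pretraining) and a held-out, uncontaminated split of comparable difficulty, is to compare per-task solve rates across the two splits and treat the gap as an upper bound on contamination-driven gains. The sketch below is illustrative only; the function names, sample sizes, and accuracy figures are hypothetical, not taken from the report.

```python
import random

def accuracy(results):
    """Fraction of tasks solved; results is a list of booleans."""
    return sum(results) / len(results)

def contamination_gap(public, private, n_boot=10000, seed=0):
    """Point estimate and bootstrap 95% CI for acc(public) - acc(private).

    A positive gap is consistent with (but does not prove) benchmark
    overfitting: the model does better on tasks it may have seen.
    """
    rng = random.Random(seed)
    point = accuracy(public) - accuracy(private)
    gaps = []
    for _ in range(n_boot):
        pub = [rng.choice(public) for _ in public]    # resample with replacement
        prv = [rng.choice(private) for _ in private]
        gaps.append(accuracy(pub) - accuracy(prv))
    gaps.sort()
    lo = gaps[int(0.025 * n_boot)]
    hi = gaps[int(0.975 * n_boot)]
    return point, (lo, hi)

# Hypothetical per-task outcomes: True = solved.
public_results = [True] * 60 + [False] * 40    # 60% on the public split
private_results = [True] * 45 + [False] * 55   # 45% on the held-out split

gap, (lo, hi) = contamination_gap(public_results, private_results)
print(f"estimated gap: {gap:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Note that this design only bounds the effect: a nonzero gap can also reflect differences in task difficulty between splits, so difficulty-matched pairs or within-task ablations would be needed to attribute the gap to contamination specifically.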

References

Although we assess that this new form of "overfitting" assists models in solving ARC, we cannot precisely quantify the magnitude of this effect.

ARC Prize 2025: Technical Report (2601.10904 - Chollet et al., 15 Jan 2026), Section: Characterizing AGI through continual benchmark adaptation