Presence of FCI items in GPT-4o and o1 training data

Determine whether the Force Concept Inventory (FCI) assessment questions and the corresponding answer key are included in the training datasets of OpenAI’s GPT-4o-2024-11-20 and o1-2024-12-17 models used in this study, in order to clarify whether model performance could be influenced by prior exposure to the assessment materials.

Background

This study translates the Force Concept Inventory (FCI) into 53 languages using OpenAI’s GPT-4o and evaluates the problem-solving performance of GPT-4o and o1 on the translated and back-translated assessments. The authors observe high accuracy on text-only items and lower accuracy on image-required items, and they analyze how translation nuances affect correctness.

In discussing limitations, the authors note uncertainty about whether FCI content was present in the models’ training corpora. Establishing whether the FCI questions and answer key appear in the training datasets is crucial for interpreting the performance results and for distinguishing genuine reasoning from memorization or prior exposure.

References

Finally, the possibility that the FCI assessment questions and answer key were present in the model's training data cannot be fully excluded; however, the observed performance shifts in response to translation errors suggest that model outputs are still sensitive to semantic changes, not just memorized content.

— Translating the Force Concept Inventory in the age of AI (2508.13908 - Babayeva et al., 19 Aug 2025) in Section 5 (Limitations, implications, and further questions)
