Traceability of GPT-3 Training Data for Affordance Examples

Determine whether the GPT-3 training corpus included affordance examples from Arthur Glenberg’s earlier latent semantic analysis research that were reused as probes by Glenberg and Jones, in order to assess potential test-set contamination.

Background

Glenberg and Jones evaluated BERT, RoBERTa, and GPT-3 on sentences probing object affordances. During discussion, participants raised concerns that GPT-3 may have been trained on earlier affordance examples from Glenberg’s prior work, which could inflate apparent performance by exposing the model to test items during training.

The authors report that it is not possible to verify whether such examples were included, highlighting a broader transparency issue for large models’ training data and its impact on fair evaluation.
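Such a check cannot actually be run without access to the training corpus, which is the transparency problem the authors raise. Purely as an illustration of the kind of contamination screen that corpus access would enable, a minimal word-level n-gram overlap test (all names and data below are hypothetical, not from the study) might look like:

```python
def ngrams(text, n=8):
    """Return the set of word-level n-grams in `text` (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(probes, corpus_docs, n=8):
    """Fraction of probe sentences sharing at least one n-gram with the corpus.

    A nonzero rate flags probes whose wording appears verbatim (at n-gram
    granularity) in the candidate training data.
    """
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    hits = sum(1 for p in probes if ngrams(p, n) & corpus_grams)
    return hits / len(probes) if probes else 0.0
```

An 8-word window is a common rough heuristic for "verbatim reuse"; in practice one would also normalize punctuation and consider fuzzy matches, since paraphrased test items can leak just as effectively as exact copies.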

References

“Finally, there were some concerns that the earlier research by Glenberg (and thus the examples used in the current study) had been included in the training data for GPT-3. Jones said that there was no way to know this, but that AI performance on other tests had been shown to suffer when examples of those tests were removed from training data.”

Millhouse et al. (2022), “Embodied, Situated, and Grounded Intelligence: Implications for AI,” arXiv:2210.13589, Discussion of “Language Comprehension Requires Affordances” (Glenberg & Jones).