Ascertain training-data contamination of large language models with The Outer Worlds content
Ascertain whether The Outer Worlds game data underlying the Knudge dialogues are included in the pretraining corpora of the large language models (e.g., T5 and GPT-3) used in the experiments, in order to assess potential training-data contamination and interpret results appropriately.
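One way such a check might be approached, where a searchable sample of a pretraining corpus is available, is to measure verbatim n-gram overlap between game dialogue lines and corpus documents. The sketch below is only illustrative and is not a method from the cited paper: the function names `ngrams` and `overlap_fraction`, the whitespace tokenization, and the default window of 8 tokens are all assumptions. It cannot be applied to corpora that are not publicly accessible.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercased, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(utterance: str, corpus_docs: Iterable[str], n: int = 8) -> float:
    """Fraction of the utterance's n-grams found verbatim in any corpus document.

    Hypothetical helper: a high value suggests the line may appear in the
    corpus sample; a low value does not rule out contamination.
    """
    target = ngrams(utterance, n)
    if not target:
        # Utterance is shorter than n tokens, so there is nothing to compare.
        return 0.0
    seen: Set[Tuple[str, ...]] = set()
    for doc in corpus_docs:
        seen |= target & ngrams(doc, n)
    return len(seen) / len(target)
```

A dialogue line whose overlap fraction is close to 1 against some corpus sample would be a candidate for manual inspection. Paraphrased or wiki-summarized game content would not be caught by exact matching of this kind.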
References
It is difficult to know whether the game data used for experimentation is part of the training data for such models, as The Outer Worlds came out in 2019.
Source: Ontologically Faithful Generation of Non-Player Character Dialogues (arXiv:2212.10618, Weir et al., 2022), Limitations section