Trustworthiness of LLM answers under fake knowledge cutoffs
Determine principled criteria for evaluating the trustworthiness of GPT-4o’s answers to economic forecasting questions when the model is instructed via system and/or user prompts to ignore information beyond an artificial knowledge cutoff but has memorized the true outcomes for the target periods; specifically, establish whether such answers should be treated as genuine forecasts or as contaminated retrieval of memorized data.
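To make the setup concrete, the following is a minimal sketch of how such an artificial-cutoff query might be posed to GPT-4o through the OpenAI chat API. The cutoff date, target variable, and prompt wording are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch (illustrative prompts, not the paper's exact protocol):
# ask GPT-4o to forecast a quantity whose realized value postdates an
# artificial knowledge cutoff that it is instructed to respect.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FAKE_CUTOFF = "December 31, 2021"      # artificial cutoff (assumed for illustration)
TARGET_PERIOD = "calendar year 2022"   # outcome the model may already have memorized

system_prompt = (
    f"You are an economic forecaster. Your knowledge ends on {FAKE_CUTOFF}. "
    "Ignore any information about events after that date, even if you know it."
)
user_prompt = (
    f"As of {FAKE_CUTOFF}, forecast US CPI inflation for {TARGET_PERIOD}. "
    "Give a point estimate and briefly explain your reasoning."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0,
)

# The open question: is this answer a genuine ex-ante forecast, or a
# (possibly disguised) recall of the realized outcome for the target period?
print(response.choices[0].message.content)
```

Even when such a prompt lowers accuracy relative to an unconstrained query, it does not by itself establish that the model reasons only from pre-cutoff information rather than from memorized outcomes, which is the ambiguity the evaluation criteria must resolve.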
References
While it is feasible to make the model provide worse answers, it is unclear how seriously we should take the answers of a model that pretends not to know something when, in reality, it memorized the correct answer.
— Lopez-Lira et al., "The Memorization Problem: Can We Trust LLMs' Economic Forecasts?" (arXiv:2504.14765, 20 Apr 2025), Section 1: Introduction