Do large language models perform genuine reasoning beyond interpolation?
Formulate precise criteria for what counts as reasoning, and establish whether large language models trained via next-token prediction satisfy these criteria or merely interpolate their training data.
References
One essential question remains: whether an LLM merely interpolates training data or is capable of genuine reasoning.
— The Mathematics of Artificial Intelligence
(2501.10465 - Peyré, 15 Jan 2025) in Conclusion