Faithfulness of model-generated natural-language rules to internal reasoning
Determine and quantify how faithfully the natural-language rules generated by AI models for ConceptARC tasks reflect the models’ actual internal reasoning procedures that produce the output grids.
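One hedged way to approach this quantification is a consistency check: execute the model's stated rule independently (e.g., via a fresh model instance or a hand-coded interpreter) and compare the resulting grid to the grid the model itself produced. High agreement is necessary, though not sufficient, evidence of faithfulness. The sketch below assumes ARC-style integer grids and exact-match scoring; the function names and example data are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def grid_match(a: Grid, b: Grid) -> bool:
    """Exact cell-by-cell equality, the usual scoring for ARC-style grids."""
    return a == b


def rule_consistency_rate(rule_outputs: List[Grid],
                          model_outputs: List[Grid]) -> float:
    """Fraction of tasks where independently executing the model's stated
    rule reproduces the grid the model itself generated.

    This measures rule-output alignment only; it cannot rule out that the
    model solved the task by a different internal procedure and then
    rationalized a rule that happens to agree.
    """
    if not rule_outputs:
        return 0.0
    matches = sum(grid_match(r, m)
                  for r, m in zip(rule_outputs, model_outputs))
    return matches / len(rule_outputs)


# Hypothetical example: the stated rule reproduces the model's grid
# on 2 of 3 tasks.
rule_grids = [[[1, 0], [0, 1]], [[2, 2]], [[3]]]
model_grids = [[[1, 0], [0, 1]], [[2, 2]], [[0]]]
print(rule_consistency_rate(rule_grids, model_grids))  # prints 0.6666666666666666
```

A stronger (and harder) test would perturb the stated rule and check whether the model's outputs change accordingly, probing causal rather than merely correlational alignment.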
References
We cannot be certain that the natural-language rules generated by the AI models we evaluated are faithful representations of the actual reasoning the models do to solve a task, though in general the output grids generated seem to align with the rules.
— Do AI Models Perform Human-like Abstract Reasoning Across Modalities?
(2510.02125 - Beger et al., 2 Oct 2025) in Section: Limitations