Attribution of benchmark performance to genuine reasoning
Determine whether the superior performance of deep learning systems on the GLUE and SuperGLUE benchmarks is attributable to genuine reasoning capabilities rather than to the learning of shallow heuristic patterns.
References
However it is not understood if such performance is attributed to underlying reasoning capabilities.
— Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems
(2111.05364 - Faldu et al., 2021) in Section 1: Introduction