
Formal logical reasoning capability of transformer-based LLMs in practice

Establish whether transformer-based large language models can perform formal logical reasoning in practice to solve tasks that require such reasoning, rather than relying primarily on probabilistic pattern-matching of training data.


Background

Prior theoretical work analyzes the computational properties and limitations of transformers, and practical techniques such as Chain-of-Thought prompting or scratchpads can provide auxiliary memory, though often at the cost of generating many tokens. Despite these insights, empirical evidence points to token-level sensitivity, reliance on distributional pattern matching, and high performance variance on reasoning tasks, leaving the question of genuine formal reasoning unresolved.

This problem targets a core capability question: whether current transformer-based LLMs actually execute formal logical reasoning processes sufficient to solve reasoning tasks in practice, as opposed to succeeding by matching previously seen patterns or computation subgraphs from pretraining.

References

While these works provide insights into the theoretical computational complexity of transformers, in practice, it remains unclear whether these LLMs can perform formal logical reasoning to solve tasks.

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (arXiv:2410.05229, Mirzadeh et al., 7 Oct 2024), Section 2 (Related Work: Reasoning Language Models)