
Can Transformers Reason Logically? A Study in SAT Solving (2410.07432v2)

Published 9 Oct 2024 in cs.LG, cs.AI, and cs.LO

Abstract: We formally study the logical reasoning capabilities of decoder-only Transformers in the context of the boolean satisfiability (SAT) problem. First, we prove by construction that decoder-only Transformers can decide 3-SAT, in a non-uniform model of computation, using backtracking and deduction via Chain-of-Thought (CoT). Second, we implement our construction as a PyTorch model with a tool (PARAT) that we designed to empirically demonstrate its correctness and investigate its properties. Third, rather than *programming* a Transformer to reason, we evaluate empirically whether it can be *trained* to do so by learning directly from algorithmic traces ("reasoning paths") of our theoretical construction. The trained models demonstrate strong out-of-distribution generalization on problem sizes seen during training but have limited length generalization, which is consistent with the implications of our theoretical result.

Summary

  • The paper introduces a decoder-only Transformer that mimics DPLL backtracking for multi-step logical deduction in SAT solving.
  • It presents PARAT, a compiler that converts high-level sequence operations into stable Transformer model weights.
  • Empirical results on random 3-SAT instances show the model accurately solves formulas, validating its theoretical design.

Can Transformers Reason Logically? A Study in SAT Solving

This paper investigates the logical reasoning capabilities of Transformer-based LLMs in the context of solving the Boolean satisfiability (SAT) problem. The authors explore the ability of Transformers to perform multi-step logical deduction and backtracking, particularly using the concept of Chain-of-Thought (CoT) reasoning.

Theoretical Contributions

The authors propose a decoder-only Transformer model capable of deciding SAT problems through backtracking and deduction. The approach is grounded in theoretical formulations showing trace equivalence to the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, a well-known SAT-solving method. The primary theoretical result states that a Transformer with O(p^2) parameters can decide all 3-SAT instances with at most p variables and c clauses. The construction emphasizes that Transformers can process constraints in parallel, allowing efficient deductions from logical clauses. However, the construction requires an exponentially growing number of CoT steps in the worst case.
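The DPLL procedure that the construction is trace-equivalent to interleaves two moves: unit-clause deduction and chronological backtracking on a branching variable. The following is a minimal sketch of that reference algorithm (a hypothetical helper for illustration, not the paper's code); clauses are lists of nonzero integers, where k denotes variable k and -k its negation.

```python
def unit_propagate(clauses, assignment):
    """Repeatedly apply the unit-clause rule (the 'deduction' step).

    Returns the extended assignment, or None on a conflict.
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            open_lits = [l for l in clause if -l not in assignment]
            if not open_lits:
                return None  # conflict: every literal is falsified
            if len(open_lits) == 1:
                assignment.add(open_lits[0])  # forced literal
                changed = True
    return assignment

def dpll(clauses, assignment=frozenset()):
    """Decide satisfiability via deduction plus chronological backtracking."""
    assignment = unit_propagate(clauses, set(assignment))
    if assignment is None:
        return False
    free = {abs(l) for c in clauses for l in c} - {abs(l) for l in assignment}
    if not free:
        return True  # no conflict and nothing left to assign
    v = min(free)  # simple branching heuristic: lowest-indexed free variable
    # Try v = True first; backtrack to v = False if that subtree fails.
    return dpll(clauses, assignment | {v}) or dpll(clauses, assignment | {-v})
```

The paper's point is that a CoT trace of a decoder-only Transformer can mirror this search step for step, with attention performing the per-clause checks in parallel.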

PARAT Compiler

To translate theoretical constructions into practical implementations, the authors developed PARAT, a compiler that converts high-level sequence operations into Transformer model weights. This tool handles complex operations such as Averaging Hard Attention, enabling a more intuitive and numerically stable implementation of Transformer algorithms.
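As a rough illustration of the kind of idealized operation involved, Averaging Hard Attention places uniform weight on exactly the positions attaining the maximum attention score, rather than the soft weighting of standard softmax. The sketch below shows this idealized operation in NumPy (the function name and interface are assumptions for illustration, not PARAT's actual API).

```python
import numpy as np

def averaging_hard_attention(scores):
    """Uniform attention weights over the argmax positions of `scores`.

    A sketch of idealized Averaging Hard Attention; in a real Transformer
    it is approximated by softmax(t * scores) with a large scale t.
    """
    scores = np.asarray(scores, dtype=float)
    is_max = np.isclose(scores, scores.max(axis=-1, keepdims=True))
    return is_max / is_max.sum(axis=-1, keepdims=True)
```

Numerical stability matters here: a naive softmax with a huge temperature overflows or saturates, which is one reason a compiler that handles such operations systematically is useful.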

Empirical Validation

The authors implemented their theoretical constructions using PyTorch and empirically verified the model's correctness on random 3-SAT instances. Their compiled model solves SAT formulas with up to 20 variables and 88 clauses with perfect accuracy, indicating that it faithfully simulates the intended DPLL-style logical deduction.
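For context, a random 3-SAT instance at this scale can be sampled by drawing, for each clause, three distinct variables with random polarities. This is a minimal sketch of such a generator (the paper's exact sampling procedure may differ):

```python
import random

def random_3sat(num_vars, num_clauses, seed=0):
    """Sample a random 3-SAT formula as a list of 3-literal clauses.

    Each clause picks 3 distinct variables in [1, num_vars] and negates
    each independently with probability 1/2.
    """
    rng = random.Random(seed)
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula
```

At 20 variables and 88 clauses the clause-to-variable ratio is 4.4, near the well-known satisfiability phase transition for random 3-SAT (around 4.27), where instances tend to be hardest.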

Implications and Future Work

The paper demonstrates that Transformers, theoretically and empirically, can perform logical deductions and solve SAT problems, shedding light on their potential in formal reasoning tasks. This bears significant implications for AI, suggesting paths for enhancing LLMs' reasoning capabilities.

The exploration raises questions for future research in AI, particularly regarding the limitations of key-value memories and attention mechanisms in achieving length-generalizable reasoning. Although current Transformer models can simulate SAT solving through CoT, extending these techniques for scalability and efficiency remains a challenge.

The paper hints at broader attempts to incorporate algorithmic reasoning components in LLMs, potentially bridging gaps between data-driven approaches and algorithmic reasoning. However, learning algorithms from data, especially for logic-based applications, still presents challenges. Further work could explore architectural changes or training strategies that improve Transformers' reasoning generalization across task lengths.

Conclusion

This paper provides a rigorous investigation into the logical reasoning capabilities of Transformers, both theoretically and empirically. By demonstrating that Transformers can be built to solve the SAT problem using CoT, the authors contribute meaningful insights into the operational depth of LLMs, setting the stage for continued exploration in enhancing AI reasoning capabilities.
