Can state-of-the-art LLMs grasp the hierarchical complexity of formal languages?

Determine whether state-of-the-art large language models can grasp the structured, hierarchical complexity of formal languages as defined in the Theory of Computation. Concretely, the question is whether they understand, and can reason over, the tiered structure of formal language classes and the computational mechanisms associated with each class.

Background

The paper argues that existing benchmarks for software engineering focus on functional correctness and lack a principled, computation- and complexity-based evaluation. As a result, whether current LLMs can internalize and reason about the structured complexity of formal languages remains unsettled.

To address this gap, the authors introduce ChomskyBench, a benchmark intended to evaluate LLMs across the levels of the Chomsky Hierarchy. This open question motivates both the benchmark’s design and the empirical study reported in the paper.
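To make the "tiered structure" concrete, the following is a minimal illustrative sketch of three textbook languages from successive levels of the Chomsky Hierarchy, each paired with the kind of mechanism needed to recognize it. The function names and the choice of languages are assumptions made for illustration only; they are not taken from ChomskyBench or from the paper's task set.

```python
import re


def in_regular_language(s: str) -> bool:
    """a*b*: a regular language; a finite automaton (here, a regex) suffices."""
    return re.fullmatch(r"a*b*", s) is not None


def in_context_free_language(s: str) -> bool:
    """a^n b^n (n >= 0): context-free but not regular.

    Recognized with an explicit stack, mimicking a pushdown automaton.
    """
    stack = []
    i = 0
    # Push one symbol for every leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Pop one symbol for every following 'b', while the stack is non-empty.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if all input was consumed and every 'a' was matched.
    return i == len(s) and not stack


def in_context_sensitive_language(s: str) -> bool:
    """a^n b^n c^n (n >= 0): context-sensitive but not context-free.

    A single stack is not enough; here we simply compare three equal blocks.
    """
    n, remainder = divmod(len(s), 3)
    return remainder == 0 and s == "a" * n + "b" * n + "c" * n


if __name__ == "__main__":
    for word in ["aabb", "aab", "aabbcc", "abcc"]:
        print(
            word,
            in_regular_language(word),
            in_context_free_language(word),
            in_context_sensitive_language(word),
        )
```

Each step up the hierarchy requires a strictly more powerful recognizer, which is the sense in which a benchmark organized by these levels can grade difficulty by computational class rather than by functional correctness alone.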

References

Therefore, it is still unknown whether state-of-the-art (SOTA) LLMs can grasp the structured, hierarchical complexity of formal languages as defined by Theory of Computation.

— Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy (2604.02709 - Dong et al., 3 Apr 2026) in Abstract
