Scalability of GreatGrammar to larger-batch inference

Determine whether GreatGrammar, the grammar-constrained decoding method introduced by Park, Zhou, and D’Antoni (2025), scales efficiently beyond batch size 1 in large-batch inference for large language models, by characterizing its performance at larger batch sizes.

Background

The paper evaluates constrained decoding approaches for structured LLM generation and emphasizes scalability under large-batch inference. While several systems are discussed, GreatGrammar is noted as being evaluated only at batch size 1, leaving its performance at higher batch sizes unspecified.
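To make "performance at larger batch sizes" concrete, the sketch below times per-sequence token masking as batch size grows. It is a self-contained toy, not GreatGrammar: the vocabulary, the digits-only grammar, and the `token_mask`/`step_batch` functions are all illustrative assumptions, since this excerpt does not describe GreatGrammar's API or masking costs.

```python
import time
from statistics import median

# Mock vocabulary: half digit strings, half non-digit strings. A real
# system masks token IDs against a parser state; this is a stand-in.
VOCAB = [str(i) for i in range(10_000)] + [f"w{i}" for i in range(10_000)]

def token_mask(prefix: str) -> list[bool]:
    """Per-token boolean mask for one sequence under a toy grammar that
    only admits digit tokens. `prefix` is unused here; a real decoder
    would consult the parser state reached after consuming it."""
    return [tok.isdigit() for tok in VOCAB]

def step_batch(prefixes: list[str]) -> float:
    """Time one decoding step's mask computation for a whole batch.
    The naive per-sequence loop is the cost that may fail to scale."""
    start = time.perf_counter()
    for p in prefixes:  # no work shared across the batch
        token_mask(p)
    return time.perf_counter() - start

if __name__ == "__main__":
    for batch in (1, 4, 16, 64):
        prefixes = ["42"] * batch
        times = [step_batch(prefixes) for _ in range(5)]
        print(f"batch={batch:3d}  step={median(times) * 1e3:7.2f} ms  "
              f"per-seq={median(times) / batch * 1e3:6.2f} ms")
```

Whether a real system amortizes this cost across the batch (for example, by caching masks keyed on parser state) is exactly what a measurement of this shape would probe.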

The authors’ primary contribution, Pre^3, uses deterministic pushdown automata to improve efficiency and scalability. In contrasting prior work, they explicitly flag the lack of evidence about GreatGrammar’s scalability to larger batch sizes as an open question. A rough illustration of the deterministic-pushdown-automaton idea follows.
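The sketch below filters vocabulary tokens against a toy DPDA for balanced brackets. The grammar and all function names are illustrative assumptions; Pre^3's actual automaton construction and any transition precomputation are not reproduced here.

```python
def dpda_accepts_prefix(s: str) -> bool:
    """Deterministically check that `s` is a valid prefix of a balanced
    bracket string: every closer matches the stack top, and the stack
    never underflows."""
    stack: list[str] = []
    pairs = {")": "(", "]": "["}
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif ch in ")]":
            if not stack or stack[-1] != pairs[ch]:
                return False
            stack.pop()
        else:
            return False  # only brackets allowed in this toy grammar
    return True

def allowed_tokens(prefix: str, vocab: list[str]) -> list[str]:
    """Mask the vocabulary: keep tokens whose concatenation with the
    current prefix is still a valid DPDA prefix. Determinism means each
    check follows exactly one transition path, with no backtracking."""
    return [tok for tok in vocab if dpda_accepts_prefix(prefix + tok)]

if __name__ == "__main__":
    vocab = ["(", ")", "[", "]", "()", "[]", ")("]
    print(allowed_tokens("([", vocab))  # -> ['(', '[', ']', '()', '[]']
```

Re-scanning the full prefix per candidate, as above, is quadratic in sequence length; a production system would instead cache the automaton state reached after the prefix and step only over each candidate token.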

References

Similarly, GreatGrammar demonstrates strong efficiency in handling complex grammars but is only evaluated with batch size equal to 1, leaving its scalability to larger batches an open question.

Pre^3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation (arXiv:2506.03887, Chen et al., 4 Jun 2025), Related Work, "LLM Constrained Decoding" paragraph.