Scalability of GreatGrammar to larger-batch inference
Determine whether GreatGrammar, the grammar-constrained decoding method introduced by Park, Zhou, and D'Antoni (2025), scales efficiently beyond batch size 1 in large-batch LLM inference, by characterizing its latency and throughput as the batch size grows.
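To make the scaling question concrete, the sketch below is a hypothetical, toy illustration (not GreatGrammar's actual implementation) of grammar-constrained decoding with a small DFA. The `decode_batch`, `allowed_tokens`, and `DFA` names are invented for illustration. It highlights the structural reason batch size matters: the per-sequence token-mask computation is a serial CPU loop over the batch, so its cost grows linearly with batch size even when the model's forward pass is fully batched on an accelerator.

```python
# Hypothetical sketch, not GreatGrammar's implementation: grammar-constrained
# decoding against a toy DFA. The inner per-sequence masking loop is the part
# whose cost scales with batch size.
import random

# Toy DFA accepting strings of the form a+b+<eos>: state -> {token: next_state}
DFA = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},
    2: {"b": 2, "<eos>": 3},
}
FINAL_STATE = 3

def allowed_tokens(state):
    """Serial, per-sequence work: the set of grammar-legal next tokens."""
    return set(DFA.get(state, {}))

def decode_batch(batch_size, max_steps=8, seed=0):
    rng = random.Random(seed)
    states = [0] * batch_size
    outputs = [[] for _ in range(batch_size)]
    for _ in range(max_steps):
        # (a batched model forward pass would happen here, on the accelerator)
        for i in range(batch_size):          # O(batch_size) serial mask loop
            if states[i] == FINAL_STATE:     # sequence already finished
                continue
            legal = allowed_tokens(states[i])
            tok = rng.choice(sorted(legal))  # stand-in for masked sampling
            outputs[i].append(tok)
            states[i] = DFA[states[i]][tok]
    return outputs

for seq in decode_batch(batch_size=4):
    print("".join(seq))
```

Every emitted sequence follows the grammar by construction; the open question is whether a real engine's per-sequence mask computation stays cheap enough, relative to the batched forward pass, as the batch grows.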
References
Similarly, GreatGrammar demonstrates strong efficiency in handling complex grammars but is only evaluated at a batch size of 1, leaving its scalability to larger batches an open question.
— Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
(2506.03887 - Chen et al., 4 Jun 2025) in Related Work, LLM Constrained Decoding paragraph