Picard: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
This paper presents Picard, a method for constraining the decoding of large language models to valid formal-language outputs such as SQL. The primary challenge it addresses is that pre-trained models readily generate invalid SQL, which limits their usability in applications demanding precision and adherence to a formal specification.
Methodology
Picard applies incremental parsing techniques to guide auto-regressive decoding. Unlike prior methods that require custom vocabularies or architectures, Picard integrates with existing LLM frameworks and can be switched on at inference time, with no changes to the model's pre-training or fine-tuning. This compatibility extends to large transformers such as T5.
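The point that no architectural changes are needed can be illustrated with a toy decode loop. This is a sketch under our own assumptions, not the paper's API: the names `constrained_greedy_decode`, `score_fn`, and `is_valid_prefix` are hypothetical, the model is treated as an opaque scoring function, and the constraint is just a veto on text prefixes.

```python
# Hypothetical sketch of inference-time constrained decoding (names are
# ours, not the paper's). The model is an opaque scoring function; the
# constraint only vetoes text prefixes, so no retraining is required.

def constrained_greedy_decode(score_fn, detok, is_valid_prefix, vocab,
                              eos="</s>", max_len=32):
    """score_fn(tokens) -> {token: score}; detok(tokens) -> text;
    is_valid_prefix(text) -> bool rejects prefixes no valid SQL extends."""
    out = []
    for _ in range(max_len):
        scores = score_fn(out)
        # try candidates best-first; keep the highest-scoring token whose
        # extended prefix is still admissible (eos is accepted as-is here;
        # a fuller sketch would also check that the finished query parses)
        for tok in sorted(vocab, key=lambda t: scores.get(t, float("-inf")),
                          reverse=True):
            if tok == eos or is_valid_prefix(detok(out + [tok])):
                out.append(tok)
                break
        else:
            break  # no admissible continuation (beam search would backtrack)
        if out[-1] == eos:
            break
    return detok(out[:-1] if out and out[-1] == eos else out)
```

In Picard proper, the validity check is applied to the top-k tokens of each beam hypothesis rather than to the whole vocabulary, which keeps the overhead per decoding step small.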
The method operates by incrementally filtering out inadmissible tokens during decoding, ensuring the generated SQL adheres to syntactic and semantic constraints. Picard leverages monadic parser combinators for incremental parsing and supports multiple operation modes, including lexing and parsing with or without semantic guards. These modes progressively tighten the constraints enforced on token predictions, refining the validity of the generated queries.
Experimental Results
Empirical evaluations on the Spider and CoSQL datasets demonstrate the efficacy of Picard in improving the performance of fine-tuned T5 models. Particularly notable are the results with the T5-3B model, which achieves state-of-the-art performance on both datasets when augmented with Picard. For example, exact-set-match accuracy on the Spider test set reaches 71.9%, accompanied by execution accuracy of 75.1%.
Picard sharply reduces invalid SQL generation: execution errors drop from 12% in the unconstrained setup to just 2% with Picard enabled. This improvement does not require excessively large beams, in contrast to other validity-filtering approaches that rely on beam sizes of 16 or more.
Implications and Future Work
The introduction of Picard marks a notable advance in constrained decoding for large pre-trained language models, providing a robust solution for generating formal languages like SQL. The implications are substantial for enterprise applications where precision is paramount.
Future research could extend Picard’s capabilities with additional checks and constraints to further align generated queries with complex schema requirements. Moreover, exploring Picard’s applicability to domains beyond SQL could widen its utility in various formal language parsing tasks.
This work contributes significantly to the theory and practice of AI, showcasing a practical method for enhancing the reliability of LLMs in real-world applications while maintaining compatibility with existing model architectures and workflows.