DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines (2312.13382v2)
Abstract: Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints still requires heuristic "prompt engineering". We introduce LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs and present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems. We also propose strategies for using assertions at inference time for automatic self-refinement with LMs. We report on four diverse case studies for text generation and find that LM Assertions improve not only compliance with imposed rules but also downstream task performance, passing constraints up to 164% more often and generating up to 37% more high-quality responses. Our reference implementation of LM Assertions is integrated into DSPy at https://github.com/stanfordnlp/dspy.
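The core idea, an assertion whose failure message is fed back into a retry loop so the pipeline can self-correct rather than halt, can be illustrated with a minimal, library-free Python sketch. Note this is a hypothetical stand-in for intuition only: `lm_assert`, `with_backtracking`, and the toy `generate` function below are not the DSPy API.

```python
from typing import Optional


class AssertionFailed(Exception):
    """Raised when an LM Assertion's constraint is not met."""


def lm_assert(condition: bool, feedback: str) -> None:
    # Hard constraint: on failure, surface feedback for self-refinement.
    if not condition:
        raise AssertionFailed(feedback)


def with_backtracking(module, max_retries: int = 2):
    """Re-run `module`, passing past failure feedback so the next
    attempt can self-correct (a sketch of retry-style backtracking)."""
    def wrapped(question: str) -> str:
        feedback: Optional[str] = None
        answer = ""
        for _ in range(max_retries + 1):
            answer = module(question, feedback)
            try:
                lm_assert(len(answer) <= 30,
                          "Answer must be at most 30 characters.")
                return answer
            except AssertionFailed as e:
                feedback = str(e)  # carried into the next attempt
        return answer  # best effort after exhausting retries
    return wrapped


# Toy "LM": verbose by default, concise once given feedback.
def generate(question: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "Well, after careful consideration, the answer is Paris."
    return "Paris."


pipeline = with_backtracking(generate)
print(pipeline("What is the capital of France?"))  # -> Paris.
```

Here the first attempt violates the length constraint, so the assertion's feedback string is passed into the second attempt, which then satisfies it. DSPy's actual constructs distinguish hard assertions (which can halt after retries) from soft suggestions (which only log), a distinction this sketch omits.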