TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks (2401.12869v1)
Abstract: Language models (LMs) can solve tasks such as answering questions about tables or images by writing programs. However, using primitive functions often leads to verbose and error-prone programs, and higher-level functions require expert design. To enable better solutions without human labor, we ask code LMs to curate reusable high-level functions and use them to write solutions. We present TROVE, a training-free method that induces a verifiable and efficient toolbox of functions by using, growing, and periodically trimming the toolbox as it generates solutions. On 11 datasets spanning math, table question answering, and image reasoning tasks, TROVE consistently yields simpler solutions with higher accuracy than baselines using CodeLlama and previous methods using GPT, while using 79–98% smaller toolboxes. TROVE also enables 31% faster and 13% more accurate human verification than baselines. With the same pipeline, it creates diverse functions for varied tasks and datasets, providing insights into their individual characteristics.