AskIt: Unified Programming Interface for Programming with Large Language Models (2308.15645v2)

Published 29 Aug 2023 in cs.PL, cs.AI, and cs.SE

Abstract: LLMs exhibit a unique phenomenon known as emergent abilities, demonstrating adeptness across numerous tasks, from text summarization to code generation. While these abilities open up novel avenues in software design and crafting, their incorporation presents substantial challenges. Developers face decisions regarding the use of LLMs for directly performing tasks within applications as well as for generating and executing code to accomplish these tasks. Moreover, effective prompt design becomes a critical concern, given the necessity of extracting data from natural language outputs. To address these complexities, this paper introduces AskIt, a domain-specific language (DSL) specifically designed for LLMs. AskIt simplifies LLM integration by providing a unified interface that not only allows for direct task execution using LLMs but also supports the entire cycle of code generation and execution. This dual capability is achieved through (1) type-guided output control, (2) template-based function definitions, and (3) prompt generation for both usage modes. Our evaluations underscore AskIt's effectiveness. Across 50 tasks, AskIt generated concise prompts, achieving a 16.14% reduction in prompt length compared to benchmarks. Additionally, by enabling a seamless transition between using LLMs directly in applications and for generating code, AskIt achieved significant efficiency improvements, as observed in our GSM8K benchmark experiments. The implementations of AskIt in TypeScript and Python are available at https://github.com/katsumiok/ts-askit and https://github.com/katsumiok/pyaskit, respectively.

Insights into "AskIt: Unified Programming Interface for Programming with LLMs"

The paper under review introduces AskIt, a domain-specific language (DSL) designed to streamline the integration of LLMs into software development. This work tackles the dual challenge developers face when using LLMs: leveraging them for direct task execution and employing them to generate code. The paper posits that AskIt provides a unified interface that simplifies both modes of use by integrating type-guided output control, template-based function definitions, and code generation capabilities.

Summary of Contributions

The authors present several notable contributions with AskIt:

  1. Unified Programming Interface: AskIt's unified interface lets developers shift seamlessly between using LLMs directly within applications and employing them for code generation. The interface is built around ordinary function-call syntax, making it adaptable to a wide range of programming tasks.
  2. Type-Guided Output Control: The DSL encapsulates output specifications within type parameters, avoiding the intricate prompt engineering traditionally required to parse natural-language outputs from LLMs. Declaring the expected outcome with type information makes response extraction substantially more reliable (a sketch after this list illustrates this idea together with template-based definitions).
  3. Template-Based Function Definitions: AskIt supports the creation of reusable functions defined through prompt templates, reducing redundancy and enabling straightforward code maintenance and reuse.
  4. Code Generation: AskIt extends its DSL capabilities by letting developers mark tasks as codable, enabling automatic function generation with the help of LLMs such as GPT-4. For inherently codable tasks, the generated code delivers substantial performance improvements over invoking an LLM directly.
  5. Empirical Evaluation: Through diverse experiments, from implementing common coding tasks to tackling well-established benchmarks such as HumanEval and GSM8K, AskIt demonstrated significant reductions in prompt length and significant efficiency gains from code generation.
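
To make contributions 2 and 3 concrete, here is a minimal, self-contained sketch of type-guided output control and template-based function definitions. The names (`ask`, `define`), the prompt wording, and the stubbed model call are illustrative assumptions, not AskIt's actual API; the real implementations live in the linked ts-askit and pyaskit repositories.

```python
import json
from typing import Any, Callable

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; replace with an actual API call."""
    # For demonstration, pretend the model replied with well-formed JSON.
    return '{"answer": [1, 2, 3]}'

def ask(output_type: type, template: str, **params: Any) -> Any:
    """Fill the template, append a type-derived format instruction,
    and parse the model's JSON reply into a native value."""
    prompt = template
    for name, value in params.items():
        prompt = prompt.replace("{{" + name + "}}", repr(value))
    prompt += (
        '\nReply with JSON of the form {"answer": ...}, where "answer" '
        f"has type {output_type.__name__}."
    )
    return json.loads(call_llm(prompt))["answer"]

def define(output_type: type, template: str) -> Callable[..., Any]:
    """Template-based function definition: a reusable, named LLM task."""
    return lambda **params: ask(output_type, template, **params)

sort_numbers = define(list, "Sort {{numbers}} in ascending order")
print(sort_numbers(numbers=[3, 1, 2]))  # -> [1, 2, 3]
```

The key design point is that the type parameter does double duty: it shapes the prompt's format instruction and drives parsing of the reply, so the caller never handles raw natural-language output.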

Key Findings

  • Efficiency Gains: For codable tasks, AskIt reduced lines of code (LOC) by 6.56 and 5.52 on average for TypeScript and Python, respectively. This suggests considerable gains in code succinctness and maintainability.
  • Prompt Length Reduction: Directly answerable tasks benefited from a reduction in prompt length by 16.14% on average, indicating AskIt's potential to streamline interaction with LLMs by minimizing prompt verbosity.
  • Performance Improvement: Moving from direct task execution with an LLM to execution of generated code led to dramatic speedups on mathematical problem-solving tasks from the GSM8K benchmark, with speedup ratios exceeding 275,000 in TypeScript and 6,969,000 in Python (a sketch after this list illustrates the mechanism).
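
Speedups of this magnitude follow from the architecture rather than from any micro-optimization: the LLM is consulted once to produce ordinary code, after which every call is a local function call with no model in the loop. The sketch below illustrates the idea; the stubbed model reply, the `define_codable` name, and the prompt wording are hypothetical stand-ins, not AskIt's actual pipeline.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; here it 'returns' generated code."""
    return "def add_two_numbers(a, b):\n    return a + b\n"

def define_codable(name: str, task: str) -> Callable:
    """Ask the model once for a function implementing the task, then
    run every subsequent call locally, with no model in the loop."""
    source = call_llm(f"Write a Python function `{name}` that {task}.")
    namespace: dict = {}
    exec(source, namespace)  # one-time cost: compile the generated code
    return namespace[name]

add = define_codable("add_two_numbers", "returns the sum of a and b")
print(add(2, 3))  # -> 5; a plain local call, hence the large speedups
```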

Implications and Future Directions

The emergence of DSLs like AskIt highlights a critical shift towards more ergonomic integrations of LLMs in programming environments, addressing challenges in prompt engineering and response parsing. The experimental results underscore AskIt's potential to improve software development efficiency, particularly when tasks are divided based on their codability and direct answerability.

Theoretically, AskIt represents a significant step toward harmonizing LLM interactions in software development, hinting at broader implications for how future programming languages might incorporate LLM-driven enhancements.

Practically, AskIt's strategy of employing type systems for response specifications offers a promising direction for enhancing the robustness and reliability of LLM-based systems, especially in safety-critical applications where deterministic responses are paramount.
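
One way to read that claim concretely: a type specification gives the runtime something to check each reply against, so malformed outputs can be detected and re-requested instead of silently propagating. The following is a hedged sketch of such a validate-and-retry loop; the names and the retry policy are assumptions for illustration, not a documented AskIt feature.

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    """Stand-in: the first reply is malformed, the second is well-typed."""
    return "not json" if attempt == 0 else '{"answer": 42}'

def ask_checked(expected: type, prompt: str, retries: int = 3):
    """Re-ask until the reply parses and matches the expected type."""
    for attempt in range(retries):
        try:
            value = json.loads(call_llm(prompt, attempt))["answer"]
            if isinstance(value, expected):
                return value  # the reply conforms to the declared type
        except (json.JSONDecodeError, KeyError, TypeError):
            pass  # malformed reply: fall through and ask again
    raise ValueError("no well-typed reply within the retry budget")

print(ask_checked(int, 'What is 6 * 7? Reply as JSON {"answer": ...}'))  # -> 42
```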

Looking forward, advances in AI could further expand AskIt's functionality, potentially supporting more intricate task definitions and fostering adoption beyond its current application domains. Integrating continuous feedback loops for refining LLM responses and developing richer DSL architectures are promising avenues for future research.

In conclusion, AskIt provides a pragmatic and sophisticated approach to LLM integration, balancing the nuanced demands of direct task execution with the potential of code generation, all through a unified and intuitive programming interface. Its contributions to making LLM technology more accessible and efficient in conventional software development contexts are both timely and impactful.

Authors
  1. Katsumi Okuda
  2. Saman Amarasinghe