An Overview of "DocPrompting: Generating Code by Retrieving the Docs"
The paper "DocPrompting: Generating Code by Retrieving the Docs" addresses the challenge of natural language to code generation (NL2Code), particularly focusing on the limitations of existing models when dealing with unseen functions and libraries. The authors identify a critical gap in the ability of traditional models to extend their understanding beyond the functions and libraries available at their training time. This limitation arises because of the rapid evolution and expansion of publicly available source-code libraries.
Key Contributions
- DocPrompting Mechanism: The authors introduce a novel approach called DocPrompting, which enhances the code generation process by leveraging relevant code documentation. This method is inspired by the natural behavior of human programmers, who refer to documentation to understand and use unfamiliar APIs. The DocPrompting approach has two key phases (see the sketch after this list):
- Retrieving relevant documentation based on the natural language intent.
- Generating code by conditioning on both the NL intent and retrieved documentation.
- Generality and Flexibility: The approach is designed to be generalizable across any programming language and is agnostic to the underlying neural model architecture, making it a versatile solution in the field of NL2Code.
- Performance Improvements: Empirical evaluations demonstrate that integrating DocPrompting into existing NL2Code models results in significant performance enhancements. For instance, DocPrompting improves CodeT5's pass@1 by 2.85% and pass@10 by 4.39% on the Python CoNaLa benchmark. On a newly introduced Bash dataset (tldr), DocPrompting achieves up to a 6.9% exact match improvement when used with models like CodeT5 and GPT-Neo-1.3B.
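The two-phase mechanism is easiest to see in code. Below is a minimal sketch, assuming a toy documentation pool and a sparse BM25 retriever (one of the retriever types evaluated in the paper); the pool contents and prompt-building convention are illustrative, not the authors' exact format.

```python
# Minimal sketch of the two DocPrompting phases: (1) retrieve documentation for the
# NL intent, (2) build a prompt that conditions generation on intent + retrieved docs.
# The doc pool contents and prompt layout are illustrative assumptions.
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Hypothetical documentation pool: one entry per function / command flag.
doc_pool = [
    "pandas.read_csv(filepath, sep=',') : read a comma-separated file into a DataFrame",
    "pandas.DataFrame.to_json(path_or_buf=None) : convert the DataFrame to a JSON string",
    "json.dumps(obj, indent=None) : serialize obj to a JSON-formatted str",
]

# Phase 1: sparse retrieval (BM25) over whitespace-tokenized documentation.
bm25 = BM25Okapi([doc.lower().split() for doc in doc_pool])

def retrieve(intent: str, k: int = 2) -> list[str]:
    scores = bm25.get_scores(intent.lower().split())
    ranked = sorted(range(len(doc_pool)), key=lambda i: scores[i], reverse=True)
    return [doc_pool[i] for i in ranked[:k]]

# Phase 2: concatenate retrieved docs with the intent; any NL2Code generator
# (CodeT5, GPT-Neo, Codex, ...) can then consume this augmented prompt.
def build_prompt(intent: str) -> str:
    docs = retrieve(intent)
    return "\n".join(docs) + f"\n# intent: {intent}\n# code:"

print(build_prompt("read a csv file and convert it to json"))
```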
Methodology
The DocPrompting methodology follows the retrieve-then-generate paradigm prevalent in open-domain question answering. The authors utilize dense retrieval methods, such as SimCSE, and sparse retrievers like BM25, to fetch relevant code documentation from a large pool of available documents. Subsequently, this retrieved documentation is used in conjunction with neural code generation models, including T5, CodeT5, GPT-Neo, and Codex, to generate accurate code snippets even for unseen library functions.
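As a concrete illustration of this retrieve-then-generate pipeline, the sketch below pairs a SimCSE-style dense retriever with a CodeT5 checkpoint from the Hugging Face hub. The model names, [CLS]-pooling choice, and prompt format are plausible stand-ins rather than the authors' released configuration.

```python
# Hedged sketch: dense retrieval with SimCSE embeddings, generation with CodeT5.
# Checkpoints and the doc pool are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSeq2SeqLM

enc_tok = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-roberta-base")
encoder = AutoModel.from_pretrained("princeton-nlp/sup-simcse-roberta-base")

def embed(texts: list[str]) -> torch.Tensor:
    batch = enc_tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] embeddings
    return torch.nn.functional.normalize(hidden, dim=-1)

# Hypothetical documentation pool (here: Bash flag descriptions).
doc_pool = [
    "tar -x : extract files from an archive",
    "tar -z : filter the archive through gzip",
    "grep -r : read all files under each directory, recursively",
]
doc_emb = embed(doc_pool)

# Retrieve the top-k docs by cosine similarity to the NL intent.
intent = "extract a gzipped tar archive"
top_k = torch.topk(embed([intent]) @ doc_emb.T, k=2).indices[0].tolist()
retrieved = [doc_pool[i] for i in top_k]

# Generate code conditioned on retrieved docs + intent.
gen_tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
generator = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

prompt = "\n".join(retrieved) + "\n" + intent
inputs = gen_tok(prompt, return_tensors="pt")
out = generator.generate(**inputs, max_new_tokens=64)
print(gen_tok.decode(out[0], skip_special_tokens=True))
```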
Empirical Evaluation and Benchmarks
The effectiveness of DocPrompting is tested on two benchmarks:
- CoNaLa (Python): Re-split so that every test example uses at least one function that never appears in the training set, providing a more robust evaluation of generalization to unseen functions.
- tldr (Bash): A new benchmark curated from NL-Bash command pairs in the community-maintained tldr project; every command in the test set is unseen during training.
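For reference, the pass@k scores quoted for CoNaLa are conventionally computed with the unbiased estimator used in execution-based code evaluation, while tldr is scored with exact match. The sketch below assumes that standard convention; the paper's exact evaluation harness may differ in detail.

```python
# Standard unbiased pass@k estimator (Chen et al., 2021): given n sampled programs
# per problem, c of which pass the tests, pass@k = 1 - C(n-c, k) / C(n, k).
# This is an assumption about the evaluation convention, not code from the paper.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (out of n, c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples per problem, 7 of which pass the unit tests.
print(round(pass_at_k(100, 7, 1), 2))   # 0.07
print(round(pass_at_k(100, 7, 10), 2))  # 0.53
```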
Implications and Future Directions
DocPrompting's introduction has significant practical and theoretical implications. Because newly released libraries can be supported by simply adding their documentation to the retrieval pool, models can adapt continually without retraining the base model. The method can also be extended to other settings where external documentation plays a crucial role, such as tracking API usage and library updates.
Looking forward, the research opens possibilities for exploring more sophisticated retrieval and filtering methods, potentially involving joint retriever-generator frameworks, which could mitigate the propagation of errors between the components. Additionally, it suggests a direction toward leveraging broader documentation resources like tutorials and community forums, further broadening the utility of such models.
In conclusion, the DocPrompting approach marks a substantial advancement in the domain of NL2Code generation, promising robust adaptability and improved handling of unseen code functionalities by effectively harnessing the power of documentation.