Compositional API Recommendation for Library-Oriented Code Generation (2402.19431v1)
Abstract: LLMs have achieved exceptional performance in code generation. However, the performance remains unsatisfactory in generating library-oriented code, especially for libraries not present in the training data of LLMs. Previous work utilizes API recommendation technology to help LLMs use libraries: it retrieves APIs related to the user requirements, then leverages them as context to prompt LLMs. However, developmental requirements can be coarse-grained, requiring a combination of multiple fine-grained APIs. This granularity inconsistency makes API recommendation a challenging task. To address this, we propose CAPIR (Compositional API Recommendation), which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down a coarse-grained task description into several detailed subtasks. Then, CAPIR applies an embedding-based Retriever to identify relevant APIs corresponding to each subtask. Moreover, CAPIR leverages an LLM-based Reranker to filter out redundant APIs and provide the final recommendation. To facilitate the evaluation of API recommendation methods on coarse-grained requirements, we present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation). Experimental results on these benchmarks demonstrate the effectiveness of CAPIR in comparison to existing baselines. Specifically, on RAPID's Torchdata-AR dataset, compared to the state-of-the-art API recommendation approach, CAPIR improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On LOCG's Torchdata-Code dataset, compared to code generation without API recommendation, CAPIR improves pass@100 from 16.0% to 28.0%.
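The decompose, retrieve, rerank flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the LLM-based Decomposer and Reranker are replaced by simple rule-based stand-ins, the embedding model is replaced by a bag-of-words vector, and the API names and documentation strings in `API_DOCS` are hypothetical. The helper `precision_recall_at_k` uses the common definitions of precision@k and recall@k, which are assumed (not confirmed by the abstract) to match the paper's metrics.

```python
import math
import re
from collections import Counter

# Hypothetical toy API documentation; not taken from RAPID or LOCG.
API_DOCS = {
    "read_csv": "read a csv file into rows",
    "filter_rows": "filter rows by a predicate condition",
    "shuffle": "shuffle rows in random order",
    "batch": "group rows into batches of fixed size",
}

def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def decompose(task):
    """Stand-in for the LLM-based Decomposer: split on commas, 'then', 'and'."""
    parts = re.split(r",|\bthen\b|\band\b", task)
    return [p.strip() for p in parts if p.strip()]

def retrieve(subtask, k=2):
    """Embedding-based retrieval: top-k APIs whose docs best match the subtask."""
    q = embed(subtask)
    ranked = sorted(API_DOCS,
                    key=lambda name: cosine(q, embed(API_DOCS[name])),
                    reverse=True)
    return ranked[:k]

def rerank(task, candidates, n=3):
    """Stand-in for the LLM-based Reranker: drop duplicates, then order the
    remaining candidates by similarity to the full task description."""
    unique = list(dict.fromkeys(candidates))
    q = embed(task)
    unique.sort(key=lambda name: cosine(q, embed(API_DOCS[name])), reverse=True)
    return unique[:n]

def recommend(task, k=2, n=3):
    """Divide and conquer: decompose, retrieve per subtask, rerank the union."""
    candidates = [api for sub in decompose(task) for api in retrieve(sub, k)]
    return rerank(task, candidates, n)

def precision_recall_at_k(recommended, relevant, k):
    """precision@k = hits / k; recall@k = hits / number of relevant APIs."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k, (hits / len(relevant) if relevant else 0.0)

if __name__ == "__main__":
    task = "read a csv file, then shuffle rows, then group rows into batches"
    apis = recommend(task, k=1, n=3)
    print(apis)
    print(precision_recall_at_k(apis, ["read_csv", "shuffle", "batch"], k=3))
```

Retrieving per subtask rather than for the whole task is the point of the sketch: each fine-grained subtask query is matched against fine-grained API documentation, which is how the abstract frames the granularity mismatch.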
- Zexiong Ma
- Shengnan An
- Bing Xie
- Zeqi Lin