Compositional API Recommendation for Library-Oriented Code Generation (2402.19431v1)

Published 29 Feb 2024 in cs.SE, cs.AI, and cs.CL

Abstract: LLMs have achieved exceptional performance in code generation. However, the performance remains unsatisfactory in generating library-oriented code, especially for libraries not present in the training data of LLMs. Previous work utilizes API recommendation technology to help LLMs use libraries: it retrieves APIs related to the user requirements, then leverages them as context to prompt LLMs. However, developmental requirements can be coarse-grained, requiring a combination of multiple fine-grained APIs. This granularity inconsistency makes API recommendation a challenging task. To address this, we propose CAPIR (Compositional API Recommendation), which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down a coarse-grained task description into several detailed subtasks. Then, CAPIR applies an embedding-based Retriever to identify relevant APIs corresponding to each subtask. Moreover, CAPIR leverages an LLM-based Reranker to filter out redundant APIs and provides the final recommendation. To facilitate the evaluation of API recommendation methods on coarse-grained requirements, we present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation). Experimental results on these benchmarks demonstrate the effectiveness of CAPIR in comparison to existing baselines. Specifically, on RAPID's Torchdata-AR dataset, compared to the state-of-the-art API recommendation approach, CAPIR improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On LOCG's Torchdata-Code dataset, compared to code generation without API recommendation, CAPIR improves pass@100 from 16.0% to 28.0%.
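To make the described pipeline concrete, below is a minimal Python sketch of the divide-and-conquer flow from the abstract (Decomposer → Retriever → Reranker). The helpers `llm_complete` and `embed` are hypothetical stand-ins for an LLM client and an embedding model, and the prompts and data shapes are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

# Hypothetical stand-ins: plug in a real LLM client and embedding model here.
def llm_complete(prompt: str) -> str:
    raise NotImplementedError("replace with an actual LLM call")

def embed(text: str) -> List[float]:
    raise NotImplementedError("replace with an actual embedding model")

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

def decompose(task: str) -> List[str]:
    # Decomposer: ask the LLM to split a coarse-grained task into fine-grained subtasks.
    reply = llm_complete(
        "Break the following programming task into a numbered list of "
        f"fine-grained subtasks:\n{task}"
    )
    return [line.split(".", 1)[-1].strip() for line in reply.splitlines() if line.strip()]

def retrieve(subtask: str, api_docs: List[Tuple[str, str]], k: int = 5) -> List[str]:
    # Retriever: rank (api_name, doc_text) pairs by embedding similarity to one subtask.
    q = embed(subtask)
    ranked = sorted(api_docs, key=lambda doc: cosine(q, embed(doc[1])), reverse=True)
    return [name for name, _ in ranked[:k]]

def rerank(task: str, candidates: List[str], n: int = 5) -> List[str]:
    # Reranker: ask the LLM to drop redundant APIs and keep the most relevant ones.
    reply = llm_complete(
        f"Task: {task}\nCandidate APIs: {', '.join(candidates)}\n"
        f"Return the {n} APIs most useful for this task, comma-separated."
    )
    return [name.strip() for name in reply.split(",") if name.strip()][:n]

def capir_recommend(task: str, api_docs: List[Tuple[str, str]]) -> List[str]:
    # Divide-and-conquer: decompose, retrieve per subtask, merge, then rerank.
    merged: List[str] = []
    for subtask in decompose(task):
        for api in retrieve(subtask, api_docs):
            if api not in merged:
                merged.append(api)
    return rerank(task, merged)
```

The key design point reflected here is that retrieval operates on fine-grained subtasks (where embedding similarity is reliable) while the final filtering step sees the original coarse-grained task, so redundant or off-target APIs introduced by individual subtasks can be pruned.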

Authors (4)
  1. Zexiong Ma (7 papers)
  2. Shengnan An (12 papers)
  3. Bing Xie (25 papers)
  4. Zeqi Lin (25 papers)
Citations (7)