
Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey (2308.01191v3)

Published 2 Aug 2023 in cs.SE

Abstract: Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile, LLMs possess diverse code-related knowledge, making them versatile for various software engineering challenges. However, LLMs' performance on code clone detection remains unclear and requires further study for an accurate assessment. In this paper, we provide the first comprehensive evaluation of LLMs for clone detection, covering different clone types, languages, and prompts. We find that advanced LLMs excel at detecting complex semantic clones, surpassing existing methods. Adding intermediate reasoning steps via chain-of-thought prompts noticeably enhances performance. Additionally, representing code as vector embeddings, especially with text encoders, effectively aids clone detection. Lastly, the ability of LLMs to detect code clones differs across programming languages. Our study suggests that LLMs have potential for clone detection due to their language capabilities, offering insights for developing robust LLM-based methods to enhance software engineering.
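The embedding-based approach the abstract mentions can be sketched as follows. This is a minimal illustration, not the paper's method: the toy vectors stand in for real text-encoder embeddings of code fragments, and the `is_clone` helper and its 0.85 threshold are assumptions for demonstration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_clone(emb1, emb2, threshold=0.85):
    # Flag a fragment pair as a clone when the embeddings of the two
    # fragments are sufficiently similar (threshold is illustrative).
    return cosine_similarity(emb1, emb2) >= threshold

# Toy vectors standing in for text-encoder embeddings of three fragments:
# frag_a and frag_b are near-duplicates; frag_c is unrelated code.
frag_a = [0.9, 0.1, 0.4]
frag_b = [0.85, 0.15, 0.38]
frag_c = [0.1, 0.9, 0.2]

print(is_clone(frag_a, frag_b))  # similar embeddings -> likely clone
print(is_clone(frag_a, frag_c))  # dissimilar embeddings -> not a clone
```

In practice the embeddings would come from a text encoder applied to each code fragment; the similarity-and-threshold step shown here is the standard way such embeddings are turned into clone/non-clone decisions.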

Authors (10)
  1. Shihan Dou (46 papers)
  2. Junjie Shan (12 papers)
  3. Haoxiang Jia (7 papers)
  4. Wenhao Deng (4 papers)
  5. Zhiheng Xi (37 papers)
  6. Wei He (188 papers)
  7. Yueming Wu (16 papers)
  8. Tao Gui (127 papers)
  9. Yang Liu (2253 papers)
  10. Xuanjing Huang (287 papers)
Citations (16)
