Learning Autocompletion from Real-World Datasets (2011.04542v1)

Published 9 Nov 2020 in cs.SE

Abstract: Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study, "When Code Completion Fails: a Case Study on Real-World Completions", demonstrates that these results may not translate to improvements in real-world performance. To combat this effect, we train models on real-world code completion examples and find that these models outperform models trained on committed source code and on working version snapshots by 12.8% and 13.8% accuracy, respectively. We observe this improvement across modeling technologies and show through A/B testing that it corresponds to a 6.2% increase in programmers' actual autocompletion usage. Furthermore, our study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models.
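The abstract reports accuracy gains without defining the metric here. As a rough illustration only, the sketch below assumes "accuracy" means top-1 accuracy on logged completion events: a prediction counts as correct when the model's first-ranked suggestion equals the item the programmer actually accepted. The `CompletionEvent` record, the `top1_accuracy` function and the `alphabetical` baseline ranker are hypothetical names introduced for this example, not the authors' code or data schema.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CompletionEvent:
    """One logged autocompletion usage (hypothetical schema, for illustration)."""
    prefix: str            # code typed before completion was triggered
    candidates: list[str]  # suggestions available at that point
    accepted: str          # suggestion the programmer actually selected


# A ranker maps (prefix, candidates) to the candidates in ranked order.
Ranker = Callable[[str, Sequence[str]], list[str]]


def top1_accuracy(events: Sequence[CompletionEvent], rank: Ranker) -> float:
    """Fraction of events where the top-ranked suggestion is the accepted one."""
    if not events:
        return 0.0
    hits = sum(
        1 for e in events if rank(e.prefix, e.candidates)[:1] == [e.accepted]
    )
    return hits / len(events)


# Hypothetical baseline: rank candidates alphabetically, ignoring the prefix.
def alphabetical(prefix: str, candidates: Sequence[str]) -> list[str]:
    return sorted(candidates)


events = [
    CompletionEvent("foo.", ["bar", "baz"], "bar"),      # baseline ranks "bar" first: hit
    CompletionEvent("x.", ["append", "clear"], "clear"),  # baseline ranks "append" first: miss
]
score = top1_accuracy(events, alphabetical)
```

Scoring against what programmers actually accepted, rather than against tokens that happen to appear in committed code, is the distinction the abstract draws between real-world completion examples and committed source; the sketch only shows the evaluation side of that idea.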

Authors (3)

Gareth Ari Aye
Seohyun Kim
Hongyu Li

Citations (33)
