Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks (2405.09075v1)
Abstract: Code recommendation tools have become increasingly important to software developers across many areas of expertise. They improve productivity and performance during development and make it easier for developers to find code examples and learn from them. This paper proposes Typhon, an approach that automatically recommends relevant code cells in Jupyter notebooks. Typhon tokenizes a developer's markdown description cell and uses it as a query against a database of notebook cells, computing the similarity between the query and the stored markdown cells with either the BM25 ranking function or CodeBERT, a machine-learning-based model; the code cells associated with the most similar markdown cells are then returned to the developer. We evaluated Typhon on Jupyter notebooks from Kaggle competitions and found that the approach recommends code cells with moderate accuracy. The approach and results in this paper can inform further improvements in code cell recommendation for Jupyter notebooks.
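The BM25 variant of the retrieval step described above can be sketched as follows. This is a minimal illustration, not Typhon's implementation: the `tokenize` and `recommend` helpers, the parameter choices (`k1=1.2`, `b=0.75`), and the exact preprocessing are all assumptions made here for clarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase word tokenizer (a simplification; the paper's
    # exact preprocessing of markdown cells is not reproduced here).
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a list of tokenized documents."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in docs) / len(docs)
        # Document frequency of each term -> smoothed inverse document frequency.
        df = Counter(t for d in docs for t in set(d))
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        return sum(
            self.idf.get(t, 0.0) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
        )


def recommend(markdown_query, cells, top_k=1):
    """Rank (markdown, code) pairs by BM25 similarity of their markdown
    descriptions to the query, and return the top-k code cells."""
    docs = [tokenize(md) for md, _ in cells]
    bm25 = BM25(docs)
    q = tokenize(markdown_query)
    ranked = sorted(range(len(cells)), key=lambda i: bm25.score(q, i), reverse=True)
    return [cells[i][1] for i in ranked[:top_k]]
```

For example, given a corpus of `(markdown_description, code)` pairs harvested from notebooks, `recommend("read the csv dataset", cells)` would surface the code cell whose markdown description best matches the query. The CodeBERT variant would replace the BM25 score with cosine similarity between learned embeddings of the query and the stored markdown cells.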