Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks (2405.09075v1)

Published 15 May 2024 in cs.SE

Abstract: Code recommendation tools have become increasingly important to software developers across many areas of expertise. They improve productivity and performance when writing code and make it easier for developers to find code examples and learn from them. This paper proposes Typhon, an approach that automatically recommends relevant code cells in Jupyter notebooks. Typhon tokenizes a developer's markdown description cell and searches a database for the most similar code cells using a text similarity technique, such as the BM25 ranking function or CodeBERT, a machine-learning approach. The algorithm then computes the similarity distance between the tokenized query and the markdown cells in the database and returns the most relevant code cells to the developer. We evaluated Typhon on Jupyter notebooks from Kaggle competitions and found that the approach can recommend code cells with moderate accuracy. The approach and results in this paper can lead to further improvements in code cell recommendation for Jupyter notebooks.
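
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `bm25_scores`, `recommend`), the parameter values `k1=1.5` and `b=0.75`, and the three-entry `CELLS` database are all assumptions made for illustration. The sketch scores each stored markdown cell against the query with a common formulation of Okapi BM25 and returns the code cells paired with the best-matching markdown cells. The CodeBERT variant mentioned in the abstract is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per tokenized document."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(d) for d in docs_tokens) / n_docs or 1.0
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def recommend(query, cell_pairs, top_k=1):
    """Rank (markdown, code) pairs by BM25 score of the markdown against the query.

    Returns the code of up to top_k pairs whose markdown shares a term with the query.
    """
    query_tokens = tokenize(query)
    docs_tokens = [tokenize(markdown) for markdown, _ in cell_pairs]
    scores = bm25_scores(query_tokens, docs_tokens)
    ranked = sorted(zip(scores, cell_pairs), key=lambda item: item[0], reverse=True)
    return [code for score, (_, code) in ranked[:top_k] if score > 0]


# Hypothetical database of (markdown cell, following code cell) pairs.
CELLS = [
    ("Load the training data", "train = pd.read_csv('train.csv')"),
    ("Plot the correlation heatmap", "sns.heatmap(train.corr())"),
    ("Train a random forest classifier", "model = RandomForestClassifier().fit(X, y)"),
]

if __name__ == "__main__":
    print(recommend("train random forest model", CELLS))
```

Because this tokenizer does no stemming, the query term "train" does not match "training" in the first markdown cell. A real system would likely need a more careful preprocessing step, which the abstract does not specify.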

