
Survey of Code Search Based on Deep Learning (2305.05959v2)

Published 10 May 2023 in cs.SE and cs.PL

Abstract: Code writing is repetitive and predictable, which has inspired a variety of code intelligence techniques. This survey focuses on code search, that is, retrieving code that matches a given query by effectively capturing the semantic similarity between the query and the code. Deep learning, which can extract complex semantic information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pre-trained models, have been applied to code search with significant progress, and deep learning is now the leading paradigm for code search. In this survey, we provide a comprehensive overview of deep learning-based code search. We review the existing deep learning-based code search framework, which maps the query and the code to vectors and measures their similarity. Furthermore, we propose a new taxonomy that presents state-of-the-art deep learning-based code search as a three-step process: query semantics modeling, code semantics modeling, and matching modeling, which involves training the deep learning model. Finally, we suggest potential avenues for future research in this promising field.
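
The framework mentioned in the abstract (map the query and the code to vectors, then measure their similarity) can be illustrated with a minimal sketch. This is not the survey's method: the bag-of-tokens `embed` function below is a stand-in assumption for a learned neural encoder, and every name in it (`tokenize`, `embed`, `cosine`, `search`, `CORPUS`) is hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus of code snippets (hypothetical examples, not from the paper).
CORPUS = [
    "def read_file(path): return open(path).read()",
    "def add(a, b): return a + b",
]

def tokenize(text):
    # Split natural language and identifiers (snake_case, camelCase)
    # into lowercase word-like tokens.
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", text)]

def embed(text):
    # ASSUMPTION: a sparse token-count vector stands in for the dense
    # vector that a trained query/code encoder would produce.
    return Counter(tokenize(text))

def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, snippets, top_k=1):
    # Embed the query once, score every snippet, return the best matches
    # as (score, snippet) pairs in descending score order.
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    for score, snippet in search("read a file from path", CORPUS):
        print(f"{score:.3f}  {snippet}")
```

In a deep learning-based system, the two calls to `embed` would typically be replaced by trained query and code encoders, while the similarity-and-ranking step keeps the same shape.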

Authors (5)
  1. Yutao Xie
  2. Jiayi Lin
  3. Hande Dong
  4. Lei Zhang
  5. Zhonghai Wu