Survey of Code Search Based on Deep Learning (2305.05959v2)
Abstract: Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, to retrieve code that matches a given query by effectively capturing the semantic similarity between the query and code. Deep learning, being able to extract complex semantics information, has achieved great success in this field. Recently, various deep learning methods, such as graph neural networks and pretraining models, have been applied to code search with significant progress. Deep learning is now the leading paradigm for code search. In this survey, we provide a comprehensive overview of deep learning-based code search. We review the existing deep learning-based code search framework which maps query/code to vectors and measures their similarity. Furthermore, we propose a new taxonomy to illustrate the state-of-the-art deep learning-based code search in a three-steps process: query semantics modeling, code semantics modeling, and matching modeling which involves the deep learning model training. Finally, we suggest potential avenues for future research in this promising field.
- code2vec: learning distributed representations of code. Proc. ACM Program. Lang. 3, POPL (2019), 40:1–40:29. https://doi.org/10.1145/3290353
- NS3: Neuro-symbolic Semantic Code Search. In NeurIPS. http://papers.nips.cc/paper_files/paper/2022/hash/43f5f6c5cb333115914c8448b8506411-Abstract-Conference.html
- Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, November 27 - December 1, 2017, Volume 2: Short Papers, Greg Kondrak and Taro Watanabe (Eds.). Asian Federation of Natural Language Processing, 314–319. https://aclanthology.org/I17-2053/
- Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations. In SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, and Tetsuya Sakai (Eds.). ACM, 511–521. https://doi.org/10.1145/3404835.3462840
- CSSAM: Code Search via Attention Matching of Code Semantics and Structures. In IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023, Taipa, Macao, March 21-24, 2023, Tao Zhang, Xin Xia, and Nicole Novielli (Eds.). IEEE, 402–413. https://doi.org/10.1109/SANER56733.2023.00045
- When deep learning met code search. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019, Marlon Dumas, Dietmar Pfahl, Sven Apel, and Alessandra Russo (Eds.). ACM, 964–974. https://doi.org/10.1145/3338906.3340458
- Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow. In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 1273–1285. https://doi.org/10.1109/ICSE43902.2021.00116
- Cross-Domain Deep Code Search with Meta Learning. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 487–498. https://doi.org/10.1145/3510003.3510125
- SNIFF: A Search Engine for Java Using Free-Form Queries. In Fundamental Approaches to Software Engineering, 12th International Conference, FASE 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings (Lecture Notes in Computer Science, Vol. 5503), Marsha Chechik and Martin Wirsing (Eds.). Springer, 385–400. https://doi.org/10.1007/978-3-642-00593-0_26
- Evaluating Large Language Models Trained on Code. CoRR abs/2107.03374 (2021). arXiv:2107.03374 https://arxiv.org/abs/2107.03374
- Qingying Chen and Minghui Zhou. 2018. A neural framework for retrieval and summarization of source code. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, Marianne Huchard, Christian Kästner, and Gordon Fraser (Eds.). ACM, 826–831. https://doi.org/10.1145/3238147.3240471
- Yi Cheng and Li Kuang. 2022. CSRS: code search with relevance matching and semantic matching. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Virtual Event, May 16-17, 2022, Ayushi Rastogi, Rosalia Tufano, Gabriele Bavota, Venera Arnaoudova, and Sonia Haiduc (Eds.). ACM, 533–542. https://doi.org/10.1145/3524610.3527889
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171–4186. https://doi.org/10.18653/v1/n19-1423
- Retriever and Ranker Framework with Probabilistic Hard Negative Sampling for Code Search. CoRR abs/2305.04508 (2023). https://doi.org/10.48550/arXiv.2305.04508 arXiv:2305.04508
- Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning Approach for Semantic Code Search. In CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, Gianluca Demartini, Guido Zuccon, J. Shane Culpepper, Zi Huang, and Hanghang Tong (Eds.). ACM, 2994–2998. https://doi.org/10.1145/3459637.3482127
- Zachary Eberhart and Collin McMillan. 2022. Generating Clarifying Questions for Query Refinement in Source Code Search. In IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022, Honolulu, HI, USA, March 15-18, 2022. IEEE, 140–151. https://doi.org/10.1109/SANER53432.2022.00028
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1536–1547. https://doi.org/10.18653/v1/2020.findings-emnlp.139
- Some from here, some from there: cross-project code reuse in GitHub. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017, Buenos Aires, Argentina, May 20-28, 2017, Jesús M. González-Barahona, Abram Hindle, and Lin Tan (Eds.). IEEE Computer Society, 291–301. https://doi.org/10.1109/MSR.2017.15
- Luca Di Grazia and Michael Pradel. 2023. Code Search: A Survey of Techniques for Finding Code. ACM Comput. Surv. 55, 11 (2023), 220:1–220:31. https://doi.org/10.1145/3565971
- Multimodal Representation for Neural Code Search. In IEEE International Conference on Software Maintenance and Evolution, ICSME 2021, Luxembourg, September 27 - October 1, 2021. IEEE, 483–494. https://doi.org/10.1109/ICSME52107.2021.00049
- CRaDLe: Deep code retrieval based on semantic Dependency Learning. Neural Networks 141 (2021), 385–394. https://doi.org/10.1016/j.neunet.2021.04.019
- Accelerating Code Search with Deep Hashing and Code Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 2534–2544. https://doi.org/10.18653/v1/2022.acl-long.181
- Deep code search. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.). ACM, 933–944. https://doi.org/10.1145/3180155.3180167
- UniXcoder: Unified Cross-Modal Pre-training for Code Representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 7212–7225. https://doi.org/10.18653/v1/2022.acl-long.499
- GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=jLoC4ez43PZ
- A Multi-Perspective Architecture for Semantic Code Search. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tetreault (Eds.). Association for Computational Linguistics, 8563–8568. https://doi.org/10.18653/v1/2020.acl-main.758
- Towards Compositional Generalization in Code Search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 10743–10750. https://aclanthology.org/2022.emnlp-main.737
- Geert Heyman and Tom Van Cutsem. 2020. Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent. CoRR abs/2008.12193 (2020). arXiv:2008.12193 https://arxiv.org/abs/2008.12193
- NL-based query refinement and contextualized code search results: A user study. In 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, CSMR-WCRE 2014, Antwerp, Belgium, February 3-6, 2014, Serge Demeyer, Dave W. Binkley, and Filippo Ricca (Eds.). IEEE Computer Society, 34–43. https://doi.org/10.1109/CSMR-WCRE.2014.6747190
- On the naturalness of software. Commun. ACM 59, 5 (2016), 122–131. https://doi.org/10.1145/2902362
- Revisiting Code Search in a Two-Stage Paradigm. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM 2023, Singapore, 27 February 2023 - 3 March 2023, Tat-Seng Chua, Hady W. Lauw, Luo Si, Evimaria Terzi, and Panayiotis Tsaparas (Eds.). ACM, 994–1002. https://doi.org/10.1145/3539597.3570383
- Deep code comment generation with hybrid lexical and syntactical information. Empir. Softw. Eng. 25, 3 (2020), 2179–2217. https://doi.org/10.1007/s10664-019-09730-9
- CoSQA: 20, 000+ Web Queries for Code Search and Question Answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, 5690–5700. https://doi.org/10.18653/v1/2021.acl-long.442
- Deep learning the semantics of change sequences for query expansion. Softw. Pract. Exp. 49, 11 (2019), 1600–1617. https://doi.org/10.1002/spe.2736
- CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. CoRR abs/1909.09436 (2019). arXiv:1909.09436 http://arxiv.org/abs/1909.09436
- Summarizing Source Code using a Neural Attention Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics. https://doi.org/10.18653/v1/p16-1195
- Learning and Evaluating Contextual Embedding of Source Code. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 5110–5121. http://proceedings.mlr.press/v119/kanade20a.html
- Anjan Karmakar and Romain Robbes. 2021. What do pre-trained code models know about code?. In 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, 1332–1336. https://doi.org/10.1109/ASE51524.2021.9678927
- Staffs Keele. 2007. Guidelines for performing systematic literature reviews in software engineering. Technical Report. Ver. 2.3 EBSE Technical Report.
- Muhammad Khalifa. 2019. Semantic Source Code Search: A Study of the Past and a Glimpse at the Future. CoRR abs/1908.06738 (2019). arXiv:1908.06738 http://arxiv.org/abs/1908.06738 Withdrawn..
- DOBF: A Deobfuscation Pre-Training Objective for Programming Languages. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 14967–14979. https://proceedings.neurips.cc/paper/2021/hash/7d6548bdc0082aacc950ed35e91fcccb-Abstract.html
- Neural Code Search Evaluation Dataset. CoRR abs/1908.09804 (2019). arXiv:1908.09804 http://arxiv.org/abs/1908.09804
- Exploring Representation-level Augmentation for Code Search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 4924–4936. https://aclanthology.org/2022.emnlp-main.327
- CodeRetriever: Unimodal and Bimodal Contrastive Learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022. Association for Computational Linguistics.
- Soft-Labeled Contrastive Pre-Training for Function-Level Code Representation. In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 118–129. https://aclanthology.org/2022.findings-emnlp.9
- Semantic-Preserving Adversarial Code Comprehension. In Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, and Seung-Hoon Na (Eds.). International Committee on Computational Linguistics, 3017–3028. https://aclanthology.org/2022.coling-1.267
- Automating code review activities by large-scale pre-training. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, Abhik Roychoudhury, Cristian Cadar, and Miryung Kim (Eds.). ACM, 1035–1047. https://doi.org/10.1145/3540250.3549081
- Adaptive Deep Code Search. In ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020. ACM, 48–59. https://doi.org/10.1145/3387904.3389278
- Deep Graph Matching and Searching for Semantic Code Retrieval. ACM Trans. Knowl. Discov. Data 15, 5 (2021), 88:1–88:21. https://doi.org/10.1145/3447571
- Opportunities and Challenges in Code Search Tools. ACM Comput. Surv. 54, 9 (2022), 196:1–196:40. https://doi.org/10.1145/3480027
- Neural query expansion for code search. In Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL@PLDI 2019, Phoenix, AZ, USA, June 22, 2019, Tim Mattson, Abdullah Muzahid, and Armando Solar-Lezama (Eds.). ACM, 29–37. https://doi.org/10.1145/3315508.3329975
- GraphSearchNet: Enhancing GNNs via Capturing Global Dependencies for Semantic Code Search. IEEE Trans. Software Eng. 49, 4 (2023), 2839–2855. https://doi.org/10.1109/TSE.2022.3233901
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019). arXiv:1907.11692 http://arxiv.org/abs/1907.11692
- CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, Joaquin Vanschoren and Sai-Kit Yeung (Eds.). https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c16a5320fa475530d9583c34fd356ef5-Abstract-round1.html
- CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E). In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015, Myra B. Cohen, Lars Grunske, and Michael Whalen (Eds.). IEEE Computer Society, 260–270. https://doi.org/10.1109/ASE.2015.42
- MulCS: Towards a Unified Deep Representation for Multilingual Code Search. In IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023, Taipa, Macao, March 21-24, 2023, Tao Zhang, Xin Xia, and Nicole Novielli (Eds.). IEEE, 120–131. https://doi.org/10.1109/SANER56733.2023.00021
- Vadim Markovtsev and Waren Long. 2018. Public git archive: a big code dataset for all. In Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018, Gothenburg, Sweden, May 28-29, 2018, Andy Zaidman, Yasutaka Kamei, and Emily Hill (Eds.). ACM, 34–37. https://doi.org/10.1145/3196398.3196464
- Portfolio: finding relevant functions and their usage. In Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu , HI, USA, May 21-28, 2011, Richard N. Taylor, Harald C. Gall, and Nenad Medvidovic (Eds.). ACM, 111–120. https://doi.org/10.1145/1985793.1985809
- Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, Dale Schuurmans and Michael P. Wellman (Eds.). AAAI Press, 1287–1293. http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11775
- Query Expansion Based on Crowd Knowledge for Code Search. IEEE Trans. Serv. Comput. 9, 5 (2016), 771–783. https://doi.org/10.1109/TSC.2016.2560165
- SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 1–13. https://doi.org/10.1145/3510003.3510096
- OpenAI. 2023. GPT-4 Technical Report. CoRR abs/2303.08774 (2023). https://doi.org/10.48550/arXiv.2303.08774 arXiv:2303.08774
- Contrastive Learning with Keyword-based Data Augmentation for Code Search and Code Question Answering. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023, Andreas Vlachos and Isabelle Augenstein (Eds.). Association for Computational Linguistics, 3591–3601. https://aclanthology.org/2023.eacl-main.262
- Guidelines for conducting systematic mapping studies in software engineering: An update. Inf. Softw. Technol. 64 (2015), 1–18. https://doi.org/10.1016/j.infsof.2015.03.007
- Mohammad Masudur Rahman. 2019. Supporting code search with context-aware, analytics-driven, effective query reformulation. In Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Joanne M. Atlee, Tevfik Bultan, and Jon Whittle (Eds.). IEEE / ACM, 226–229. https://doi.org/10.1109/ICSE-Companion.2019.00088
- Mohammad Masudur Rahman and Chanchal K. Roy. 2018. Effective Reformulation of Query for Code Search Using Crowdsourced Knowledge and Extra-Large Data Analytics. In 2018 IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018. IEEE Computer Society, 473–484. https://doi.org/10.1109/ICSME.2018.00057
- Automatic query reformulation for code search using crowdsourced knowledge. Empir. Softw. Eng. 24, 4 (2019), 1869–1924. https://doi.org/10.1007/s10664-018-9671-0
- Retrieval on source code: a neural code search. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL@PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, Justin Gottschlich and Alvin Cheung (Eds.). ACM, 31–41. https://doi.org/10.1145/3211346.3211353
- On the Effectiveness of Transfer Learning for Code Search. IEEE Trans. Software Eng. 49, 4 (2023), 1804–1822. https://doi.org/10.1109/TSE.2022.3192755
- How to better utilize code graphs in semantic code search?. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, Abhik Roychoudhury, Cristian Cadar, and Miryung Kim (Eds.). ACM, 722–733. https://doi.org/10.1145/3540250.3549087
- Improving Code Search with Co-Attentive Representation Learning. In ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020. ACM, 196–207. https://doi.org/10.1145/3387904.3389269
- Augmenting and structuring user queries to support efficient free-form code search. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.). ACM, 945. https://doi.org/10.1145/3180155.3182513
- Active inductive logic programming for code search. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Joanne M. Atlee, Tevfik Bultan, and Jon Whittle (Eds.). IEEE / ACM, 292–303. https://doi.org/10.1109/ICSE.2019.00044
- Solving the Search for Source Code. ACM Trans. Softw. Eng. Methodol. 23, 3 (2014), 26:1–26:45. https://doi.org/10.1145/2581377
- Code Search based on Context-aware Code Translation. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 388–400. https://doi.org/10.1145/3510003.3510140
- On the Importance of Building High-quality Training Datasets for Neural Code Search. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 1609–1620. https://doi.org/10.1145/3510003.3510160
- Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger (Eds.). 3104–3112. https://proceedings.neurips.cc/paper/2014/hash/a14ac55a4f27472c5d894ec1c3c743d2-Abstract.html
- Multi-modal Attention Network Learning for Semantic Source Code Retrieval. In 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019. IEEE, 13–25. https://doi.org/10.1109/ASE.2019.00012
- You see what I want you to see: poisoning vulnerabilities in neural code search. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, Abhik Roychoudhury, Cristian Cadar, and Miryung Kim (Eds.). ACM, 1233–1245. https://doi.org/10.1145/3540250.3549153
- What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 2377–2388. https://doi.org/10.1145/3510003.3510050
- Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, Marianne Huchard, Christian Kästner, and Gordon Fraser (Eds.). ACM, 397–407. https://doi.org/10.1145/3238147.3238206
- Enriching query semantics for code search with reinforcement learning. Neural Networks 145 (2022), 22–32. https://doi.org/10.1016/j.neunet.2021.09.025
- Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 287–298. https://doi.org/10.1145/3510003.3510062
- Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention. IEEE Trans. Software Eng. 48, 2 (2022), 102–119. https://doi.org/10.1109/TSE.2020.2979701
- TranS^3: A Transformer-based Framework for Unifying Code Summarization and Code Search. CoRR abs/2003.03238 (2020). arXiv:2003.03238 https://arxiv.org/abs/2003.03238
- Syncobert: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556 (2021).
- CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training. In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Marine Carpuat, Marie-Catherine de Marneffe, and Iván Vladimir Meza Ruíz (Eds.). Association for Computational Linguistics, 1066–1077. https://doi.org/10.18653/v1/2022.findings-naacl.80
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 8696–8708. https://doi.org/10.18653/v1/2021.emnlp-main.685
- A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research. ACM Trans. Softw. Eng. Methodol. 31, 2 (2022), 32:1–32:58. https://doi.org/10.1145/3485275
- Huaiguang Wu and Yang Yang. 2019. Code Search Based on Alteration Intent. IEEE Access 7 (2019), 56796–56802. https://doi.org/10.1109/ACCESS.2019.2913560
- Two-Stage Attention-Based Model for Code Search with Textual and Structural Features. In 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021, Honolulu, HI, USA, March 9-12, 2021. IEEE, 342–353. https://doi.org/10.1109/SANER50967.2021.00039
- Are the Code Snippets What We Are Searching for? A Benchmark and an Empirical Study on Code Search with Natural-Language Queries. In 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020, Kostas Kontogiannis, Foutse Khomh, Alexander Chatzigeorgiou, Marios-Eleftherios Fokaefs, and Minghui Zhou (Eds.). IEEE, 344–354. https://doi.org/10.1109/SANER48275.2020.9054840
- A Multi-Modal Transformer-based Code Summarization Approach for Smart Contracts. In 29th IEEE/ACM International Conference on Program Comprehension, ICPC 2021, Madrid, Spain, May 20-21, 2021. IEEE, 1–12. https://doi.org/10.1109/ICPC52881.2021.00010
- CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning. In The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, Ling Liu, Ryen W. White, Amin Mantrach, Fabrizio Silvestri, Julian J. McAuley, Ricardo Baeza-Yates, and Leila Zia (Eds.). ACM, 2203–2214. https://doi.org/10.1145/3308558.3313632
- StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis (Eds.). ACM, 1693–1703. https://doi.org/10.1145/3178876.3186081
- Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning. In WWW ’20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, Yennun Huang, Irwin King, Tie-Yan Liu, and Maarten van Steen (Eds.). ACM / IW3C2, 2309–2319. https://doi.org/10.1145/3366423.3380295
- Learning to mine aligned code and natural language pairs from stack overflow. In Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018, Gothenburg, Sweden, May 28-29, 2018, Andy Zaidman, Yasutaka Kamei, and Emily Hill (Eds.). ACM, 476–486. https://doi.org/10.1145/3196398.3196408
- deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search. ACM Trans. Softw. Eng. Methodol. 32, 2 (2023), 34:1–34:27. https://doi.org/10.1145/3546066
- Expanding Queries for Code Search Using Semantically Related API Class-names. IEEE Transactions on Software Engineering 44, 11 (2018), 1070–1082. https://doi.org/10.1109/TSE.2017.2750682
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 11328–11339. http://proceedings.mlr.press/v119/zhang20ae.html
- Jie Zhao and Huan Sun. 2020. Adversarial Training for Code Retrieval with Question-Description Relevance Regularization. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 4049–4059. https://doi.org/10.18653/v1/2020.findings-emnlp.361
- Multilingual Code Snippets Training for Program Translation. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022. AAAI Press, 11783–11790. https://ojs.aaai.org/index.php/AAAI/article/view/21434
- OCoR: An Overlapping-Aware Code Retriever. In 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020. IEEE, 883–894. https://doi.org/10.1145/3324884.3416530
- Yutao Xie (10 papers)
- Jiayi Lin (14 papers)
- Hande Dong (9 papers)
- Lei Zhang (1689 papers)
- Zhonghai Wu (29 papers)