
Test Case Recommendations with Distributed Representation of Code Syntactic Features (2310.03174v1)

Published 4 Oct 2023 in cs.LG and cs.SE

Abstract: Frequent modifications of unit test cases are inevitable because the underlying source code, design, and requirements of software change continuously. Since manually maintaining software test suites is tedious, time-consuming, and costly, automating the generation and maintenance of unit tests will significantly improve the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach that exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to developers. The proposed approach first trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. After retrieving the semantic and structural properties of a given method, the approach computes the cosine similarity between the method's embedding and the previously embedded training instances. Then, according to the similarity scores between the embedding vectors, the model identifies the closest embedded methods and recommends their associated unit tests. The results on the Methods2Test dataset showed that, while there is no guarantee that similar, relevant test cases exist for a group of similar methods, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that the recommended test cases reduce the developers' effort in generating the expected test cases.
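
The retrieval step described in the abstract (cosine similarity between a query method's embedding and previously embedded methods, followed by recommending the tests attached to the closest ones) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the function names `cosine_similarity` and `recommend_tests`, the three-dimensional toy vectors, and the test names are all hypothetical, and the embeddings are assumed to have already been produced by some trained model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tests(
    query: Sequence[float],
    corpus: List[Tuple[Sequence[float], str]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, test) pairs whose method embeddings are closest to the query."""
    scored = [(cosine_similarity(query, emb), test) for emb, test in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical corpus: (method embedding, name of its associated unit test).
corpus = [
    ([1.0, 0.0, 0.0], "testAdd"),
    ([0.0, 1.0, 0.0], "testParse"),
    ([0.9, 0.1, 0.0], "testSum"),
]

# Hypothetical embedding of a new method that has no test yet.
top = recommend_tests([1.0, 0.05, 0.0], corpus, k=2)
for score, test in top:
    print(f"{test}: {score:.3f}")
```

A linear scan like this is enough for illustration; for a corpus the size of Methods2Test, an approximate nearest-neighbor index would likely be preferable, though the abstract does not say which search strategy was used.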

Authors (3)
  1. Mosab Rezaei (2 papers)
  2. Hamed Alhoori (32 papers)
  3. Mona Rahimi (2 papers)
