FedCSD: A Federated Learning Based Approach for Code-Smell Detection

Published 31 May 2023 in cs.SE, cs.AI, and cs.LG | (2306.00038v3)

Abstract: This paper proposes a Federated Learning Code Smell Detection (FedCSD) approach that allows organizations to collaboratively train federated ML models while preserving their data privacy. The approach is supported by three experiments that use three manually validated datasets to detect and examine different code smell scenarios. Experiment 1, a centralized training experiment, found that dataset two, which contains fewer smells, achieved the lowest accuracy (92.30%), while datasets one and three achieved the highest accuracies, with only a slight difference between them (98.90% and 99.5%, respectively). Experiment 2 was a cross-evaluation in which each ML model was trained on one dataset and then evaluated on the other two. Its results show a significant drop in model accuracy (lowest accuracy: 63.80%) when the training dataset contains fewer smells, which is noticeably reflected (as technical debt) in the model's performance. Finally, experiment 3 evaluates the approach by splitting the dataset across 10 companies. Each ML model was trained at the company's site, and all updated model weights were then transferred to the server. The global model, trained with 10 companies for 100 training rounds, achieved an accuracy of 98.34%. The results reveal only a slight difference between the global model's accuracy and the highest accuracy of the centralized model, a difference that can be ignored in favour of the global model's comprehensive knowledge, lower training cost, preservation of data privacy, and avoidance of the technical debt problem.
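
The federated setup described in the abstract (local training at each company's site, weights sent to a server, repeated over many rounds) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not name the aggregation rule or the model, so the sketch assumes plain federated averaging weighted by client sample count, a logistic-regression model, and synthetic data. The function names `local_train`, `federated_average`, and `run_federated` are hypothetical, and only the 10 clients and 100 rounds come from the abstract.

```python
import numpy as np


def local_train(w, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of logistic regression on one client's data."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # gradient of the mean log-loss
    return w


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by sample count (assumed rule)."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes)


def run_federated(clients, n_features, rounds=100):
    """Each round: every client trains locally, then the server aggregates."""
    w_global = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_train(w_global, X, y) for X, y in clients]
        w_global = federated_average(updates, [len(y) for _, y in clients])
    return w_global


def accuracy(w, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(((X @ w) > 0) == (y == 1)))


# Synthetic stand-in for the smell datasets (an assumption, not the paper's data).
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
X_all = rng.normal(size=(2000, 5))
y_all = (X_all @ true_w > 0).astype(float)

# Split the data across 10 simulated "companies"; raw data never leaves a client.
clients = [(X_all[i::10], y_all[i::10]) for i in range(10)]
w_final = run_federated(clients, n_features=5, rounds=100)
print("global model accuracy:", accuracy(w_final, X_all, y_all))
```

Only the weight vectors cross the client boundary in this sketch, which is the property the abstract relies on for data privacy. A real deployment would add secure transport and possibly a different aggregation rule.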
