Detecting Security-Relevant Methods using Multi-label Machine Learning (2403.07501v1)

Published 12 Mar 2024 in cs.LG

Abstract: To detect security vulnerabilities, static analysis tools need to be configured with security-relevant methods. Current approaches can automatically identify such methods using binary relevance machine learning approaches. However, they ignore dependencies among security-relevant methods, over-generalize, and perform poorly in practice. Additionally, users must still manually configure static analysis tools with the detected methods. User feedback and our own observations indicate that these manual steps are often tedious, error-prone, and counter-intuitive. In this paper, we present Dev-Assist, an IntelliJ IDEA plugin that detects security-relevant methods using a multi-label machine learning approach that considers dependencies among labels. The plugin can automatically generate configurations for static analysis tools, run the static analysis, and show the results in IntelliJ IDEA. Our experiments reveal that Dev-Assist's machine learning approach achieves a higher F1-Measure than related approaches. Moreover, the plugin reduces and simplifies the manual effort required to configure and use static analysis tools.
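
The contrast between binary relevance and a dependency-aware multi-label formulation can be illustrated with a minimal sketch. This is not Dev-Assist's implementation: the abstract does not name the algorithm used, so the sketch substitutes label powerset, one common problem transformation that preserves label co-occurrence. The method names and the label names (`source`, `sink`, `sanitizer`, `cwe79`, `cwe89`) are hypothetical toy data chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data: (method name, set of illustrative labels).
TRAINING = [
    ("getParameter", {"source"}),
    ("executeQuery", {"sink", "cwe89"}),
    ("executeUpdate", {"sink", "cwe89"}),
    ("encodeForHTML", {"sanitizer", "cwe79"}),
    ("println", {"sink", "cwe79"}),
]


def binary_relevance_targets(data, labels):
    """Build one independent yes/no target column per label.

    Each column would train its own binary classifier, so the fact that
    'cwe89' only ever appears together with 'sink' is not visible to it.
    """
    return {
        label: [label in assigned for _, assigned in data]
        for label in labels
    }


def label_powerset_targets(data):
    """Build a single multi-class target from whole label combinations.

    Each distinct combination becomes one class, so co-occurrence between
    labels is kept in the training target.
    """
    return [frozenset(assigned) for _, assigned in data]


labels = sorted({label for _, assigned in TRAINING for label in assigned})
br = binary_relevance_targets(TRAINING, labels)
lp = label_powerset_targets(TRAINING)
combo_counts = Counter(lp)

for label in labels:
    print(label, br[label])
for combo, count in combo_counts.items():
    print(sorted(combo), count)
```

Under binary relevance, the `cwe89` column is learned without reference to the `sink` column. Under label powerset, the pair is a single class, so a classifier trained on it cannot predict combinations absent from the training data. That is also the main limitation of label powerset, since the number of classes grows with the number of distinct combinations.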

