Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection (2401.07466v1)

Published 15 Jan 2024 in cs.SE and cs.AI

Abstract: Software, while beneficial, poses potential cybersecurity risks due to inherent vulnerabilities. Detecting these vulnerabilities is crucial, and deep learning has shown promise as an effective tool for this task because it performs well without extensive feature engineering. However, a challenge in deploying deep learning for vulnerability detection is the limited availability of training data. Recent research highlights the efficacy of deep learning in diverse tasks, a success attributed to instruction fine-tuning, a technique that remains under-explored in the context of vulnerability detection. This paper investigates the capability of models, specifically a recent LLM, to generalize beyond the programming languages used in their training data. It also examines the role of natural language instructions in enhancing this generalization. Our study evaluates the model's performance on a real-world dataset of vulnerable code. We present key insights and lessons learned, contributing to the understanding of deep learning applications in software vulnerability detection.
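To make the instruction fine-tuning setup concrete, the sketch below shows one common way a vulnerability-detection sample can be wrapped with a natural-language instruction before fine-tuning. The prompt template, field names, and label wording here are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch: wrapping a labeled code snippet with a natural-language
# instruction, producing the (prompt, completion) pair typically fed to an
# instruction fine-tuning pipeline. Template and labels are assumptions.

def build_instruction_sample(code: str, is_vulnerable: bool) -> dict:
    """Pair a code snippet with an instruction and the expected answer."""
    instruction = (
        "Analyze the following function and answer 'vulnerable' or 'safe'."
    )
    return {
        "prompt": f"{instruction}\n\n### Code:\n{code}\n\n### Answer:",
        "completion": "vulnerable" if is_vulnerable else "safe",
    }

# Example: an unbounded strcpy is a classic buffer-overflow pattern.
sample = build_instruction_sample("strcpy(dst, src);", True)
print(sample["completion"])  # prints "vulnerable"
```

A fine-tuning run would then maximize the likelihood of `completion` given `prompt`; the paper's question is whether adding such instructions helps the model generalize to programming languages absent from its training data.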

Authors (2)
  1. Imam Nur Bani Yusuf (7 papers)
  2. Lingxiao Jiang (36 papers)
Citations (2)