Automatic feature learning for vulnerability prediction (1708.02368v1)

Published 8 Aug 2017 in cs.SE

Abstract: Code flaws or vulnerabilities are prevalent in software systems and can potentially cause a variety of problems including deadlock, information loss, or system failure. A variety of approaches have been developed to try to detect the most likely locations of such code vulnerabilities in large code bases. Most of them rely on manually designed features (e.g. complexity metrics or frequencies of code tokens) that represent the characteristics of the code. However, all suffer from challenges in sufficiently capturing both the semantic and the syntactic representation of source code, an important capability for building accurate prediction models. In this paper, we describe a new approach, built upon the powerful deep learning Long Short-Term Memory model, to automatically learn both semantic and syntactic features in code. Our evaluation on 18 Android applications demonstrates that the predictive power obtained from our learned features is equal or even superior to what is achieved by state-of-the-art vulnerability prediction models: 3% to 58% improvement for within-project prediction and 85% for cross-project prediction.

Authors (6)
  1. Hoa Khanh Dam (17 papers)
  2. Truyen Tran (112 papers)
  3. Trang Pham (17 papers)
  4. Shien Wee Ng (2 papers)
  5. John Grundy (127 papers)
  6. Aditya Ghose (22 papers)
Citations (95)